Optimally tackling covariate shift in RKHS-based nonparametric
regression
- URL: http://arxiv.org/abs/2205.02986v2
- Date: Tue, 6 Jun 2023 16:20:30 GMT
- Authors: Cong Ma, Reese Pathak, Martin J. Wainwright
- Abstract summary: We show that a kernel ridge regression estimator with a carefully chosen regularization parameter is minimax rate-optimal.
We also show that a naive estimator, which minimizes the empirical risk over the function class, is strictly sub-optimal.
We propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios.
- Score: 43.457497490211985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the covariate shift problem in the context of nonparametric
regression over a reproducing kernel Hilbert space (RKHS). We focus on two
natural families of covariate shift problems defined using the likelihood
ratios between the source and target distributions. When the likelihood ratios
are uniformly bounded, we prove that the kernel ridge regression (KRR)
estimator with a carefully chosen regularization parameter is minimax
rate-optimal (up to a log factor) for a large family of RKHSs with regular
kernel eigenvalues. Interestingly, KRR does not require full knowledge of
likelihood ratios apart from an upper bound on them. In striking contrast to
the standard statistical setting without covariate shift, we also demonstrate
that a naive estimator, which minimizes the empirical risk over the function
class, is strictly sub-optimal under covariate shift as compared to KRR. We
then address the larger class of covariate shift problems where the likelihood
ratio is possibly unbounded yet has a finite second moment. Here, we propose a
reweighted KRR estimator that weights samples based on a careful truncation of
the likelihood ratios. Again, we are able to show that this estimator is
minimax rate-optimal, up to logarithmic factors.
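The reweighted estimator described above can be sketched as a weighted kernel ridge regression in which each sample's weight is its likelihood ratio truncated at a level τ. The sketch below is a minimal illustration, not the paper's exact procedure: the RBF kernel, the function names, and treating τ and λ as free inputs are all assumptions for illustration (in the paper, the truncation level and regularization parameter are chosen carefully as functions of the sample size and the kernel eigenvalues).

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * d2)

def reweighted_krr(X, y, ratios, lam, tau, gamma=1.0):
    """Weighted KRR with truncated likelihood-ratio weights (illustrative sketch).

    Minimizes (1/n) * sum_i w_i (y_i - f(x_i))^2 + lam * ||f||_H^2
    with w_i = min(ratios_i, tau), via the representer theorem:
    the stationarity condition gives (W K + n*lam*I) alpha = W y.
    """
    n = len(y)
    w = np.minimum(ratios, tau)          # truncate the likelihood ratios at tau
    K = rbf_kernel(X, X, gamma)
    # W K + n*lam*I, with W = diag(w) applied row-wise
    A = w[:, None] * K + n * lam * np.eye(n)
    alpha = np.linalg.solve(A, w * y)
    # Return the fitted function f(x) = sum_j alpha_j k(x_j, x)
    return lambda Xnew: rbf_kernel(np.atleast_2d(Xnew), X, gamma) @ alpha
```

Setting `tau` large enough that no weight is clipped recovers ordinary importance-weighted KRR, while `ratios = np.ones(n)` recovers the unweighted KRR estimator analyzed in the bounded-likelihood-ratio regime.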
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Nonparametric logistic regression with deep learning [1.2509746979383698]
In nonparametric logistic regression, the Kullback-Leibler divergence can diverge easily.
Instead of analyzing the excess risk itself, it suffices to show the consistency of the maximum likelihood estimator.
As an important application, we derive the convergence rates of the NPMLE with deep neural networks.
arXiv Detail & Related papers (2024-01-23T04:31:49Z) - Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression [12.443289202402761]
We show the benefits of batch partitioning through the lens of a minimum-norm overparametrized linear regression model.
We characterize the optimal batch size and show it is inversely proportional to the noise level.
We also show that shrinking the batch minimum-norm estimator by a factor equal to the Wiener coefficient further stabilizes it and results in lower quadratic risk in all settings.
arXiv Detail & Related papers (2023-06-14T11:02:08Z) - Benign overfitting and adaptive nonparametric regression [71.70323672531606]
We construct an estimator which is a continuous function interpolating the data points with high probability.
We attain minimax optimal rates under mean squared risk on the scale of Hölder classes adaptively to the unknown smoothness.
arXiv Detail & Related papers (2022-06-27T14:50:14Z) - A new similarity measure for covariate shift with applications to
nonparametric regression [43.457497490211985]
We introduce a new measure of distribution mismatch based on the integrated ratio of probabilities of balls at a given radius.
In comparison to the recently proposed notion of transfer exponent, this measure leads to a sharper rate of convergence.
arXiv Detail & Related papers (2022-02-06T19:14:50Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Distribution-Free Robust Linear Regression [5.532477732693]
We study random design linear regression with no assumptions on the distribution of the covariates.
We construct a non-linear estimator achieving excess risk of order $d/n$ with the optimal sub-exponential tail.
We prove an optimal version of the classical bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk.
arXiv Detail & Related papers (2021-02-25T15:10:41Z) - Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient
Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z) - Robust regression with covariate filtering: Heavy tails and adversarial
contamination [6.939768185086755]
We show how to modify the Huber regression, least trimmed squares, and least absolute deviation estimators to obtain estimators simultaneously computationally and statistically efficient in the stronger contamination model.
We show that the Huber regression estimator achieves near-optimal error rates in this setting, whereas the least trimmed squares and least absolute deviation estimators can be made to achieve near-optimal error after applying a postprocessing step.
arXiv Detail & Related papers (2020-09-27T22:48:48Z) - SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.