Weighted Empirical Risk Minimization: Sample Selection Bias Correction
based on Importance Sampling
- URL: http://arxiv.org/abs/2002.05145v2
- Date: Wed, 19 Feb 2020 15:50:34 GMT
- Title: Weighted Empirical Risk Minimization: Sample Selection Bias Correction
based on Importance Sampling
- Authors: Robin Vogel, Mastane Achab, St\'ephan Cl\'emen\c{c}on, Charles Tillier
- Abstract summary: We consider statistical learning problems when the distribution $P'$ of the training observations $Z'_i$ differs from the distribution $P$ involved in the risk one seeks to minimize.
We show that, in various situations frequently encountered in practice, the importance function $\Phi(z)=dP/dP'(z)$ takes a simple form and can be directly estimated from the $Z'_i$'s.
We then prove that the generalization capacity of the aforementioned approach is preserved when plugging the resulting estimates of the $\Phi(Z'_i)$'s into the weighted empirical risk.
- Score: 2.599882743586164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider statistical learning problems, when the distribution $P'$ of the
training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution
$P$ involved in the risk one seeks to minimize (referred to as the test
distribution) but is still defined on the same measurable space as $P$ and
dominates it. In the unrealistic case where the likelihood ratio
$\Phi(z)=dP/dP'(z)$ is known, one may straightforwardly extend the Empirical
Risk Minimization (ERM) approach to this specific transfer learning setup using
the same idea as that behind Importance Sampling, by minimizing a weighted
version of the empirical risk functional computed from the 'biased' training
data $Z'_i$ with weights $\Phi(Z'_i)$. Although the importance function
$\Phi(z)$ is generally unknown in practice, we show that, in various situations
frequently encountered in practice, it takes a simple form and can be directly
estimated from the $Z'_i$'s and some auxiliary information on the statistical
population $P$. By means of linearization techniques, we then prove that the
generalization capacity of the aforementioned approach is preserved when
plugging the resulting estimates of the $\Phi(Z'_i)$'s into the weighted
empirical risk. Beyond these theoretical guarantees, numerical results provide
strong empirical evidence of the relevance of the approach promoted in this
article.
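To make the weighted ERM recipe concrete, below is a minimal sketch (not the authors' code) for one of the "simple form" situations mentioned in the abstract: when the sample selection bias acts only through the class labels (stratified sampling), the importance function reduces to $\Phi(x, y) = p_{test}(y)/p_{train}(y)$, so it can be estimated from the training labels plus the known test-class proportions and plugged in as per-example weights. The helper names, the logistic loss, and the scikit-learn dependency are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(y_train, test_class_priors):
    # Phi(Z'_i) = p_test(y_i) / p_train(y_i), with p_train estimated
    # from the empirical class frequencies of the biased training sample.
    classes, counts = np.unique(y_train, return_counts=True)
    train_priors = dict(zip(classes, counts / counts.sum()))
    return np.array([test_class_priors[y] / train_priors[y] for y in y_train])

def weighted_erm(X_train, y_train, test_class_priors):
    # Minimize the weighted empirical risk (here, the logistic loss) using
    # the plug-in estimates of the importance weights Phi(Z'_i).
    weights = importance_weights(y_train, test_class_priors)
    clf = LogisticRegression()
    clf.fit(X_train, y_train, sample_weight=weights)
    return clf

# Hypothetical usage: the training sample is balanced by design (50/50),
# while the test distribution is known to contain 10% positives.
# clf = weighted_erm(X_train, y_train, {0: 0.9, 1: 0.1})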
Related papers
- Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of i.i.d. observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-Norm Linear Regression [19.31269916674961]
We show that, in the realizable case, under no moment assumptions, $O(d)$ samples are enough to exactly recover the target.
We extend this result to the case $p \in (1, 2)$ under mild assumptions that guarantee the existence of the Hessian of the risk at its minimizer.
arXiv Detail & Related papers (2023-10-19T03:21:28Z) - On Regression in Extreme Regions [1.0338669373504403]
This paper focuses on the case of extreme (i.e. very large) observations $X$.
Because of their rarity, the contributions of such observations to the (empirical) error are negligible.
We show that an empirical and nonasymptotic version of this 'extreme risk', based solely on a fraction of the largest observations, yields good generalization capacity.
arXiv Detail & Related papers (2023-03-06T12:55:38Z) - On the Provable Advantage of Unsupervised Pretraining [26.065736182939222]
Unsupervised pretraining is a critical component of modern large-scale machine learning systems.
This paper studies a generic framework, where the unsupervised representation learning task is specified by an abstract class of latent variable models.
Under a mild "informative" condition, our algorithm achieves an excess risk of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_\Phi/m} + \sqrt{\mathcal{C}_\Psi/n})$ for downstream tasks.
arXiv Detail & Related papers (2023-03-02T20:42:05Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - A Statistical Learning View of Simple Kriging [0.0]
We analyze the simple Kriging task from a statistical learning perspective.
The goal is to predict the unknown values the underlying random field takes at any other location with minimum quadratic risk.
We prove non-asymptotic bounds of order $O_{\mathbb{P}}(1/\sqrt{n})$ for the excess risk of a plug-in predictive rule mimicking the true minimizer.
arXiv Detail & Related papers (2022-02-15T12:46:43Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first optimal minimax guarantees on the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Toward Adversarial Robustness via Semi-supervised Robust Training [93.36310070269643]
Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs).
We propose a novel defense method, robust training (RT), which jointly minimizes two separate risks ($R_{stand}$ and $R_{rob}$).
arXiv Detail & Related papers (2020-03-16T02:14:08Z)