A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization
- URL: http://arxiv.org/abs/2111.02355v4
- Date: Tue, 17 Oct 2023 09:42:05 GMT
- Title: A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization
- Authors: Renzhe Xu, Xingxuan Zhang, Zheyan Shen, Tong Zhang, Peng Cui
- Abstract summary: Independence-driven importance weighting algorithms in the stable learning literature have shown empirical effectiveness.
In this paper, we theoretically prove the effectiveness of such algorithms by explaining them as feature selection processes.
We prove that, under ideal conditions, independence-driven importance weighting algorithms can identify the variables in the minimal stable variable set.
- Score: 44.88645911638269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Covariate-shift generalization, a typical case in out-of-distribution (OOD)
generalization, requires a good performance on the unknown test distribution,
which varies from the accessible training distribution in the form of covariate
shift. Recently, independence-driven importance weighting algorithms in stable
learning literature have shown empirical effectiveness to deal with
covariate-shift generalization on several learning models, including regression
algorithms and deep neural networks, while their theoretical analyses are
missing. In this paper, we theoretically prove the effectiveness of such
algorithms by explaining them as feature selection processes. We first specify
a set of variables, named minimal stable variable set, that is the minimal and
optimal set of variables to deal with covariate-shift generalization for common
loss functions, such as the mean squared loss and binary cross-entropy loss.
Afterward, we prove that under ideal conditions, independence-driven importance
weighting algorithms could identify the variables in this set. Analysis of
asymptotic properties is also provided. These theories are further validated in
several synthetic experiments.
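The abstract describes these algorithms only at a high level. For intuition, the following is a minimal, self-contained sketch of one independence-driven importance weighting pipeline in the spirit of the stable learning literature: sample weights are estimated so that the covariates look mutually independent under the reweighted distribution (here via a classifier between the observed covariates and column-wise permuted copies), and a weighted regression is then fit with those weights. The toy data-generating process, the permutation-and-classifier weighting scheme, and all hyperparameters are illustrative assumptions, not the exact procedure analyzed in the paper.
```python
# A minimal sketch of an independence-driven importance weighting pipeline
# in the spirit of the stable learning literature -- NOT the paper's exact
# procedure. The data-generating process, the permutation-and-classifier
# weighting scheme, and all hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Toy data: S is the stable variable; V depends on S but carries no extra
# information about Y once S is known (so the minimal stable set is {S}).
n = 5000
s = rng.normal(size=n)
v = s ** 2 + 0.5 * rng.normal(size=n)
y = s + s ** 2 + 0.3 * rng.normal(size=n)
X = np.column_stack([s, v])

# Step 1: column-wise permutation keeps each marginal of X but breaks the
# dependence between columns, i.e. it samples from the product of marginals.
X_perm = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])

# Step 2: density-ratio trick. A probabilistic classifier between the joint
# (label 0) and the product of marginals (label 1) yields importance weights
# w(x) ~ prod_j p(x_j) / p(x); under the reweighted distribution the
# covariates are (approximately) mutually independent.
clf = make_pipeline(PolynomialFeatures(degree=4), LogisticRegression(max_iter=2000))
clf.fit(np.vstack([X, X_perm]), np.r_[np.zeros(n), np.ones(n)])
p = clf.predict_proba(X)[:, 1]
w = p / np.clip(1.0 - p, 1e-6, None)
w = np.clip(w, 0.0, 50.0)          # truncate extreme weights for stability
w *= n / w.sum()                   # normalize to mean 1

# Step 3: weighted least squares. Plain OLS loads on the spurious variable V
# (it is correlated with the nonlinear part of E[Y | S] in training), whereas
# under ideal independence-inducing weights its coefficient would be zero;
# with estimated weights it typically shrinks toward zero.
print("OLS coefficients      :", LinearRegression().fit(X, y).coef_)
print("weighted coefficients :", LinearRegression().fit(X, y, sample_weight=w).coef_)
```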
Related papers
- High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization [83.06112052443233]
This paper studies kernel ridge regression in high dimensions under covariate shifts.
By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows one to decrease the variance.
For the bias, we analyze regularization at arbitrary or well-chosen scales, showing that the bias can behave very differently under different regularization scales.
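As a concrete, low-dimensional illustration of the re-weighting strategy discussed in this entry, the sketch below fits kernel ridge regression with and without importance weights under a known covariate shift. The one-dimensional setup, the analytically known density ratio, and the kernel hyperparameters are illustrative assumptions; the paper's high-dimensional analysis is not reproduced here.
```python
# A toy, one-dimensional illustration of importance re-weighting for kernel
# ridge regression under covariate shift -- the setup, kernel, and
# regularization below are illustrative assumptions, not the paper's
# high-dimensional regime.
import numpy as np
from scipy.stats import norm
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)

def f_true(x):
    return np.sin(2.0 * x)

# Covariate shift: train and test inputs follow different distributions,
# while the regression function stays the same.
x_tr = rng.normal(0.0, 1.0, size=300)
x_te = rng.normal(1.5, 0.5, size=1000)
y_tr = f_true(x_tr) + 0.1 * rng.normal(size=x_tr.size)

# Importance weights = test density / train density (known here by
# construction; in practice they would have to be estimated).
w = norm.pdf(x_tr, loc=1.5, scale=0.5) / norm.pdf(x_tr, loc=0.0, scale=1.0)
w *= w.size / w.sum()

models = {
    "unweighted": KernelRidge(kernel="rbf", gamma=1.0, alpha=0.1).fit(x_tr[:, None], y_tr),
    "weighted": KernelRidge(kernel="rbf", gamma=1.0, alpha=0.1).fit(x_tr[:, None], y_tr, sample_weight=w),
}

# Which estimator wins depends on the regularization scale, echoing the
# bias-variance discussion in this entry.
for name, model in models.items():
    mse = np.mean((model.predict(x_te[:, None]) - f_true(x_te)) ** 2)
    print(f"{name:>10s} test MSE: {mse:.4f}")
```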
arXiv Detail & Related papers (2024-06-05T12:03:27Z)
- Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift [18.240776405802205]
We propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space.
Our theoretical results are established for a general loss belonging to a rich loss function family.
Our results concur with the optimal results in the literature when the squared loss is used.
arXiv Detail & Related papers (2023-10-12T11:33:15Z)
- Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
- Covariate Shift in High-Dimensional Random Feature Regression [44.13449065077103]
Covariate shift is a significant obstacle in the development of robust machine learning models.
We present a theoretical understanding of covariate shift in the context of modern machine learning.
arXiv Detail & Related papers (2021-11-16T05:23:28Z)
- Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators [100.58924375509659]
The straight-through (ST) estimator has gained popularity due to its simplicity and efficiency.
Several techniques were proposed to improve over ST while keeping the same low computational complexity.
We conduct a theoretical analysis of the bias and variance of these methods in order to understand the tradeoffs and verify the originally claimed properties.
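For readers unfamiliar with the ST estimator, the following toy computation sketches one common ST variant (back-propagating through the Bernoulli sample as if it were the identity, scaled by the sigmoid derivative) and contrasts its mean and spread with the exact gradient. The objective and constants are illustrative assumptions, and the estimators analyzed in the paper may differ.
```python
# A small numerical sketch of one common straight-through (ST) variant for a
# single stochastic binary unit; the objective and constants are illustrative
# assumptions, and the estimators analyzed in the paper may differ.
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Objective on a binary unit b ~ Bernoulli(sigmoid(theta)):
#   L(theta) = E[f(b)]  with  f(b) = (b - target) ** 2.
theta, target = 0.4, 0.3
p = sigmoid(theta)

# Exact gradient: L(theta) = p * f(1) + (1 - p) * f(0), so
#   dL/dtheta = p * (1 - p) * (f(1) - f(0)).
exact_grad = p * (1.0 - p) * ((1.0 - target) ** 2 - (0.0 - target) ** 2)

# Single-sample ST estimator: sample b in the forward pass and back-propagate
# through the sampling as if it were the identity, scaled by the sigmoid
# derivative:  g_ST = f'(b) * p * (1 - p) = 2 * (b - target) * p * (1 - p).
n_samples = 200_000
b = (rng.random(n_samples) < p).astype(float)
g_st = 2.0 * (b - target) * p * (1.0 - p)

print(f"exact gradient     : {exact_grad:.4f}")
print(f"ST estimate, mean  : {g_st.mean():.4f}   (gap to exact = bias)")
print(f"ST estimate, std   : {g_st.std():.4f}   (single-sample variance)")
```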
arXiv Detail & Related papers (2021-10-07T15:16:07Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Statistical optimality and stability of tangent transform algorithms in logit models [6.9827388859232045]
We provide conditions on the data-generating process to derive non-asymptotic upper bounds on the risk incurred by the variational optima.
In particular, we establish local stability of the algorithm without any assumptions on the data-generating process.
We explore a special case involving a semi-orthogonal design under which global convergence is obtained.
arXiv Detail & Related papers (2020-10-25T05:15:13Z)
- A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
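For context, the sketch below shows the classical two-step pipeline that such one-step approaches streamline: first estimate importance weights with a domain classifier between training and (unlabeled) test inputs, then perform weighted empirical risk minimization. This is the standard baseline, not the proposed one-step method, and the toy data and models are illustrative assumptions.
```python
# For context only: the classical two-step pipeline (density-ratio weights,
# then weighted empirical risk minimization) that one-step approaches
# streamline. This is the standard baseline, NOT the proposed one-step
# method; the toy data and models are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

def labels(X):
    # Same labeling rule P(Y | X) in both domains (covariate shift only).
    return (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=len(X)) > 0).astype(int)

X_tr = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(1000, 2))
X_te = rng.normal(loc=[1.0, -1.0], scale=0.7, size=(1000, 2))
y_tr, y_te = labels(X_tr), labels(X_te)

# Step 1: estimate importance weights w(x) ~ p_test(x) / p_train(x) with a
# domain classifier between training inputs and (unlabeled) test inputs.
dom = LogisticRegression(max_iter=1000)
dom.fit(np.vstack([X_tr, X_te]), np.r_[np.zeros(len(X_tr)), np.ones(len(X_te))])
p_te = dom.predict_proba(X_tr)[:, 1]
w = p_te / np.clip(1.0 - p_te, 1e-6, None)
w *= len(w) / w.sum()

# Step 2: weighted empirical risk minimization on the training data, then
# compare both fits on the shifted test inputs.
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000).fit(X_tr, y_tr, sample_weight=w)
print("plain    test accuracy:", (plain.predict(X_te) == y_te).mean())
print("weighted test accuracy:", (weighted.predict(X_te) == y_te).mean())
```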
arXiv Detail & Related papers (2020-07-08T11:35:47Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the reasons for its success are still unclear.
We show that multiplicative noise commonly arises in the parameter dynamics due to the variance of the gradient estimates.
A detailed analysis is conducted of the key factors, including the step size and the data, and similar behavior is observed on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)