Learning Optimal Features via Partial Invariance
- URL: http://arxiv.org/abs/2301.12067v2
- Date: Mon, 3 Apr 2023 16:05:50 GMT
- Title: Learning Optimal Features via Partial Invariance
- Authors: Moulik Choraria, Ibtihal Ferwana, Ankur Mani, Lav R. Varshney
- Abstract summary: Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments.
We show that IRM can over-constrain the predictor; to remedy this, we propose a relaxation via $\textit{partial invariance}$.
Experiments conducted both in linear settings and with deep neural networks, on tasks over language and image data, verify our conclusions.
- Score: 18.552839725370383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning models that are robust to distribution shifts is a key concern in
the context of their real-life applicability. Invariant Risk Minimization (IRM)
is a popular framework that aims to learn robust models from multiple
environments. The success of IRM requires an important assumption: the
underlying causal mechanisms/features remain invariant across environments.
When not satisfied, we show that IRM can over-constrain the predictor and to
remedy this, we propose a relaxation via $\textit{partial invariance}$. In this
work, we theoretically highlight the sub-optimality of IRM and then demonstrate
how learning from a partition of training domains can help improve invariant
models. Experiments conducted both in linear settings and with deep neural
networks, on tasks over language and image data, allow us to verify our
conclusions.
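To make the invariance constraint the abstract discusses concrete, the following is a minimal NumPy sketch, not the paper's code, of an IRMv1-style penalty for squared loss: the squared gradient of each environment's risk with respect to a scalar multiplier on the predictor, evaluated at 1.0. All variable names (`irmv1_penalty`, `w_inv`, `w_spur`) and the synthetic data-generating process are illustrative assumptions.

```python
import numpy as np

def irmv1_penalty(w, X, y):
    """Illustrative IRMv1-style penalty for squared loss: the squared
    gradient of the environment risk w.r.t. a scalar multiplier on the
    predictor at 1.0, which for squared loss has the closed form
    (E[2 * (w.x - y) * (w.x)])**2."""
    preds = X @ w
    grad = np.mean(2.0 * (preds - y) * preds)
    return grad ** 2

rng = np.random.default_rng(0)

# Two synthetic environments sharing one invariant mechanism, y = x1 + noise,
# plus an anti-causal feature x2 = y + noise whose optimal regression
# coefficient varies across environments (it depends on var(y)).
envs = []
for noise_scale in (0.1, 1.0):
    x1 = rng.normal(size=2000)
    y = x1 + noise_scale * rng.normal(size=2000)
    x2 = y + rng.normal(size=2000)  # spurious, environment-dependent
    envs.append((np.stack([x1, x2], axis=1), y))

w_inv = np.array([1.0, 0.0])   # predictor using only the invariant feature
w_spur = np.array([0.0, 1.0])  # predictor leaning on the spurious feature

total_inv = sum(irmv1_penalty(w_inv, X, y) for X, y in envs)
total_spur = sum(irmv1_penalty(w_spur, X, y) for X, y in envs)
```

On this toy data the invariant predictor incurs a near-zero penalty in every environment, while the spurious predictor is penalized heavily; the paper's point is that when no fully invariant feature set exists, forcing this penalty to zero over-constrains the predictor, motivating the partial-invariance relaxation over a partition of the training domains.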
Related papers
- Invariant Risk Minimization Is A Total Variation Model [3.000494957386027]
Invariant risk minimization (IRM) is an emerging approach to generalize invariant features to different environments in machine learning.
We show that IRM is essentially a total variation model based on the $L^2$ norm (TV-$\ell_2$) of the learning risk.
We propose a novel IRM framework based on the TV-$\ell$ model.
arXiv Detail & Related papers (2024-05-02T15:34:14Z)
- MetaRM: Shifted Distributions Alignment via Meta-Learning [52.94381279744458]
Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM)
We introduce MetaRM, a method leveraging meta-learning to align the RM with the shifted environment distribution.
Extensive experiments demonstrate that MetaRM significantly improves the RM's distinguishing ability in iterative RLHF optimization.
arXiv Detail & Related papers (2024-05-01T10:43:55Z)
- Counterfactual Fairness through Transforming Data Orthogonal to Bias [7.109458605736819]
We propose a novel data pre-processing algorithm, Orthogonal to Bias (OB)
OB is designed to eliminate the influence of a group of continuous sensitive variables, thus promoting counterfactual fairness in machine learning applications.
OB is model-agnostic, making it applicable to a wide range of machine learning models and tasks.
arXiv Detail & Related papers (2024-03-26T16:40:08Z)
- Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration [0.6906005491572401]
We show that Information Bottleneck-based IRM achieves consistent calibration across different environments.
Our empirical evidence indicates that models exhibiting consistent calibration across environments are also well-calibrated.
arXiv Detail & Related papers (2024-01-31T02:08:43Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- DAIR: Data Augmented Invariant Regularization [20.364846667289374]
In this paper, we propose data augmented invariant regularization (DAIR)
We show that a particular form of the DAIR regularizer consistently performs well in a variety of settings.
We apply it to multiple real-world learning problems involving domain shift.
arXiv Detail & Related papers (2021-10-21T15:30:40Z)
- Fairness and Robustness in Invariant Learning: A Case Study in Toxicity Classification [13.456851070400024]
Invariant Risk Minimization (IRM) is a domain generalization algorithm that employs a causal discovery inspired method to find robust predictors.
We show that IRM achieves better out-of-distribution accuracy and fairness than Empirical Risk Minimization (ERM) methods.
arXiv Detail & Related papers (2020-11-12T16:42:14Z)
- The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective, as well as recently proposed alternatives, under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution; this is precisely the issue it was intended to solve.
arXiv Detail & Related papers (2020-10-12T14:54:32Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- Learning to Learn Kernels with Variational Random Features [118.09565227041844]
We introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability.
We formulate the optimization of MetaVRF as a variational inference problem.
We show that MetaVRF delivers much better, or at least competitive, performance compared to existing meta-learning alternatives.
arXiv Detail & Related papers (2020-06-11T18:05:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.