Distributionally Robust Models with Parametric Likelihood Ratios
- URL: http://arxiv.org/abs/2204.06340v1
- Date: Wed, 13 Apr 2022 12:43:12 GMT
- Title: Distributionally Robust Models with Parametric Likelihood Ratios
- Authors: Paul Michel, Tatsunori Hashimoto, Graham Neubig
- Abstract summary: Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
- Score: 123.05074253513935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As machine learning models are deployed ever more broadly, it becomes
increasingly important that they are not only able to perform well on their
training distribution, but also yield accurate predictions when confronted with
distribution shift. The Distributionally Robust Optimization (DRO) framework
proposes to address this issue by training models to minimize their expected
risk under a collection of distributions, to imitate test-time shifts. This is
most commonly achieved by instance-level re-weighting of the training objective
to emulate the likelihood ratio with possible test distributions, which allows
for estimating their empirical risk via importance sampling (assuming that they
are subpopulations of the training distribution). However, re-weighting schemes
in the literature are usually limited due to the difficulty of keeping the
optimization problem tractable and the complexity of enforcing normalization
constraints. In this paper, we show that three simple ideas -- mini-batch level
normalization, a KL penalty and simultaneous gradient updates -- allow us to
train models with DRO using a broader class of parametric likelihood ratios. In
a series of experiments on both image and text classification benchmarks, we
find that models trained with the resulting parametric adversaries are
consistently more robust to subpopulation shifts when compared to other DRO
approaches, and that the method performs reliably well with little
hyper-parameter tuning. Code to reproduce our experiments can be found at
https://github.com/pmichel31415/P-DRO.
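The three ingredients named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (their code is at the repository above); the linear adversary, the logistic-regression model, and all variable names below are illustrative assumptions. The adversary parameterizes a log-likelihood-ratio, its weights are normalized with a softmax at the mini-batch level (idea 1), a KL penalty keeps the re-weighted batch close to the empirical batch distribution (idea 2), and the model and adversary take simultaneous gradient steps (idea 3):

```python
# Hedged sketch of the abstract's three ideas on a toy problem.
# The linear adversary and all hyper-parameters are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: the label is (almost) the sign of feature 0.
X = rng.normal(size=(64, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=64) > 0).astype(float)

theta = np.zeros(5)              # model parameters
phi = np.zeros(5)                # adversary parameters (parametric log-likelihood-ratio)
lr, lr_adv, tau = 0.1, 0.1, 1.0  # step sizes and KL penalty strength

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    # Per-example model losses (binary cross-entropy).
    p = sigmoid(X @ theta)
    losses = -(y * np.log(p + 1e-12) + (1.0 - y) * np.log(1.0 - p + 1e-12))

    # Idea 1: normalize the adversary's weights at the mini-batch level
    # (a softmax over the batch), sidestepping global normalization constraints.
    s = X @ phi
    w = np.exp(s - s.max())
    w /= w.sum()
    n = len(w)

    # Idea 2: KL penalty between the re-weighted batch distribution and the
    # empirical (uniform) batch distribution.
    kl = float(np.sum(w * np.log(n * w + 1e-12)))

    # Idea 3: simultaneous gradient updates. The model descends the w-weighted
    # loss; the adversary ascends (weighted loss - tau * KL). The adversary
    # gradient below is the closed-form derivative through the batch softmax.
    grad_theta = X.T @ ((p - y) * w)
    wl = float(np.sum(w * losses))
    ds = w * (losses - wl) - tau * w * (np.log(n * w + 1e-12) - kl)
    grad_phi = X.T @ ds

    theta -= lr * grad_theta
    phi += lr_adv * grad_phi

accuracy = float(((sigmoid(X @ theta) > 0.5) == (y > 0.5)).mean())
```

With the KL penalty active, the adversary can only moderately upweight hard examples, so the model still fits the bulk of the batch while being pushed toward the worst-off subpopulation.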
Related papers
- Distributionally Robust Post-hoc Classifiers under Prior Shifts [31.237674771958165]
We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors.
We present an extremely lightweight post-hoc approach that performs scaling adjustments to predictions from a pre-trained model.
arXiv Detail & Related papers (2023-09-16T00:54:57Z)
- Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization [61.39201891894024]
Group distributionally robust optimization (group DRO) can minimize the worst-case loss over pre-defined groups.
We reformulate the group DRO framework by proposing Q-Diversity.
Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization.
arXiv Detail & Related papers (2023-05-20T07:02:27Z)
- Learning Sampling Distributions for Model Predictive Control [36.82905770866734]
Sampling-based approaches have become a cornerstone of contemporary Model Predictive Control (MPC).
We propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution.
Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time.
arXiv Detail & Related papers (2022-12-05T20:35:36Z)
- Minimax Regret Optimization for Robust Machine Learning under Distribution Shift [38.30154154957721]
We consider learning scenarios where the learned model is evaluated under an unknown test distribution.
We show that the DRO formulation does not guarantee uniformly small regret under distribution shift.
We propose an alternative method called Minimax Regret Optimization (MRO)
arXiv Detail & Related papers (2022-02-11T04:17:22Z)
- Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose? [0.2836066255205732]
We contribute to micro-data model-based reinforcement learning (MBRL) by rigorously comparing popular generative models.
We find that on an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin.
We also find that deterministic models are on par with, and in fact consistently (though not significantly) outperform, their probabilistic counterparts.
arXiv Detail & Related papers (2021-07-24T11:38:25Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when the only bias lies in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other, unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
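Several of the related papers above build on group DRO, which minimizes the worst-case loss over pre-defined groups. A minimal sketch of that baseline (not any of the papers' code; the two-group regression setup and all names are illustrative) alternates exponentiated-gradient ascent on group weights with descent on the weighted loss:

```python
# Hedged sketch of plain group DRO: upweight the worst group, descend the
# weighted loss. The data, groups, and hyper-parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)

# Two groups whose true regression weights disagree, so one linear model
# must trade them off; ERM would favor the larger group 0.
X0 = rng.normal(size=(120, 3)); y0 = X0 @ np.array([1.0, 0.0, 0.0])
X1 = rng.normal(size=(30, 3));  y1 = X1 @ np.array([0.0, 1.0, 0.0])
groups = [(X0, y0), (X1, y1)]

theta = np.zeros(3)
q = np.ones(2) / 2      # adversarial weights over groups
lr, eta_q = 0.05, 0.5   # model step size, group-weight step size

def group_losses(theta):
    return np.array([np.mean((X @ theta - y) ** 2) for X, y in groups])

worst_initial = group_losses(theta).max()

for step in range(300):
    losses = group_losses(theta)
    # Exponentiated-gradient ascent: upweight the currently worst group.
    q = q * np.exp(eta_q * losses)
    q = q / q.sum()
    # Descend the q-weighted loss.
    grad = sum(qg * 2.0 / len(y) * X.T @ (X @ theta - y)
               for qg, (X, y) in zip(q, groups))
    theta = theta - lr * grad

worst_final = group_losses(theta).max()
```

The parametric-adversary approach of the main paper can be read as relaxing this scheme: instead of weights over a fixed set of annotated groups, the adversary is a parametric model over examples.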
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.