Understanding Why Generalized Reweighting Does Not Improve Over ERM
- URL: http://arxiv.org/abs/2201.12293v1
- Date: Fri, 28 Jan 2022 17:58:38 GMT
- Title: Understanding Why Generalized Reweighting Does Not Improve Over ERM
- Authors: Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar
- Abstract summary: Empirical risk minimization (ERM) is known in practice to be non-robust to distributional shift, where the training and the test distributions are different.
A suite of approaches, such as importance weighting and variants of distributionally robust optimization (DRO), has been proposed to solve this problem.
But a line of recent work has empirically shown that these approaches do not significantly improve over ERM in real applications with distribution shift.
- Score: 36.69039005731499
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Empirical risk minimization (ERM) is known in practice to be non-robust to
distributional shift, where the training and the test distributions are
different. A suite of approaches, such as importance weighting and variants of
distributionally robust optimization (DRO), has been proposed to solve this
problem. But a line of recent work has empirically shown that these approaches
do not significantly improve over ERM in real applications with distribution
shift. The goal of this work is to obtain a comprehensive theoretical
understanding of this intriguing phenomenon. We first posit the class of
Generalized Reweighting (GRW) algorithms as a broad category of approaches
that iteratively update model parameters based on iterative reweighting of the
training samples. We show that when overparameterized models are trained under
GRW, the resulting models are close to those obtained by ERM. We also show that
adding small regularization that does not greatly affect the empirical
training accuracy does not help. Together, our results show that a broad
category of approaches, which we term GRW, is unable to achieve
distributionally robust generalization. Our work thus has the following
sobering takeaway: to make progress towards distributionally robust
generalization, we either have to develop non-GRW approaches, or perhaps devise
novel classification/regression loss functions that are adapted to the class of
GRW approaches.
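The abstract characterizes GRW as algorithms that alternate gradient updates on the model with reweighting of the training samples. The sketch below is a minimal, hypothetical instance of such a loop (a multiplicative-weights update on per-sample losses, in the spirit of DRO-style methods); keeping the weights uniform recovers ERM, and freezing them recovers importance weighting. The model, loss, and hyperparameters are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch of a Generalized Reweighting (GRW) loop, assuming a
# multiplicative-weights update on per-sample losses; not the paper's exact setup.
import torch

def grw_train(model, X, y, steps=1000, lr=1e-2, eta=0.1):
    """Alternate a weighted gradient step on the model with an update of the
    per-sample weights q (uniform q throughout would reduce to plain ERM)."""
    n = X.shape[0]
    q = torch.full((n,), 1.0 / n)                      # sample weights on the simplex
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss(reduction="none")

    for _ in range(steps):
        per_sample = loss_fn(model(X).squeeze(-1), y)  # loss of each training sample
        weighted = (q * per_sample).sum()              # reweighted objective this step
        opt.zero_grad()
        weighted.backward()
        opt.step()

        with torch.no_grad():                          # up-weight high-loss samples
            q = q * torch.exp(eta * per_sample)
            q = q / q.sum()                            # renormalize onto the simplex
    return model

# Illustrative usage on synthetic data:
# grw_train(torch.nn.Linear(5, 1), torch.randn(200, 5), torch.randn(200))
```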
Related papers
- Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning [39.02112341007981]
We study the effect of distribution shift in the presence of model misspecification.
We show that empirical risk minimization, or standard least squares regression, can result in undesirable misspecification amplification.
We develop a new algorithm that avoids this undesirable behavior, resulting in no misspecification amplification while still obtaining optimal statistical rates.
arXiv Detail & Related papers (2024-01-22T18:59:12Z)
- Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z)
- Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
- The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective--as well as its recently proposed alternatives--under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution--this is precisely the issue that it was intended to solve.
arXiv Detail & Related papers (2020-10-12T14:54:32Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)