Frustratingly Easy Model Generalization by Dummy Risk Minimization
- URL: http://arxiv.org/abs/2308.02287v2
- Date: Sat, 7 Oct 2023 05:53:30 GMT
- Title: Frustratingly Easy Model Generalization by Dummy Risk Minimization
- Authors: Juncheng Wang, Jindong Wang, Xixu Hu, Shujun Wang, Xing Xie
- Abstract summary: Dummy Risk Minimization (DuRM) is a frustratingly easy and general technique to improve the generalization of empirical risk minimization (ERM).
We show that DuRM consistently improves performance across all evaluated tasks in an almost free-lunch manner.
- Score: 38.67678021055096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Empirical risk minimization (ERM) is a fundamental machine learning paradigm.
However, its generalization ability is limited in various tasks. In this paper,
we devise Dummy Risk Minimization (DuRM), a frustratingly easy and general
technique to improve the generalization of ERM. DuRM is extremely simple to
implement: simply enlarge the dimension of the output logits and then
optimize with standard gradient descent. Moreover, we validate the efficacy
of DuRM through both theoretical and empirical analysis. Theoretically, we show that
DuRM induces greater gradient variance, which facilitates model
generalization by steering optimization toward flatter local minima. Empirically, we conduct
evaluations of DuRM across different datasets, modalities, and network
architectures on diverse tasks, including conventional classification, semantic
segmentation, out-of-distribution generalization, adversarial training, and
long-tailed recognition. Results demonstrate that DuRM consistently
improves performance on all tasks in an almost free-lunch manner.
Furthermore, we show that DuRM is compatible with existing generalization
techniques and we discuss possible limitations. We hope that DuRM could trigger
new interest in the fundamental research on risk minimization.
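The recipe in the abstract is concrete enough to sketch in code. Below is a minimal, hypothetical PyTorch illustration of the idea as described above; the class name DuRMHead, the num_dummy argument, and all shapes are our own assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DuRMHead(nn.Module):
    """Hypothetical sketch of Dummy Risk Minimization (DuRM): enlarge the
    output logits with extra "dummy" classes and train with the usual loss;
    the dummy logits are discarded at prediction time."""

    def __init__(self, in_features: int, num_classes: int, num_dummy: int = 1):
        super().__init__()
        # A single linear head that produces real + dummy logits.
        self.fc = nn.Linear(in_features, num_classes + num_dummy)
        self.num_classes = num_classes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Shape: (batch, num_classes + num_dummy).
        return self.fc(x)

    def predict(self, x: torch.Tensor) -> torch.Tensor:
        # Ignore the dummy logits when predicting.
        return self.fc(x)[:, : self.num_classes].argmax(dim=1)

# Training is unchanged: labels never point at a dummy class, so standard
# cross-entropy over the enlarged logits is all that is needed.
head = DuRMHead(in_features=512, num_classes=10, num_dummy=1)
features = torch.randn(32, 512)          # stand-in for backbone features
labels = torch.randint(0, 10, (32,))     # real-class labels only
loss = F.cross_entropy(head(features), labels)
loss.backward()
```

Since the dummy classes never receive positive labels, the extra logits mainly perturb the softmax and its gradients, which matches the abstract's claim that DuRM works by increasing gradient variance.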
Related papers
- Functional Risk Minimization [89.85247272720467]
We propose Functional Risk Minimization, a framework where losses compare functions rather than outputs.
This results in better performance in supervised, unsupervised, and RL experiments.
arXiv Detail & Related papers (2024-12-30T18:29:48Z) - Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models [60.38983114420845]
We propose dual risk minimization (DRM) to better preserve the core features of downstream tasks.
DRM balances expected performance and worst-case performance, establishing a new state of the art on various real-world benchmarks.
arXiv Detail & Related papers (2024-11-29T15:01:25Z) - Invariant Risk Minimization Is A Total Variation Model [3.000494957386027]
Invariant risk minimization (IRM) is an emerging approach for learning invariant features that generalize across different environments in machine learning.
We show that IRM is essentially a total variation based on the $L^2$ norm (TV-$\ell_2$) of the learning risk.
We propose a novel IRM framework based on the TV-$\ell_1$ model.
arXiv Detail & Related papers (2024-05-02T15:34:14Z) - On the Variance, Admissibility, and Stability of Empirical Risk Minimization [80.26309576810844]
Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates.
We show that under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance.
We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes.
arXiv Detail & Related papers (2023-05-29T15:25:48Z) - ERM++: An Improved Baseline for Domain Generalization [69.80606575323691]
When properly tuned, Empirical Risk Minimization (ERM) can outperform most more-complex Domain Generalization (DG) methods.
ERM++ improves DG performance by over 5% compared to prior ERM baselines.
arXiv Detail & Related papers (2023-04-04T17:31:15Z) - What Is Missing in IRM Training and Evaluation? Challenges and Solutions [41.56612265456626]
Invariant risk minimization (IRM) has received increasing attention as a way to acquire environment-agnostic data representations and predictions.
Recent works have found that the optimality of the originally proposed IRM optimization scheme (IRMv1; see the sketch after this list) may be compromised in practice.
We identify and resolve three practical limitations in IRM training and evaluation.
arXiv Detail & Related papers (2023-03-04T07:06:24Z) - Pareto Invariant Risk Minimization [32.01775861630696]
We propose a new optimization scheme for invariant risk minimization (IRM) called PAreto Invariant Risk Minimization (PAIR).
We show that PAIR can empower practical IRM variants to overcome the barriers of the original IRM when provided with proper guidance.
arXiv Detail & Related papers (2022-06-15T19:04:02Z) - The Missing Invariance Principle Found -- the Reciprocal Twin of Invariant Risk Minimization [7.6146285961466]
Invariant Risk Minimization (IRM) can fail to generalize to out-of-distribution (OOD) data.
We show that MRI-v1 can guarantee invariant predictors given sufficient environments.
We also demonstrate that MRI strongly outperforms IRM and achieves near-optimal OOD generalization in image-based problems.
arXiv Detail & Related papers (2022-05-29T00:14:51Z) - DAIR: Data Augmented Invariant Regularization [20.364846667289374]
In this paper, we propose data augmented invariant regularization (DAIR).
We show that a particular form of the DAIR regularizer consistently performs well in a variety of settings.
We apply it to multiple real-world learning problems involving domain shift.
arXiv Detail & Related papers (2021-10-21T15:30:40Z) - On the Minimal Error of Empirical Risk Minimization [90.09093901700754]
We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression.
Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data.
arXiv Detail & Related papers (2021-02-24T04:47:55Z)
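As background for the IRM entries above, here is the penalized single-level objective widely known as IRMv1, from Arjovsky et al.'s original IRM paper; the notation is standard but reproduced here from memory, so treat it as a sketch rather than an exact quotation:

$$
\min_{\Phi} \sum_{e \in \mathcal{E}_{\mathrm{tr}}} \Big[ R^{e}(\Phi) + \lambda \, \big\| \nabla_{w \mid w = 1.0} \, R^{e}(w \cdot \Phi) \big\|^{2} \Big]
$$

Here $R^{e}$ is the risk in training environment $e$, $\Phi$ is the learned representation (used directly as the predictor), $w = 1.0$ is a fixed scalar "dummy" classifier, and $\lambda$ trades off predictive power against the invariance penalty. Several of the entries above study how optimizing this relaxation can fall short of the ideal bilevel IRM objective.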