Frustratingly Easy Model Generalization by Dummy Risk Minimization
- URL: http://arxiv.org/abs/2308.02287v2
- Date: Sat, 7 Oct 2023 05:53:30 GMT
- Title: Frustratingly Easy Model Generalization by Dummy Risk Minimization
- Authors: Juncheng Wang, Jindong Wang, Xixu Hu, Shujun Wang, Xing Xie
- Abstract summary: Dummy Risk Minimization (DuRM) is a frustratingly easy and general technique to improve the generalization of empirical risk minimization (ERM).
We show that DuRM consistently improves performance across all evaluated tasks in an almost free-lunch manner.
- Score: 38.67678021055096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Empirical risk minimization (ERM) is a fundamental machine learning paradigm.
However, its generalization ability is limited in various tasks. In this paper,
we devise Dummy Risk Minimization (DuRM), a frustratingly easy and general
technique to improve the generalization of ERM. DuRM is extremely simple to
implement: simply enlarge the dimension of the output logits and then
optimize with standard gradient descent. Moreover, we validate the efficacy
of DuRM through both theoretical and empirical analysis. Theoretically, we show that
DuRM induces greater gradient variance, which facilitates model
generalization by steering optimization toward flatter local minima. Empirically, we conduct
evaluations of DuRM across different datasets, modalities, and network
architectures on diverse tasks, including conventional classification, semantic
segmentation, out-of-distribution generalization, adversarial training, and
long-tailed recognition. Results demonstrate that DuRM consistently
improves performance on all tasks in an almost free-lunch manner.
Furthermore, we show that DuRM is compatible with existing generalization
techniques and we discuss possible limitations. We hope that DuRM could trigger
new interest in the fundamental research on risk minimization.
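The recipe in the abstract is concrete enough to sketch in code. Below is a minimal, hypothetical PyTorch illustration of the idea as described above; the class name DuRMHead, the num_dummy argument, and all shapes are our own assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DuRMHead(nn.Module):
    """Hypothetical sketch of Dummy Risk Minimization (DuRM): enlarge the
    output logits with extra "dummy" classes and train with the usual loss;
    the dummy logits are discarded at prediction time."""

    def __init__(self, in_features: int, num_classes: int, num_dummy: int = 1):
        super().__init__()
        # A single linear head that produces real + dummy logits.
        self.fc = nn.Linear(in_features, num_classes + num_dummy)
        self.num_classes = num_classes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Shape: (batch, num_classes + num_dummy).
        return self.fc(x)

    def predict(self, x: torch.Tensor) -> torch.Tensor:
        # Ignore the dummy logits when predicting.
        return self.fc(x)[:, : self.num_classes].argmax(dim=1)

# Training is unchanged: labels never point at a dummy class, so standard
# cross-entropy over the enlarged logits is all that is needed.
head = DuRMHead(in_features=512, num_classes=10, num_dummy=1)
features = torch.randn(32, 512)          # stand-in for backbone features
labels = torch.randint(0, 10, (32,))     # real-class labels only
loss = F.cross_entropy(head(features), labels)
loss.backward()
```

Since the dummy classes never receive positive labels, the extra logits mainly perturb the softmax and its gradients, which matches the abstract's claim that DuRM works by increasing gradient variance.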
Related papers
- Functional Risk Minimization [89.85247272720467]
We propose Functional Risk Minimization, a framework where losses compare functions rather than outputs.
This results in better performance in supervised, unsupervised, and RL experiments.
arXiv Detail & Related papers (2024-12-30T18:29:48Z) - Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models [60.38983114420845]
We propose dual risk minimization (DRM) to better preserve the core features of downstream tasks.
DRM balances expected performance and worst-case performance, establishing a new state of the art on various real-world benchmarks.
arXiv Detail & Related papers (2024-11-29T15:01:25Z) - Invariant Risk Minimization Is A Total Variation Model [3.000494957386027]
Invariant risk minimization (IRM) is an emerging approach for learning invariant features that generalize across different environments in machine learning.
We show that IRM is essentially a total variation based on the $L^2$ norm (TV-$\ell_2$) of the learning risk.
We propose a novel IRM framework based on the TV-$\ell_1$ model.
arXiv Detail & Related papers (2024-05-02T15:34:14Z) - On the Variance, Admissibility, and Stability of Empirical Risk Minimization [80.26309576810844]
Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates.
We show that under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance.
We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes.
arXiv Detail & Related papers (2023-05-29T15:25:48Z) - ERM++: An Improved Baseline for Domain Generalization [69.80606575323691]
When properly tuned, Empirical Risk Minimization (ERM) can outperform most more-complex Domain Generalization (DG) methods.
ERM++ improves DG performance by over 5% compared to prior ERM baselines.
arXiv Detail & Related papers (2023-04-04T17:31:15Z) - What Is Missing in IRM Training and Evaluation? Challenges and Solutions [41.56612265456626]
Invariant risk minimization (IRM) has received increasing attention as a way to acquire environment-agnostic data representations and predictions.
Recent works have found that the optimality of the originally proposed IRM optimization scheme (IRMv1; see the sketch after this list) may be compromised in practice.
We identify and resolve three practical limitations in IRM training and evaluation.
arXiv Detail & Related papers (2023-03-04T07:06:24Z) - Pareto Invariant Risk Minimization [32.01775861630696]
We propose a new optimization scheme for invariant risk minimization (IRM) called PAreto Invariant Risk Minimization (PAIR).
We show that PAIR can empower practical IRM variants to overcome the barriers of the original IRM when provided with proper guidance.
arXiv Detail & Related papers (2022-06-15T19:04:02Z) - The Missing Invariance Principle Found -- the Reciprocal Twin of Invariant Risk Minimization [7.6146285961466]
Invariant Risk Minimization (IRM) can fail to generalize to out-of-distribution (OOD) data.
We show that MRI-v1 can guarantee invariant predictors given sufficient environments.
We also demonstrate that MRI strongly outperforms IRM and achieves near-optimal OOD generalization in image-based problems.
arXiv Detail & Related papers (2022-05-29T00:14:51Z) - DAIR: Data Augmented Invariant Regularization [20.364846667289374]
In this paper, we propose data augmented invariant regularization (DAIR).
We show that a particular form of the DAIR regularizer consistently performs well in a variety of settings.
We apply it to multiple real-world learning problems involving domain shift.
arXiv Detail & Related papers (2021-10-21T15:30:40Z) - On the Minimal Error of Empirical Risk Minimization [90.09093901700754]
We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression.
Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data.
arXiv Detail & Related papers (2021-02-24T04:47:55Z)
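As background for the IRM entries above, here is the penalized single-level objective widely known as IRMv1, from Arjovsky et al.'s original IRM paper; the notation is standard but reproduced here from memory, so treat it as a sketch rather than an exact quotation:

$$
\min_{\Phi} \sum_{e \in \mathcal{E}_{\mathrm{tr}}} \Big[ R^{e}(\Phi) + \lambda \, \big\| \nabla_{w \mid w = 1.0} \, R^{e}(w \cdot \Phi) \big\|^{2} \Big]
$$

Here $R^{e}$ is the risk in training environment $e$, $\Phi$ is the learned representation (used directly as the predictor), $w = 1.0$ is a fixed scalar "dummy" classifier, and $\lambda$ trades off predictive power against the invariance penalty. Several of the entries above study how optimizing this relaxation can fall short of the ideal bilevel IRM objective.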