Making Robust Generalizers Less Rigid with Loss Concentration
- URL: http://arxiv.org/abs/2408.03619v2
- Date: Tue, 20 May 2025 08:22:56 GMT
- Title: Making Robust Generalizers Less Rigid with Loss Concentration
- Authors: Matthew J. Holland, Toma Hamada
- Abstract summary: Methods applying sharpness-aware minimization have proven successful for image classification tasks. We show how such a strategy can dramatically break down under simpler models where the difficulty gap becomes more extreme. We propose and evaluate a training criterion which penalizes poor loss concentration.
- Score: 6.527016551650139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the traditional formulation of machine learning tasks is in terms of performance on average, in practice we are often interested in how well a trained model performs on rare or difficult data points at test time. To achieve more robust and balanced generalization, methods applying sharpness-aware minimization to a subset of worst-case examples have proven successful for image classification tasks, but only using overparameterized neural networks, under which the relative difference between "easy" and "hard" data points becomes negligible. In this work, we show how such a strategy can dramatically break down under simpler models where the difficulty gap becomes more extreme. As a more flexible alternative to typical sharpness, we propose and evaluate a training criterion which penalizes poor loss concentration and can easily be combined with loss transformations such as exponential tilting, conditional value-at-risk (CVaR), or distributionally robust optimization (DRO) that control tail emphasis.
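The loss-concentration criterion itself is not spelled out in the abstract, but the tail-emphasis transformations it names are standard. Below is a minimal PyTorch sketch of two of them, exponential tilting and CVaR, applied to a vector of per-example losses; the function names and toy data are illustrative, not from the paper.

```python
import math
import torch

def tilted_loss(losses: torch.Tensor, t: float = 1.0) -> torch.Tensor:
    """Exponentially tilted aggregation: (1/t) * log(mean(exp(t * loss))).
    Larger t shifts emphasis toward the upper tail of the loss distribution."""
    n = losses.numel()
    return (torch.logsumexp(t * losses, dim=0) - math.log(n)) / t

def cvar_loss(losses: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    """CVaR aggregation: average over the worst alpha-fraction of losses."""
    k = max(1, math.ceil(alpha * losses.numel()))
    worst, _ = torch.topk(losses, k)
    return worst.mean()

# Toy usage: aggregate per-example losses with tail emphasis instead of a mean.
per_example = torch.rand(128)  # stand-in for per-example training losses
print(tilted_loss(per_example, t=5.0), cvar_loss(per_example, alpha=0.05))
```

As t approaches 0 the tilted loss recovers the plain mean and as t grows it approaches the max; alpha plays the analogous role for CVaR.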
Related papers
- Bias Amplification Enhances Minority Group Performance [10.380812738348899]
We propose BAM, a novel two-stage training algorithm.
In the first stage, the model is trained using a bias amplification scheme by introducing a learnable auxiliary variable for each training sample.
In the second stage, we upweight the samples that the bias-amplified model misclassifies, and then continue training the same model on the reweighted dataset.
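A minimal sketch of the second-stage reweighting described above, assuming stage-one predictions are already available; the bias-amplification stage with learnable auxiliary variables is omitted, and the upweighting factor is a hypothetical hyperparameter rather than BAM's actual scheme.

```python
import torch
import torch.nn.functional as F

def stage2_loss(model, x, y, stage1_preds, up_factor: float = 10.0):
    """BAM-style second stage (sketch): upweight examples that the
    bias-amplified stage-1 model misclassified, then continue training
    on the weighted loss. `up_factor` is a hypothetical hyperparameter."""
    weights = torch.ones_like(y, dtype=torch.float32)
    weights[stage1_preds != y] = up_factor  # misclassified -> heavier
    per_example = F.cross_entropy(model(x), y, reduction="none")
    return (weights * per_example).sum() / weights.sum()
```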
arXiv Detail & Related papers (2023-09-13T04:40:08Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Simple and Fast Group Robustness by Automatic Feature Reweighting [45.9024045614187]
We propose Automatic Feature Reweighting (AFR) to reduce the reliance on spurious features.
AFR retrains the last layer of a standard ERM-trained base model with a weighted loss.
We improve upon the best reported results among competing methods trained without spurious attributes on several vision and natural language classification benchmarks.
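A sketch of last-layer retraining with a weighted loss in the spirit of AFR; the exponential weighting rule and the `gamma` hyperparameter are assumptions, since the summary does not give the exact weights.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def afr_style_weights(base_logits, y, gamma: float = 4.0):
    """Weight each example by how poorly the frozen base model predicts it
    (assumed form exp(-gamma * p_correct): higher weight for hard points)."""
    p_correct = base_logits.softmax(dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
    w = torch.exp(-gamma * p_correct)
    return w / w.sum()

def last_layer_loss(head, features, y, w):
    """Retrain only a fresh linear head on frozen features with weights w."""
    per_example = F.cross_entropy(head(features), y, reduction="none")
    return (w * per_example).sum()
```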
arXiv Detail & Related papers (2023-06-19T17:19:13Z)
- Robust variance-regularized risk minimization with concomitant scaling [8.666172545138275]
Under potentially heavy-tailed losses, we consider the task of minimizing the sum of the loss mean and standard deviation, without trying to accurately estimate the variance.
By modifying a technique for variance-free robust mean estimation, we derive a simple learning procedure which can be easily combined with standard gradient-based solvers to be used in traditional machine learning.
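For reference, the naive batch version of the mean-plus-standard-deviation criterion looks as follows; the paper's contribution is precisely to avoid computing the variance estimate directly, which this sketch does not reproduce.

```python
import torch

def mean_std_objective(losses: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Naive batch objective: mean(loss) + lam * std(loss). Illustrates the
    target criterion only, not the paper's variance-free robust procedure."""
    return losses.mean() + lam * losses.std()
```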
arXiv Detail & Related papers (2023-01-27T08:13:00Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
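One way to picture a parametric likelihood-ratio adversary, offered as a hedged sketch: an adversary network scores each example, the scores are normalized into batch weights, and learner and adversary take opposing gradient steps on the reweighted loss. The softmax parameterization and temperature here are assumptions, not necessarily the paper's three ideas.

```python
import torch

def dro_batch_loss(per_example_losses, ratio_logits, temperature: float = 1.0):
    """Parametric-adversary DRO (sketch): `ratio_logits` is the assumed output
    of an adversary network scoring each example; normalizing the scores gives
    a likelihood ratio over the batch. The learner descends on this loss while
    the adversary ascends (e.g., by negating it in a second optimizer)."""
    ratios = torch.softmax(ratio_logits / temperature, dim=0)  # sums to 1
    return (ratios * per_example_losses).sum()
```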
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Task-Agnostic Robust Representation Learning [31.818269301504564]
We study the problem of robust representation learning with unlabeled data in a task-agnostic manner.
We derive an upper bound on the adversarial loss of a prediction model on any downstream task, using its loss on the clean data and a robustness regularizer.
Our method achieves preferable adversarial performance compared to relevant baselines.
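A rough sketch of a robustness regularizer in this spirit: measure how far the representation moves under a one-step adversarial perturbation of the input, with no labels involved. The FGSM-style single step and the squared-distance penalty are assumptions, not the paper's exact regularizer.

```python
import torch

def robustness_regularizer(encoder, x, eps: float = 8 / 255, step: float = 2 / 255):
    """Penalize representation shift under a one-step adversarial input
    perturbation (sketch; assumes encoder output of shape (batch, dim))."""
    x_adv = x.clone().detach().requires_grad_(True)
    clean = encoder(x).detach()
    dist = (encoder(x_adv) - clean).pow(2).sum(dim=1).mean()
    grad, = torch.autograd.grad(dist, x_adv)
    x_adv = (x_adv + step * grad.sign()).clamp(x - eps, x + eps).detach()
    return (encoder(x_adv) - clean).pow(2).sum(dim=1).mean()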
arXiv Detail & Related papers (2022-03-15T02:05:11Z)
- Generalized Real-World Super-Resolution through Adversarial Robustness [107.02188934602802]
We present Robust Super-Resolution, a method that leverages the generalization capability of adversarial attacks to tackle real-world SR.
Our framework represents a paradigm shift in the development of real-world SR methods.
By using a single robust model, we outperform state-of-the-art specialized methods on real-world benchmarks.
arXiv Detail & Related papers (2021-08-25T22:43:20Z)
- Unsupervised Learning of Debiased Representations with Pseudo-Attributes [85.5691102676175]
We propose a simple but effective debiasing technique in an unsupervised manner.
We perform clustering in the feature embedding space and identify pseudo-attributes by taking advantage of the clustering results.
We then employ a novel cluster-based reweighting scheme for learning debiased representation.
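A small sketch of the cluster-then-reweight idea using scikit-learn's KMeans; the inverse-cluster-size weighting is a common choice for this family of methods but is an assumption here, not necessarily the paper's exact scheme.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_attribute_weights(features: np.ndarray, n_clusters: int = 8) -> np.ndarray:
    """Cluster the embedding space into pseudo-attributes, then weight each
    sample inversely to its cluster's size so that small (likely
    bias-conflicting) clusters are emphasized during retraining."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    counts = np.bincount(labels, minlength=n_clusters)
    w = 1.0 / counts[labels]
    return w / w.sum()
```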
arXiv Detail & Related papers (2021-08-06T05:20:46Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation [53.8171136907856]
We introduce a set of simple yet effective data augmentation strategies dubbed cutoff.
cutoff relies on sampling consistency and thus adds little computational overhead.
cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.
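A hedged sketch of one cutoff variant, span cutoff, which erases a contiguous stretch of token embeddings; the paper also has token- and dimension-level variants, and a consistency term (e.g., a divergence between predictions on different cutoff views) is applied on top of an operation like this.

```python
import torch

def span_cutoff(embeddings: torch.Tensor, frac: float = 0.1) -> torch.Tensor:
    """Zero out a random contiguous span of each example's token embeddings.
    Expects shape (batch, seq_len, dim); `frac` is the erased fraction."""
    out = embeddings.clone()
    seq_len = out.size(1)
    span = max(1, int(frac * seq_len))
    for i in range(out.size(0)):
        start = torch.randint(0, seq_len - span + 1, (1,)).item()
        out[i, start:start + span] = 0.0
    return out
```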
arXiv Detail & Related papers (2020-09-29T07:08:35Z)
- Sample-based Regularization: A Transfer Learning Strategy Toward Better Generalization [8.432864879027724]
Training a deep neural network with a small amount of data is a challenging problem.
One practical difficulty we often face is collecting enough samples.
By using a source model trained on a large-scale dataset, the target model can alleviate the overfitting that originates from the lack of training data.
arXiv Detail & Related papers (2020-07-10T06:02:05Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
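One simple way to realize such a diversity pressure, offered as an assumption rather than the paper's objective: penalize pairwise agreement between ensemble members' predictive distributions and add the penalty to the sum of task losses.

```python
import torch

def diversity_penalty(logits_list):
    """Pressure ensemble members to solve the task differently by penalizing
    pairwise similarity of their softmax outputs (one simple choice)."""
    probs = [l.softmax(dim=1) for l in logits_list]
    penalty = 0.0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            penalty = penalty + (probs[i] * probs[j]).sum(dim=1).mean()
    return penalty  # add to the task losses with a small coefficient
```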
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition [79.92240030758575]
We propose a novel Adaptive Curriculum Learning loss (CurricularFace) that embeds the idea of curriculum learning into the loss function.
Our CurricularFace adaptively adjusts the relative importance of easy and hard samples during different training stages.
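A deliberately simplified sketch of stage-dependent easy/hard weighting; CurricularFace's actual loss modulates angular-margin logits, which is omitted here, so treat the weighting rule below as illustrative only.

```python
import torch
import torch.nn.functional as F

def curriculum_weighted_loss(logits, y, t: float):
    """Stage-adaptive weighting (sketch, not the exact CurricularFace loss).
    `t` ramps from 0 to 1 over training, so hard examples (low probability
    on the true class) gain relative importance in later stages."""
    per_example = F.cross_entropy(logits, y, reduction="none")
    p_true = logits.softmax(dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
    hardness = (1.0 - p_true).detach()  # high for poorly predicted samples
    weights = 1.0 + t * hardness        # all ~1 early; hard > 1 late
    return (weights * per_example).mean()
```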
arXiv Detail & Related papers (2020-04-01T08:43:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.