Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
- URL: http://arxiv.org/abs/2507.17748v2
- Date: Tue, 05 Aug 2025 15:46:33 GMT
- Title: Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
- Authors: Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal
- Abstract summary: We identify high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. Large learning rates produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Our investigation of the mechanisms underlying this phenomenon reveals the importance of confident mispredictions of bias-conflicting samples under large learning rates.
- Score: 46.171357375793235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness and resource-efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we identify high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods, in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning rates in standard classification tasks is related to addressing hidden/rare spurious correlations in the training dataset. Our investigation of the mechanisms underlying this phenomenon reveals the importance of confident mispredictions of bias-conflicting samples under large learning rates.
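As a concrete illustration of the abstract's claim, the toy sketch below (not the authors' code; the dataset, architecture, learning rates, and 80% pruning ratio are all assumptions) trains the same small network with a small and a large learning rate on a synthetic spurious-correlation task, then reports accuracy on bias-conflicting samples, activation sparsity, and accuracy after magnitude pruning:

```python
# Toy sketch (not the authors' code): compare a small vs. large learning rate on
# a synthetic spurious-correlation task, then probe robustness (accuracy on
# bias-conflicting samples), activation sparsity, and compressibility (accuracy
# after magnitude pruning). All hyperparameters here are illustrative guesses.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_data(n=4000, rho=0.95):
    """Core feature determines the label; a spurious feature agrees with the
    label with probability rho (the remaining samples are bias-conflicting)."""
    y = torch.randint(0, 2, (n,))
    core = (2.0 * y - 1.0).unsqueeze(1) + 0.5 * torch.randn(n, 1)
    agree = (torch.rand(n) < rho).float()
    spur = ((2 * y - 1).float() * (2 * agree - 1)).unsqueeze(1) * 3.0 \
           + 0.5 * torch.randn(n, 1)
    x = torch.cat([core, spur, torch.randn(n, 8)], dim=1)
    return x, y, agree.bool()

def train_and_probe(lr, epochs=200):
    x, y, _ = make_data()
    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):                   # full-batch training for brevity
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    with torch.no_grad():
        xt, yt, at = make_data(rho=0.5)       # test set without the shortcut
        conflict_acc = (model(xt).argmax(1) == yt)[~at].float().mean()
        acts = torch.relu(model[0](xt))       # hidden activations
        sparsity = (acts < 1e-3).float().mean()
        w = model[0].weight
        thresh = w.abs().flatten().kthvalue(int(0.8 * w.numel())).values
        w[w.abs() < thresh] = 0.0             # prune the 80% smallest weights
        pruned_acc = (model(xt).argmax(1) == yt).float().mean()
    print(f"lr={lr}: bias-conflicting acc={conflict_acc:.2f}, "
          f"activation sparsity={sparsity:.2f}, post-pruning acc={pruned_acc:.2f}")

for lr in (0.01, 0.5):
    train_and_probe(lr)
```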
Related papers
- Fairness and Robustness in Machine Unlearning [20.758637391023345]
We focus on fairness and robustness in machine unlearning algorithms. Experiments demonstrate the vulnerability of current state-of-the-art approximated unlearning algorithms to adversarial attacks. We demonstrate that unlearning in the intermediate and last layers is sufficient and cost-effective in terms of time and memory complexity.
arXiv Detail & Related papers (2025-04-18T10:31:44Z)
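A minimal sketch of one common approximate-unlearning recipe restricted to later layers, consistent with the entry's finding that unlearning only the intermediate and last layers can suffice. The gradient-ascent-on-forget-set objective, the layer split, and the 0.5 weighting are assumptions, not the paper's algorithm:

```python
# Hypothetical sketch of layer-restricted approximate unlearning: freeze early
# layers, then descend on the retain set while ascending (negated loss) on the
# forget set, updating only intermediate and last layers. Not the paper's method.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),   # "early" block: frozen during unlearning
    nn.Linear(64, 64), nn.ReLU(),   # "intermediate" layer: updated
    nn.Linear(64, 10),              # "last" layer: updated
)
for p in model[:2].parameters():    # freeze the early block
    p.requires_grad_(False)

opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Toy stand-ins for the retain/forget splits.
x_retain, y_retain = torch.randn(256, 32), torch.randint(0, 10, (256,))
x_forget, y_forget = torch.randn(64, 32), torch.randint(0, 10, (64,))

for _ in range(50):
    opt.zero_grad()
    # Keep performance on retained data; push up the loss on forgotten data.
    loss = loss_fn(model(x_retain), y_retain) \
           - 0.5 * loss_fn(model(x_forget), y_forget)
    loss.backward()
    opt.step()
```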
- Learning from Neighbors: Category Extrapolation for Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance. We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes. To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- Explainability of Highly Associated Fuzzy Churn Patterns in Binary Classification [21.38368444137596]
This study emphasizes the importance of identifying multivariate patterns and setting soft bounds for intuitive interpretation.
The main objective is to use a machine learning model and fuzzy-set theory with top-k HUIM (high utility itemset mining) to identify highly associated patterns of customer churn.
As a result, the study introduces an innovative approach that improves the explainability and effectiveness of customer churn prediction models.
arXiv Detail & Related papers (2024-10-21T09:44:37Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
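PDE is only named at a high level in the entry above; as a loose, hypothetical reading of "progressive data expansion" (not the paper's exact procedure), one can warm up on a small group-balanced subset and then grow the training pool in stages:

```python
# Loose illustration of progressive data expansion (not the paper's exact PDE):
# warm up on a group-balanced subset, then add the remaining (possibly biased)
# data in stages. Subset sizes and the three-stage schedule are assumptions.
import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(1000, 16)
y = torch.randint(0, 2, (1000,))
group = torch.randint(0, 2, (1000,))          # e.g., spurious-attribute group id

# Group-balanced warm-up subset: the same number of samples from each group.
idx_g0 = (group == 0).nonzero().flatten()[:50]
idx_g1 = (group == 1).nonzero().flatten()[:50]
warmup_idx = torch.cat([idx_g0, idx_g1])
taken = set(warmup_idx.tolist())
rest_idx = torch.tensor([i for i in range(1000) if i not in taken])

pool_idx = warmup_idx
for stage in torch.chunk(rest_idx, 3):        # expand the pool in three stages
    loader = DataLoader(TensorDataset(x[pool_idx], y[pool_idx]),
                        batch_size=32, shuffle=True)
    # ... train for a few epochs on the current pool here ...
    pool_idx = torch.cat([pool_idx, stage])   # then expand before the next stage
```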
- Sharpness & Shift-Aware Self-Supervised Learning [17.978849280772092]
Self-supervised learning aims to extract meaningful features from unlabeled data for further downstream tasks.
We develop rigorous theory to identify the factors that implicitly influence the general loss of this classification task.
We conduct extensive experiments to verify our theoretical findings and demonstrate that sharpness & shift-aware contrastive learning can remarkably boost the performance.
arXiv Detail & Related papers (2023-05-17T14:42:16Z)
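The sharpness-aware ingredient in the entry above is in the spirit of sharpness-aware minimization (SAM, Foret et al.); the sketch below shows the core two-pass SAM update applied to a generic loss, with the paper's contrastive objective stubbed out and rho/lr chosen arbitrarily:

```python
# Sketch of a SAM-style sharpness-aware update applied to a generic loss; the
# actual sharpness & shift-aware contrastive objective is stubbed out here.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
rho = 0.05                                   # neighborhood radius for SAM

def loss_fn(m, x):
    z = m(x)
    return (z ** 2).mean()                   # stand-in for a contrastive loss

x = torch.randn(128, 16)
for _ in range(10):
    # First pass: gradient at the current weights.
    loss = loss_fn(model, x)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    # Ascend to the (approximate) worst-case point in the rho-ball.
    eps = [rho * g / (norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)
    # Second pass: gradient at the perturbed weights, then undo the perturbation.
    opt.zero_grad()
    loss_fn(model, x).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    opt.step()                               # descend using the SAM gradient
```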
- Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions. Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
- Explicit Tradeoffs between Adversarial and Natural Distributional Robustness [48.44639585732391]
In practice, models need to enjoy both types of robustness to ensure reliability.
In this work, we show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
arXiv Detail & Related papers (2022-09-15T19:58:01Z)
- Generalizable Information Theoretic Causal Representation [37.54158138447033]
We propose to learn causal representation from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph.
The optimization involves a counterfactual loss, from which we deduce a theoretical guarantee that causality-inspired learning achieves reduced sample complexity and better generalization ability.
arXiv Detail & Related papers (2022-02-17T00:38:35Z)
- Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method.
It considers two types of noise-insensitive information, i.e., class-wise divergence and sample-wise consistency.
Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z)
- Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering [71.15403434929915]
We show that across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection.
We identify the problem as collective outliers -- groups of examples that active learning methods prefer to acquire but models fail to learn.
We show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases.
arXiv Detail & Related papers (2021-07-06T00:52:11Z)
- The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning [25.85044477227461]
Models that are more accurate on out-of-distribution data than a baseline predicted from their in-distribution accuracy exhibit "effective robustness".
We find that models pre-trained on larger datasets exhibit effective robustness during training that vanishes at convergence.
We discuss several strategies for scaling effective robustness to the high-accuracy regime to improve the out-of-distribution accuracy of state-of-the-art models.
arXiv Detail & Related papers (2021-06-30T06:21:42Z)
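For reference, "effective robustness" in this line of work is the amount by which a model's out-of-distribution accuracy exceeds the value predicted from its in-distribution accuracy by a baseline fit. A minimal sketch with made-up accuracies and a plain linear fit (papers often fit on transformed axes instead):

```python
# Sketch of computing "effective robustness": fit a baseline mapping from
# in-distribution (ID) to out-of-distribution (OOD) accuracy over standard
# models, then measure how far a candidate model sits above that line.
# The accuracies below are made-up toy numbers.
import numpy as np

# (ID accuracy, OOD accuracy) pairs for a set of baseline models.
baseline = np.array([[0.70, 0.45], [0.75, 0.50], [0.80, 0.56], [0.85, 0.61]])
slope, intercept = np.polyfit(baseline[:, 0], baseline[:, 1], deg=1)

def effective_robustness(id_acc, ood_acc):
    """OOD accuracy above the level predicted from ID accuracy alone."""
    return ood_acc - (slope * id_acc + intercept)

print(effective_robustness(0.82, 0.62))  # > 0 means effectively robust
```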
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Precise Tradeoffs in Adversarial Training for Linear Regression [55.764306209771405]
We provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features.
We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach.
Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization etc.) affect the tradeoff between these two competing accuracies.
arXiv Detail & Related papers (2020-02-24T19:01:47Z)
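For context on the entry above, the mini-max adversarial training objective for linear regression with an l2-bounded perturbation per input admits a closed-form inner maximization; a standard derivation consistent with this line of work (the paper's exact setup may differ):

```latex
% Adversarial training for linear regression with an \ell_2-bounded
% perturbation of radius \varepsilon applied to each input:
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n}
  \max_{\|\delta_i\|_2 \le \varepsilon}
  \bigl( y_i - \langle \theta, x_i + \delta_i \rangle \bigr)^2 .
% The inner maximum is attained at
% \delta_i = -\varepsilon \,\operatorname{sign}(y_i - \langle\theta, x_i\rangle)\,
% \theta / \|\theta\|_2, which yields the closed form
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n}
  \bigl( |y_i - \langle \theta, x_i \rangle| + \varepsilon \|\theta\|_2 \bigr)^2 .
```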