A Flat Minima Perspective on Understanding Augmentations and Model Robustness
- URL: http://arxiv.org/abs/2505.24592v1
- Date: Fri, 30 May 2025 13:40:44 GMT
- Title: A Flat Minima Perspective on Understanding Augmentations and Model Robustness
- Authors: Weebum Yoo, Sung Whan Yoon,
- Abstract summary: We offer a unified theoretical framework to clarify how augmentations can enhance model robustness.<n>Our work diverges from prior studies in that our analysis broadly encompasses much of the existing augmentation methods.<n>We confirm our theories through simulations on the existing common corruption and adversarial robustness benchmarks.
- Score: 4.297070083645049
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model robustness indicates a model's capability to generalize well on unforeseen distributional shifts, including data corruption, adversarial attacks, and domain shifts. Data augmentation is one of the prevalent and effective ways to enhance robustness. Despite the great success of augmentations in different fields, a general theoretical understanding of their efficacy in improving model robustness is lacking. We offer a unified theoretical framework to clarify how augmentations can enhance model robustness through the lens of loss surface flatness and PAC generalization bound. Our work diverges from prior studies in that our analysis i) broadly encompasses much of the existing augmentation methods, and ii) is not limited to specific types of distribution shifts like adversarial attacks. We confirm our theories through simulations on the existing common corruption and adversarial robustness benchmarks based on the CIFAR and ImageNet datasets, as well as domain generalization benchmarks including PACS and OfficeHome.
Related papers
- Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains [50.66049136093248]
We develop a time-aware structural causal model (SCM) that incorporates dynamic causal factors and the causal mechanism drifts.<n>We show that our method can yield the optimal causal predictor for each time domain.<n>Results on both synthetic and real-world datasets exhibit that SYNC can achieve superior temporal generalization performance.
arXiv Detail & Related papers (2025-06-21T14:05:37Z) - On Weak-to-Strong Generalization and f-Divergence [23.062111583403095]
Weak-to-strong generalization (W2SG) has emerged as a promising paradigm for stimulating the capabilities of strong pre-trained models.<n>We introduce $f$-divergence as an information-theoretic loss function framework in W2SG.<n>We empirically demonstrate that $f$-divergence loss, which generalizes widely-used metrics like KL divergence, effectively improves generalization and noise tolerance of the strong model in practice.
arXiv Detail & Related papers (2025-06-03T17:40:08Z) - PEER pressure: Model-to-Model Regularization for Single Source Domain Generalization [12.15086255236961]
We show that the performance of such augmentation-based methods in the target domains universally fluctuates during training.<n>We propose a novel generalization method, coined.<n>Space Ensemble with Entropy Regularization (PEER), that uses a proxy model to learn the augmented data.
arXiv Detail & Related papers (2025-05-19T06:01:11Z) - Seeking Flat Minima over Diverse Surrogates for Improved Adversarial Transferability: A Theoretical Framework and Algorithmic Instantiation [38.12499933796839]
We propose a novel transferability bound that offers provable guarantees for adversarial transferability.<n>Our theoretical results demonstrate that optimizing AEs toward flat minima over the surrogate model set, while controlling the surrogate-target model shift measured by the adversarial model discrepancy, yields a comprehensive guarantee for AE transferability.
arXiv Detail & Related papers (2025-04-23T07:33:45Z) - Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarial attacking various downstream models fine-tuned from the segment anything model (SAM)<n>To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z) - Understanding Adversarially Robust Generalization via Weight-Curvature Index [3.096869664709865]
We propose a novel perspective to decipher adversarially robust generalization through the lens of the Weight-Curvature Index (WCI)
The proposed WCI quantifies the vulnerability of models to adversarial perturbations using the Frobenius norm of weight matrices and the trace of Hessian matrices.
Our work provides crucial insights for designing more resilient deep learning models, enhancing their reliability and security.
arXiv Detail & Related papers (2024-10-10T08:34:43Z) - Revisiting the Robust Generalization of Adversarial Prompt Tuning [4.033827046965844]
We propose an adaptive Consistency-guided Adrial Prompt Tuning (i.e., CAPT) framework to enhance the alignment of image and text features for adversarial examples.
We conduct experiments across 14 datasets and 4 data sparsity schemes to show the superiority of CAPT over other state-of-the-art adaption methods.
arXiv Detail & Related papers (2024-05-18T02:54:41Z) - The Risk of Federated Learning to Skew Fine-Tuning Features and
Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Learn from the Past: A Proxy Guided Adversarial Defense Framework with
Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy guided defense framework, LAST' (bf Learn from the Pbf ast)
arXiv Detail & Related papers (2023-10-19T13:13:41Z) - Boosted Control Functions: Distribution generalization and invariance in confounded models [10.503777692702952]
We introduce a strong notion of invariance that allows for distribution generalization even in the presence of nonlinear, non-identifiable structural functions.<n>We propose the ControlTwicing algorithm to estimate the Boosted Control Function (BCF) using flexible machine-learning techniques.
arXiv Detail & Related papers (2023-10-09T15:43:46Z) - Generalizability of Adversarial Robustness Under Distribution Shifts [57.767152566761304]
We take a first step towards investigating the interplay between empirical and certified adversarial robustness on one hand and domain generalization on another.
We train robust models on multiple domains and evaluate their accuracy and robustness on an unseen domain.
We extend our study to cover a real-world medical application, in which adversarial augmentation significantly boosts the generalization of robustness with minimal effect on clean data accuracy.
arXiv Detail & Related papers (2022-09-29T18:25:48Z) - Posterior Differential Regularization with f-divergence for Improving
Model Robustness [95.05725916287376]
We focus on methods that regularize the model posterior difference between clean and noisy inputs.
We generalize the posterior differential regularization to the family of $f$-divergences.
Our experiments show that regularizing the posterior differential with $f$-divergence can result in well-improved model robustness.
arXiv Detail & Related papers (2020-10-23T19:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.