On the Use of Anchoring for Training Vision Models
- URL: http://arxiv.org/abs/2406.00529v1
- Date: Sat, 1 Jun 2024 18:43:43 GMT
- Title: On the Use of Anchoring for Training Vision Models
- Authors: Vivek Narayanaswamy, Kowshik Thopalli, Rushil Anirudh, Yamen Mubarka, Wesam Sakla, Jayaraman J. Thiagarajan,
- Abstract summary: Anchoring is a principle for training deep neural networks that has been shown to significantly improve uncertainty estimation, calibration, and extrapolation capabilities.
We identify a critical problem in anchored training that can lead to an increased risk of learning undesirable shortcuts, thereby limiting its generalization capabilities.
We introduce a new anchored training protocol that employs a simple regularizer to mitigate this issue and significantly enhances generalization.
- Score: 23.157571220639177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anchoring is a recent, architecture-agnostic principle for training deep neural networks that has been shown to significantly improve uncertainty estimation, calibration, and extrapolation capabilities. In this paper, we systematically explore anchoring as a general protocol for training vision models, providing fundamental insights into its training and inference processes and their implications for generalization and safety. Despite its promise, we identify a critical problem in anchored training that can lead to an increased risk of learning undesirable shortcuts, thereby limiting its generalization capabilities. To address this, we introduce a new anchored training protocol that employs a simple regularizer to mitigate this issue and significantly enhances generalization. We empirically evaluate our proposed approach across datasets and architectures of varying scales and complexities, demonstrating substantial performance gains in generalization and safety metrics compared to the standard training protocol.
Related papers
- Fragility-aware Classification for Understanding Risk and Improving Generalization [6.926253982569273]
We introduce the Fragility Index (FI), a novel metric that evaluates classification performance from a risk-averse perspective.
We derive exact reformulations for cross-entropy loss, hinge-type loss, and Lipschitz loss, and extend the approach to deep learning models.
arXiv Detail & Related papers (2025-02-18T16:44:03Z) - Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment [63.15719512614899]
Refusal Training (RT) struggles to generalize against various OOD jailbreaking attacks.
We observe significant improvements on generalization as N increases.
We propose training model to perform safety reasoning for each query.
arXiv Detail & Related papers (2025-02-06T13:01:44Z) - On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z) - Conserve-Update-Revise to Cure Generalization and Robustness Trade-off
in Adversarial Training [21.163070161951868]
Adrial training improves the robustness of neural networks against adversarial attacks.
We show that selectively updating specific layers while preserving others can substantially enhance the network's learning capacity.
We propose CURE, a novel training framework that leverages a gradient prominence criterion to perform selective conservation, updating, and revision of weights.
arXiv Detail & Related papers (2024-01-26T15:33:39Z) - Towards Improving Robustness Against Common Corruptions using Mixture of
Class Specific Experts [10.27974860479791]
This paper introduces a novel paradigm known as the Mixture of Class-Specific Expert Architecture.
The proposed architecture aims to mitigate vulnerabilities associated with common neural network structures.
arXiv Detail & Related papers (2023-11-16T20:09:47Z) - Hierarchical Decomposition of Prompt-Based Continual Learning:
Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimize the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - Learning Expressive Priors for Generalization and Uncertainty Estimation
in Neural Networks [77.89179552509887]
We propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks.
The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees.
We exhaustively show the effectiveness of this method for uncertainty estimation and generalization.
arXiv Detail & Related papers (2023-07-15T09:24:33Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z) - Learning Diverse Representations for Fast Adaptation to Distribution
Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.