Beyond Distribution Shift: Spurious Features Through the Lens of
Training Dynamics
- URL: http://arxiv.org/abs/2302.09344v2
- Date: Sat, 14 Oct 2023 15:47:09 GMT
- Title: Beyond Distribution Shift: Spurious Features Through the Lens of
Training Dynamics
- Authors: Nihal Murali, Aahlad Puli, Ke Yu, Rajesh Ranganath, Kayhan
Batmanghelich
- Abstract summary: Deep Neural Networks (DNNs) are prone to learning spurious features that correlate with the label during training but are irrelevant to the learning problem.
This paper aims to better understand the effects of spurious features through the lens of the learning dynamics of the internal neurons during the training process.
- Score: 31.16516225185384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) are prone to learning spurious features that
correlate with the label during training but are irrelevant to the learning
problem. This hurts model generalization and poses problems when deploying them
in safety-critical applications. This paper aims to better understand the
effects of spurious features through the lens of the learning dynamics of the
internal neurons during the training process. We make the following
observations: (1) While previous works highlight the harmful effects of
spurious features on the generalization ability of DNNs, we emphasize that not
all spurious features are harmful. Spurious features can be "benign" or
"harmful" depending on whether they are "harder" or "easier" to learn than the
core features for a given model. This definition is model and
dataset-dependent. (2) We build upon this premise and use instance difficulty
methods (like Prediction Depth (Baldock et al., 2021)) to quantify "easiness"
for a given model and to identify this behavior during the training phase. (3)
We empirically show that the harmful spurious features can be detected by
observing the learning dynamics of the DNN's early layers. In other words, easy
features learned by the initial layers of a DNN early during the training can
(potentially) hurt model generalization. We verify our claims on medical and
vision datasets, both simulated and real, and justify the empirical success of
our hypothesis by showing the theoretical connections between Prediction Depth
and information-theoretic concepts like V-usable information (Ethayarajh et
al., 2021). Lastly, our experiments show that monitoring only accuracy during
training (as is common in machine learning pipelines) is insufficient to detect
spurious features. We, therefore, highlight the need for monitoring early
training dynamics using suitable instance difficulty metrics.
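The Prediction Depth probe referenced in the abstract can be sketched as follows. This is a minimal illustration of the idea from Baldock et al. (2021), not the authors' code: it assumes per-layer features have already been extracted for an evaluation set and a held-out support set, and it fits a k-NN probe (scikit-learn is assumed) at each probed layer. The helper name `prediction_depth` and the array shapes are illustrative assumptions.

```python
# Minimal sketch of Prediction Depth (Baldock et al., 2021) as an instance
# difficulty score. Illustrative only; names and shapes are assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def prediction_depth(layer_feats, support_feats, support_labels, final_preds, k=30):
    """layer_feats: list of (n_eval, d_l) arrays, one per probed layer.
    support_feats: list of (n_support, d_l) arrays from a held-out support set.
    support_labels: (n_support,) labels used to fit the k-NN probes.
    final_preds: (n_eval,) final predictions of the trained network.
    Returns: (n_eval,) prediction depth per example, i.e. the earliest probed
    layer from which every deeper k-NN probe agrees with the final prediction."""
    # Fit one k-NN probe per probed layer and collect its predictions.
    probe_preds = np.stack([
        KNeighborsClassifier(n_neighbors=k)
        .fit(support_feats[l], support_labels)
        .predict(layer_feats[l])
        for l in range(len(layer_feats))
    ])                                                   # (n_layers, n_eval)
    agree = probe_preds == np.asarray(final_preds)[None, :]
    # suffix_agree[l] is True iff probes at layer l and all deeper layers
    # already agree with the network's final prediction.
    suffix_agree = np.flip(np.cumprod(np.flip(agree, axis=0), axis=0), axis=0).astype(bool)
    n_layers = len(layer_feats)
    depth = np.where(suffix_agree.any(axis=0), suffix_agree.argmax(axis=0), n_layers)
    return depth
```

Under the paper's framing, examples with low prediction depth are "easy" for the model; if such easiness appears in the earliest layers early in training, it may signal reliance on a (potentially harmful) spurious feature rather than the core feature.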
Related papers
- Analyzing and Mitigating Object Hallucination: A Training Bias Perspective [108.09666587800781]
We introduce a new benchmark, POPEv2, which consists of counterfactual images collected from the training data of LVLMs with certain objects masked.
We find that current LVLMs suffer from training bias: they fail to fully leverage their training data and hallucinate more frequently on images seen during training.
We propose Obliviate, an efficient and lightweight unlearning method designed to mitigate object hallucination via training bias unlearning.
arXiv Detail & Related papers (2025-08-06T15:51:02Z)
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- Early learning of the optimal constant solution in neural networks and humans [4.016584525313835]
We show that learning of a target function is preceded by an early phase in which networks learn the optimal constant solution (OCS); a short derivation of the OCS is given after this list.
We show that learning of the OCS can emerge even in the absence of bias terms and is equivalently driven by generic correlations in the input data.
Our work suggests the OCS as a universal learning principle in supervised, error-corrective learning.
arXiv Detail & Related papers (2024-06-25T11:12:52Z)
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- Towards Causal Deep Learning for Vulnerability Detection [31.59558109518435]
We introduce do calculus based causal learning to software engineering models.
Our results show that CausalVul consistently improved the model accuracy, robustness and OOD performance.
arXiv Detail & Related papers (2023-10-12T00:51:06Z)
- Adaptive Online Incremental Learning for Evolving Data Streams [4.3386084277869505]
The first major difficulty is concept drift, that is, the probability distribution in the streaming data would change as the data arrives.
The second major difficulty is catastrophic forgetting, that is, forgetting what we have learned before when learning new knowledge.
Our research builds on this observation and attempts to overcome these difficulties.
arXiv Detail & Related papers (2022-01-05T14:25:53Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., a feedforward neural net) as a lower model that takes features as input and outputs predicted labels; 2) a graph neural network as an upper model that learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- When and how epochwise double descent happens [7.512375012141203]
An 'epochwise double descent' effect exists in which the generalization error initially drops, then rises, and finally drops again with increasing training time.
This presents a practical problem in that the amount of time required for training is long, and early stopping based on validation performance may result in suboptimal generalization.
We show that epochwise double descent requires a critical amount of noise to occur, but above a second critical noise level early stopping remains effective.
arXiv Detail & Related papers (2021-08-26T19:19:17Z)
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
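As a brief aside on the "optimal constant solution" entry above: the OCS is the best input-independent prediction, and its form follows from the loss alone (a standard derivation, not specific to that paper):

```latex
% Optimal constant solution (OCS): the best prediction that ignores the input.
% Under squared error it is the label mean; under cross-entropy it is the
% empirical class prior over the K classes.
c^{*}_{\mathrm{MSE}} = \arg\min_{c}\, \mathbb{E}\!\left[(y - c)^{2}\right] = \mathbb{E}[y],
\qquad
p^{*}_{\mathrm{CE}} = \arg\min_{p \in \Delta^{K-1}}\, \mathbb{E}\!\left[-\log p_{y}\right]
  = \left(\Pr[y = 1], \ldots, \Pr[y = K]\right).
```

Early in training, a network's outputs can therefore track the marginal label statistics before any input-dependent structure has been learned.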
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.