Beyond Distribution Shift: Spurious Features Through the Lens of
Training Dynamics
- URL: http://arxiv.org/abs/2302.09344v2
- Date: Sat, 14 Oct 2023 15:47:09 GMT
- Title: Beyond Distribution Shift: Spurious Features Through the Lens of
Training Dynamics
- Authors: Nihal Murali, Aahlad Puli, Ke Yu, Rajesh Ranganath, Kayhan
Batmanghelich
- Abstract summary: Deep Neural Networks (DNNs) are prone to learning spurious features that correlate with the label during training but are irrelevant to the learning problem.
This paper aims to better understand the effects of spurious features through the lens of the learning dynamics of the internal neurons during the training process.
- Score: 31.16516225185384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) are prone to learning spurious features that
correlate with the label during training but are irrelevant to the learning
problem. This hurts model generalization and poses problems when deploying them
in safety-critical applications. This paper aims to better understand the
effects of spurious features through the lens of the learning dynamics of the
internal neurons during the training process. We make the following
observations: (1) While previous works highlight the harmful effects of
spurious features on the generalization ability of DNNs, we emphasize that not
all spurious features are harmful. Spurious features can be "benign" or
"harmful" depending on whether they are "harder" or "easier" to learn than the
core features for a given model. This definition is model and
dataset-dependent. (2) We build upon this premise and use instance difficulty
methods (like Prediction Depth (Baldock et al., 2021)) to quantify "easiness"
for a given model and to identify this behavior during the training phase. (3)
We empirically show that the harmful spurious features can be detected by
observing the learning dynamics of the DNN's early layers. In other words, easy
features learned by the initial layers of a DNN early in training can
(potentially) hurt model generalization. We verify our claims on medical and
vision datasets, both simulated and real, and justify the empirical success of
our hypothesis by showing the theoretical connections between Prediction Depth
and information-theoretic concepts like V-usable information (Ethayarajh et
al., 2021). Lastly, our experiments show that monitoring only accuracy during
training (as is common in machine learning pipelines) is insufficient to detect
spurious features. We, therefore, highlight the need for monitoring early
training dynamics using suitable instance difficulty metrics.
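The Prediction Depth metric the abstract relies on can be illustrated with a minimal sketch. Following Baldock et al. (2021), the prediction depth of an example is the earliest layer after which probes (e.g., k-NN classifiers on intermediate representations) agree with the network's final prediction at every subsequent layer. The actual probes and representations are omitted here; the helper name and the toy per-layer predictions below are hypothetical illustrations, not the authors' implementation.

```python
def prediction_depth(probe_preds, final_pred):
    """Earliest layer index from which per-layer probe predictions
    match the network's final prediction at all subsequent layers.
    Returns len(probe_preds) if the probes never settle on final_pred."""
    depth = len(probe_preds)
    # Scan from the last layer backwards; stop at the first disagreement.
    for layer in reversed(range(len(probe_preds))):
        if probe_preds[layer] == final_pred:
            depth = layer
        else:
            break
    return depth

# Toy probe predictions at 5 layers for three examples (class labels):
easy = prediction_depth([1, 1, 1, 1, 1], final_pred=1)   # resolved at layer 0
hard = prediction_depth([0, 0, 2, 1, 1], final_pred=1)   # resolved at layer 3
never = prediction_depth([0, 0, 0, 0, 0], final_pred=1)  # never resolved: 5
```

Under the paper's framing, a cluster of examples with very low prediction depth early in training is the signature of an "easy" feature, which is where harmful spurious features would show up even while validation accuracy looks healthy.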
Related papers
- Early learning of the optimal constant solution in neural networks and humans [4.016584525313835]
We show that learning of a target function is preceded by an early phase in which networks learn the optimal constant solution (OCS).
We show that learning of the OCS can emerge even in the absence of bias terms and is equivalently driven by generic correlations in the input data.
Our work suggests the OCS as a universal learning principle in supervised, error-corrective learning.
arXiv Detail & Related papers (2024-06-25T11:12:52Z) - lpNTK: Better Generalisation with Less Data via Sample Interaction During Learning [22.59771349030541]
We propose a labelled pseudo Neural Tangent Kernel (lpNTK) to take label information into consideration when measuring the interactions between samples.
lpNTK helps to understand learning phenomena identified in previous work, specifically the learning difficulty of samples and forgetting events during learning.
We show that using lpNTK to identify and remove poisoning training samples does not hurt the generalisation performance of ANNs.
arXiv Detail & Related papers (2024-01-16T20:20:10Z) - Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Towards Causal Deep Learning for Vulnerability Detection [31.59558109518435]
We introduce do-calculus-based causal learning to software engineering models.
Our results show that CausalVul consistently improved the model accuracy, robustness and OOD performance.
arXiv Detail & Related papers (2023-10-12T00:51:06Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Adaptive Online Incremental Learning for Evolving Data Streams [4.3386084277869505]
The first major difficulty is concept drift, that is, the probability distribution of the streaming data changes as new data arrives.
The second major difficulty is catastrophic forgetting, that is, forgetting previously learned knowledge when learning new knowledge.
Our research builds on this observation and attempts to overcome these difficulties.
arXiv Detail & Related papers (2022-01-05T14:25:53Z) - Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z) - Towards Open-World Feature Extrapolation: An Inductive Graph Learning
Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z) - When and how epochwise double descent happens [7.512375012141203]
An 'epochwise double descent' effect exists in which the generalization error initially drops, then rises, and finally drops again with increasing training time.
This presents a practical problem in that the amount of time required for training is long, and early stopping based on validation performance may result in suboptimal generalization.
We show that epochwise double descent requires a critical amount of noise to occur, but above a second critical noise level early stopping remains effective.
arXiv Detail & Related papers (2021-08-26T19:19:17Z) - Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z) - The Causal Neural Connection: Expressiveness, Learnability, and
Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.