Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
- URL: http://arxiv.org/abs/2403.03375v3
- Date: Sat, 24 Aug 2024 10:03:35 GMT
- Title: Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
- Authors: GuanWen Qiu, Da Kuang, Surbhi Goel,
- Abstract summary: We study the dynamics of feature learning under spurious correlations.
We demonstrate that our findings justify the success of retraining the last layer to remove spurious correlation.
We also identify limitations of popular debiasing algorithms that exploit early learning of spurious features.
- Score: 13.119576365743624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing research often posits spurious features as easier to learn than core features in neural network optimization, but the impact of their relative simplicity remains under-explored. Moreover, studies mainly focus on end performance rather than the learning dynamics of feature learning. In this paper, we propose a theoretical framework and an associated synthetic dataset grounded in boolean function analysis. This setup allows for fine-grained control over the relative complexity (compared to core features) and correlation strength (with respect to the label) of spurious features to study the dynamics of feature learning under spurious correlations. Our findings uncover several interesting phenomena: (1) stronger spurious correlations or simpler spurious features slow down the learning rate of the core features, (2) two distinct subnetworks are formed to learn core and spurious features separately, (3) learning phases of spurious and core features are not always separable, (4) spurious features are not forgotten even after core features are fully learned. We demonstrate that our findings justify the success of retraining the last layer to remove spurious correlation and also identifies limitations of popular debiasing algorithms that exploit early learning of spurious features. We support our empirical findings with theoretical analyses for the case of learning XOR features with a one-hidden-layer ReLU network.
Related papers
- Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples [53.95282502030541]
Neural Network-based active learning (NAL) is a cost-effective data selection technique that utilizes neural networks to select and train on a small subset of samples.
We try to move one step forward by offering a unified explanation for the success of both query criteria-based NAL from a feature learning view.
arXiv Detail & Related papers (2024-06-06T10:38:01Z) - Interactive Ontology Matching with Cost-Efficient Learning [2.006461411375746]
This work introduces DualLoop, an active learning method tailored to matching.
Compared to existing active learning methods, we consistently achieved better F1 scores and recall.
We report our operational performance results within the Architecture, Engineering, Construction (AEC) industry sector.
arXiv Detail & Related papers (2024-04-11T11:53:14Z) - Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z) - ResMatch: Residual Attention Learning for Local Feature Matching [51.07496081296863]
We rethink cross- and self-attention from the viewpoint of traditional feature matching and filtering.
We inject the similarity of descriptors and relative positions into cross- and self-attention score.
We mine intra- and inter-neighbors according to the similarity of descriptors and relative positions.
arXiv Detail & Related papers (2023-07-11T11:32:12Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Identifying Spurious Biases Early in Training through the Lens of
Simplicity Bias [25.559684790787866]
We show that examples with spurious features are provably separable based on the model's output early in training.
We propose SPARE, which identifies spurious correlations early in training and utilizes importance sampling to alleviate their effect.
arXiv Detail & Related papers (2023-05-30T05:51:36Z) - How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features [19.261178173399784]
We consider spurious features that are uncorrelated with the learning task.
We provide a precise characterization of how they are memorized via two separate terms.
We prove that the memorization of spurious features weakens as the generalization capability increases.
arXiv Detail & Related papers (2023-05-20T05:27:41Z) - Sample-Efficient Reinforcement Learning in the Presence of Exogenous
Information [77.19830787312743]
In real-world reinforcement learning applications the learner's observation space is ubiquitously high-dimensional with both relevant and irrelevant information about the task at hand.
We introduce a new problem setting for reinforcement learning, the Exogenous Decision Process (ExoMDP), in which the state space admits an (unknown) factorization into a small controllable component and a large irrelevant component.
We provide a new algorithm, ExoRL, which learns a near-optimal policy with sample complexity in the size of the endogenous component.
arXiv Detail & Related papers (2022-06-09T05:19:32Z) - Continual Feature Selection: Spurious Features in Continual Learning [0.0]
This paper studies spurious features' influence on continual learning algorithms.
We show that learning algorithms solve tasks by overfitting features that are not generalizable.
arXiv Detail & Related papers (2022-03-02T10:43:54Z) - Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z) - Toward Understanding the Feature Learning Process of Self-supervised
Contrastive Learning [43.504548777955854]
We study how contrastive learning learns the feature representations for neural networks by analyzing its feature learning process.
We prove that contrastive learning using textbfReLU networks provably learns the desired sparse features if proper augmentations are adopted.
arXiv Detail & Related papers (2021-05-31T16:42:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.