Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution
- URL: http://arxiv.org/abs/2602.10357v1
- Date: Tue, 10 Feb 2026 23:06:12 GMT
- Title: Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution
- Authors: Haixu Liao, Yating Zhou, Songyang Zhang, Meng Wang, Shuai Zhang
- Abstract summary: We develop a theoretical framework to analyze the training dynamics of contrastive learning with Transformer-based encoders under imbalanced data. Our results reveal that neuron weights evolve through three distinct stages of training, with different dynamics for majority features, minority features, and noise. Inspired by these neuron-level behaviors, we show that pruning restores performance degraded by imbalance and enhances feature separation, offering both conceptual insights and practical guidance.
- Score: 33.0181633510156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive learning has emerged as a powerful framework for learning generalizable representations, yet its theoretical understanding remains limited, particularly under imbalanced data distributions that are prevalent in real-world applications. Such an imbalance can degrade representation quality and induce biased model behavior, yet a rigorous characterization of these effects is lacking. In this work, we develop a theoretical framework to analyze the training dynamics of contrastive learning with Transformer-based encoders under imbalanced data. Our results reveal that neuron weights evolve through three distinct stages of training, with different dynamics for majority features, minority features, and noise. We further show that minority features reduce representational capacity, increase the need for more complex architectures, and hinder the separation of ground-truth features from noise. Inspired by these neuron-level behaviors, we show that pruning restores performance degraded by imbalance and enhances feature separation, offering both conceptual insights and practical guidance. Major theoretical findings are validated through numerical experiments.
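The abstract's pruning remedy lends itself to a short sketch. Below is a minimal, magnitude-based neuron-pruning routine in Python; the scoring rule and `keep_ratio` are illustrative assumptions for exposition, not the authors' criterion, which is derived from the neuron-level dynamics characterized in the paper.

```python
import numpy as np

def prune_noisy_neurons(W, keep_ratio=0.5):
    """Zero out the rows of weight matrix W (one row per neuron) with the
    smallest L2 norms. The paper argues noise-dominated neurons stay small,
    so magnitude is a crude proxy here; keep_ratio is an illustrative
    hyperparameter, not the authors' choice."""
    norms = np.linalg.norm(W, axis=1)
    k = int(np.ceil(keep_ratio * W.shape[0]))
    keep = np.argsort(norms)[-k:]            # indices of the k largest-norm neurons
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[keep] = True
    return W * mask[:, None], mask

# Toy usage: 8 neurons over 4 input dimensions with varied scales.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4)) * rng.uniform(0.1, 2.0, size=(8, 1))
W_pruned, mask = prune_noisy_neurons(W, keep_ratio=0.5)
print("kept neurons:", np.flatnonzero(mask))
```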
Related papers
- Understanding the Role of Rehearsal Scale in Continual Learning under Varying Model Capacities [11.882528379148141]
We formulate rehearsal-based continual learning as a multidimensional effectiveness-driven iterative optimization problem. We derive a closed-form analysis of adaptability, memorability, and generalization from the perspective of rehearsal scale. We validate these insights through numerical simulations and extended analyses on deep neural networks across multiple real-world datasets.
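Rehearsal itself is easy to sketch in code. The toy reservoir buffer below is a generic illustration, not this paper's optimization formulation; its `capacity` argument is the knob that corresponds to the rehearsal scale studied here.

```python
import random

class RehearsalBuffer:
    """Reservoir-style rehearsal memory; capacity plays the role of the
    rehearsal scale (a generic sketch, not the paper's formulation)."""
    def __init__(self, capacity):
        self.capacity, self.seen, self.data = capacity, 0, []

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            # Reservoir sampling keeps a uniform subsample of the stream.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = sample

    def replay(self, k):
        # Mix these into each new-task batch during training.
        return random.sample(self.data, min(k, len(self.data)))
```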
arXiv Detail & Related papers (2026-02-24T11:29:12Z)
- Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features [34.88156871518115]
We introduce a small-scale model (SSM) to transparently demonstrate and analyze this phenomenon. On the one hand, the SSM reveals how vanilla empirical risk minimization preferentially learns to distinguish majority classes over minorities early in training. On the other hand, re-weighting restores balanced learning dynamics, enabling the simultaneous learning of features associated with both majorities and minorities.
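A common concrete instance of re-weighting is inverse-frequency class weights. The sketch below uses that standard rule as an assumption, since the summary discusses re-weighting in general rather than one formula.

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Per-class weights proportional to 1/count, normalized so they
    average to one; a conventional re-weighting rule, not necessarily
    the one analyzed in the paper."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1.0))

labels = np.array([0] * 90 + [1] * 10)        # 9:1 imbalance
print(inverse_frequency_weights(labels, 2))   # approx. [0.56, 5.0]
```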
arXiv Detail & Related papers (2026-01-17T11:26:53Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
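For reference, the unhinged loss from earlier work on symmetric label noise is the hinge loss without the max(0, .) clamp, which is what keeps the dynamics linear and analytically tractable; a minimal sketch:

```python
import numpy as np

def unhinged_loss(z, y):
    """Unhinged loss ell(z, y) = 1 - y*z for labels y in {-1, +1}:
    hinge without the clamp, so it is linear in the score z and can
    take negative values."""
    return 1.0 - y * z

z = np.array([-1.5, 0.2, 2.0])
y = np.array([-1.0, 1.0, 1.0])
print(unhinged_loss(z, y))   # [-0.5, 0.8, -1.0]
```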
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Simplifying Neural Network Training Under Class Imbalance [77.39968702907817]
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures.
We demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods.
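Label smoothing, one of the standard components tuned here, has a precise one-line definition; `eps = 0.1` below is a conventional default, not the value tuned in the paper.

```python
import numpy as np

def smooth_labels(y, n_classes, eps=0.1):
    """Mix one-hot targets with the uniform distribution:
    (1 - eps) * onehot + eps / n_classes per entry."""
    onehot = np.eye(n_classes)[y]
    return onehot * (1.0 - eps) + eps / n_classes

print(smooth_labels(np.array([0, 2]), 3))
```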
arXiv Detail & Related papers (2023-12-05T05:52:44Z)
- Loss Dynamics of Temporal Difference Reinforcement Learning [36.772501199987076]
We study learning curves for temporal difference learning of a value function with linear function approximators.
We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.
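Temporal difference learning with linear function approximation has a compact canonical form. The sketch below is the textbook TD(0) update over one trajectory; the step size and discount are illustrative settings, not values from the paper.

```python
import numpy as np

def td0_linear(features, rewards, alpha=0.1, gamma=0.9):
    """TD(0) with a linear value function V(s) = w . phi(s).
    features has one row per visited state (len(rewards) + 1 rows)."""
    w = np.zeros(features.shape[1])
    for t in range(len(rewards)):
        phi, phi_next = features[t], features[t + 1]
        delta = rewards[t] + gamma * w @ phi_next - w @ phi   # TD error
        w += alpha * delta * phi
    return w

rng = np.random.default_rng(0)
w = td0_linear(rng.normal(size=(11, 4)), rng.normal(size=10))
```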
arXiv Detail & Related papers (2023-07-10T18:17:50Z)
- A Theoretical Analysis of the Learning Dynamics under Class Imbalance [0.10231119246773925]
We show that the learning curves for minority and majority classes follow sub-optimal trajectories when training with gradient-based optimizers.
This slowdown is related to the imbalance ratio and can be traced back to a competition between the optimization of different classes.
We find that GD is not guaranteed to decrease the loss for each class but that this problem can be addressed by performing a per-class normalization of the gradient.
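A plain reading of that fix is to normalize each class's gradient before averaging so no class dominates the update direction; the sketch below implements that reading, though the paper's exact scheme may differ.

```python
import numpy as np

def per_class_normalized_grad(grads_by_class):
    """Average per-class gradients after rescaling each to unit norm
    (one plausible instantiation of per-class normalization)."""
    unit = [g / max(np.linalg.norm(g), 1e-12) for g in grads_by_class]
    return np.mean(unit, axis=0)

g_major = np.array([10.0, 0.0])   # large majority-class gradient
g_minor = np.array([0.0, 0.1])    # small minority-class gradient
print(per_class_normalized_grad([g_major, g_minor]))   # [0.5, 0.5]
```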
arXiv Detail & Related papers (2022-07-01T12:54:38Z)
- Towards Balanced Learning for Instance Recognition [149.76724446376977]
We propose Libra R-CNN, a framework towards balanced learning for instance recognition.
It integrates IoU-balanced sampling, a balanced feature pyramid, and objective re-weighting to reduce imbalance at the sample, feature, and objective levels, respectively.
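Of the three components, IoU-balanced sampling is the most self-contained. The sketch below is a simplified version in which negatives are drawn evenly across IoU bins rather than uniformly; the bin count and even per-bin quota are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def iou_balanced_sample(ious, n_samples, n_bins=3, seed=0):
    """Draw roughly n_samples negatives spread evenly over IoU bins,
    so hard (high-IoU) negatives are represented instead of being
    drowned out by easy ones."""
    rng = np.random.default_rng(seed)
    span = ious.max() - ious.min() + 1e-12
    bins = np.minimum(((ious - ious.min()) / span * n_bins).astype(int),
                      n_bins - 1)
    per_bin = n_samples // n_bins
    picks = []
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        if len(idx):
            picks.extend(rng.choice(idx, size=min(per_bin, len(idx)),
                                    replace=False))
    return np.asarray(picks)

ious = np.random.default_rng(1).uniform(0.0, 0.5, size=200)
print(iou_balanced_sample(ious, n_samples=12))
```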
arXiv Detail & Related papers (2021-08-23T13:40:45Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
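The effect is easy to reproduce in a toy setting: with two redundant features of different strength, logistic regression puts nearly all of its weight on the stronger one. The demo below is an illustration of the phenomenon, not the paper's formal setup.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=500)
X = np.stack([3.0 * y + rng.normal(size=500),       # strong feature
              0.5 * y + rng.normal(size=500)], 1)   # weak feature

w = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(y * (X @ w)))           # sigmoid(-y * f(x))
    w += 0.01 * (X * (y * p)[:, None]).mean(0)      # logistic-loss gradient step
print(w)  # the strong feature's weight dominates; the weak one is starved
```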
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Counterfactual Representation Learning with Balancing Weights [74.67296491574318]
Key to causal inference with observational data is achieving balance in predictive features associated with each treatment type.
Recent literature has explored representation learning to achieve this goal.
We develop an algorithm for flexible, scalable and accurate estimation of causal effects.
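The classical instance of balancing weights is inverse-propensity weighting. The sketch below shows that textbook estimator as a reference point; it is not the representation-learning algorithm developed in the paper.

```python
import numpy as np

def ipw_ate(y, t, propensity):
    """Inverse-propensity-weighted average treatment effect: weights
    1/e(x) and 1/(1 - e(x)) re-balance features across arms."""
    treated = np.mean(t * y / propensity)
    control = np.mean((1 - t) * y / (1 - propensity))
    return treated - control

rng = np.random.default_rng(0)
e = rng.uniform(0.2, 0.8, size=1000)           # known propensities (toy)
t = (rng.uniform(size=1000) < e).astype(float)
y = 2.0 * t + rng.normal(size=1000)            # true effect = 2
print(ipw_ate(y, t, e))                        # close to 2
```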
arXiv Detail & Related papers (2020-10-23T19:06:03Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of its stochasticity in that success is still unclear.
We show that heavy-tailed behavior commonly arises in the optimized parameters due to multiplicative noise.
A detailed analysis describes how key factors, including step size and the data, shape this behavior, with similar results observed on state-of-the-art neural network models.
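Heavy tails from multiplicative noise can be demonstrated with a toy Kesten-type recursion; the distributions and parameters below are illustrative, not taken from the paper.

```python
import numpy as np

# x_{t+1} = a_t * x_t + b_t with random multiplicative factor a_t:
# even though a_t and b_t are Gaussian, the stationary law of x is
# heavy-tailed because |a_t| > 1 occurs often enough.
rng = np.random.default_rng(0)
n, T = 10_000, 2_000
x = np.zeros(n)
for _ in range(T):
    a = rng.normal(1.0, 0.5, size=n)   # multiplicative noise
    b = rng.normal(0.0, 1.0, size=n)   # additive noise
    x = a * x + b
print(np.percentile(np.abs(x), [50, 99, 99.9]))  # slowly decaying tail
```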
arXiv Detail & Related papers (2020-06-11T09:58:01Z)