Temporal Imbalance of Positive and Negative Supervision in Class-Incremental Learning
- URL: http://arxiv.org/abs/2603.02280v1
- Date: Mon, 02 Mar 2026 01:57:52 GMT
- Title: Temporal Imbalance of Positive and Negative Supervision in Class-Incremental Learning
- Authors: Jinge Ma, Fengqing Zhu
- Abstract summary: CIL faces the core challenge of catastrophic forgetting, often manifested as a prediction bias toward new classes. Existing methods mainly attribute this bias to intra-task class imbalance and focus on corrections at the classifier head. We propose Temporal-Adjusted Loss (TAL), which uses a temporal decay kernel to construct a supervision strength vector and dynamically reweight the negative supervision in cross-entropy loss.
- Score: 10.054396813990481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the widespread adoption of deep learning in visual tasks, Class-Incremental Learning (CIL) has become an important paradigm for handling dynamically evolving data distributions. However, CIL faces the core challenge of catastrophic forgetting, often manifested as a prediction bias toward new classes. Existing methods mainly attribute this bias to intra-task class imbalance and focus on corrections at the classifier head. In this paper, we highlight an overlooked factor -- temporal imbalance -- as a key cause of this bias. Earlier classes receive stronger negative supervision toward the end of training, leading to asymmetric precision and recall. We establish a temporal supervision model, formally define temporal imbalance, and propose Temporal-Adjusted Loss (TAL), which uses a temporal decay kernel to construct a supervision strength vector and dynamically reweight the negative supervision in cross-entropy loss. Theoretical analysis shows that TAL degenerates to standard cross-entropy under balanced conditions and effectively mitigates prediction bias under imbalance. Extensive experiments demonstrate that TAL significantly reduces forgetting and improves performance on multiple CIL benchmarks, underscoring the importance of temporal modeling for stable long-term learning.
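The TAL mechanism described in the abstract (a temporal decay kernel producing a per-class supervision strength vector that reweights negative supervision inside cross-entropy) can be sketched as follows. The exponential kernel, the function names, and the weighted-denominator formulation are illustrative assumptions rather than the paper's exact implementation; the sketch does, however, reduce to standard cross-entropy when every strength equals 1, consistent with the degeneracy property the abstract states.

```python
import torch
import torch.nn.functional as F

def decay_strength(class_intro_step, current_step, rate=0.01):
    # Illustrative exponential temporal decay kernel: classes introduced
    # longer ago receive weaker negative supervision.
    elapsed = (current_step - class_intro_step).clamp(min=0).float()
    return torch.exp(-rate * elapsed)

def temporal_adjusted_loss(logits, targets, strength):
    # strength: [C] per-class negative-supervision weights in (0, 1].
    # Each non-target class k contributes strength[k] * exp(z_k) to the
    # softmax denominator; strength == 1 recovers standard cross-entropy.
    B, C = logits.shape
    w = strength.unsqueeze(0).expand(B, C).clone()
    w.scatter_(1, targets.unsqueeze(1), 1.0)  # target class keeps weight 1
    weighted_lse = torch.logsumexp(logits + torch.log(w), dim=1)
    z_target = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    return (weighted_lse - z_target).mean()
```

With uniform strengths the loss equals `F.cross_entropy`; damping the strengths of older classes shrinks their negative gradient, which is the direction of the bias correction the abstract describes.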
Related papers
- Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models [66.36240676392502]
Chain-of-thought (CoT) reasoning has become the standard paradigm for enabling Large Language Models (LLMs) to solve complex problems. Recent studies reveal a sharp performance drop in reasoning hop generalization scenarios. We propose test-time correction of reasoning, a lightweight intervention method that dynamically identifies and deactivates ep heads in the reasoning process.
arXiv Detail & Related papers (2026-01-29T03:24:32Z) - Class Confidence Aware Reweighting for Long Tailed Learning [0.8297806372438926]
We present the design of a class- and confidence-aware re-weighting scheme for long-tailed learning. We use a function of (p_t, f_c) to modulate each prediction's contribution to the training task based upon its confidence value.
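The summary does not give the exact (p_t, f_c) modulation, so the following is only a plausible sketch: a focal-style (1 - p_t)^gamma factor scaled inversely by a running per-class confidence estimate. All names and the specific form are hypothetical.

```python
import torch
import torch.nn.functional as F

def confidence_aware_reweighted_ce(logits, targets, class_conf, gamma=2.0):
    # class_conf: [C] running mean prediction confidence per class;
    # low-confidence (typically tail) classes receive larger loss weight.
    # The focal-style modulation below is an illustrative assumption.
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    weight = (1.0 - p_t).pow(gamma) / class_conf[targets].clamp(min=1e-3)
    return (weight * ce).mean()
```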
arXiv Detail & Related papers (2026-01-22T12:58:05Z) - Training Instabilities Induce Flatness Bias in Gradient Descent [6.628332915214955]
Modern deep networks often achieve their best performance beyond a stability threshold. We show that training instabilities induce an implicit bias in GD, driving parameters toward flatter regions of the loss landscape. We also show that restoring instabilities in Adam further improves generalization.
arXiv Detail & Related papers (2025-11-16T11:26:25Z) - Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency [4.047219770183742]
Time series forecasting plays a pivotal role in critical domains such as energy management and financial markets. This study reveals a counterintuitive phenomenon: appropriately truncating historical data can enhance prediction accuracy. We propose an innovative solution termed Adaptive Masking Loss with Representation Consistency.
arXiv Detail & Related papers (2025-10-22T19:23:53Z) - Characteristic Root Analysis and Regularization for Linear Time Series Forecasting [9.254995889539716]
Time series forecasting remains a critical challenge across numerous domains. Recent studies highlight the surprising competitiveness of simple linear models. This paper focuses on the role of characteristic roots in temporal dynamics.
arXiv Detail & Related papers (2025-09-28T03:06:30Z) - Counterfactual Reward Model Training for Bias Mitigation in Multimodal Reinforcement Learning [0.5204229323525671]
We present a counterfactual reward model that introduces causal inference with multimodal representation learning to provide an unsupervised, bias-resilient reward signal. We evaluated the framework on a multimodal fake versus true news dataset, which exhibits framing bias, class imbalance, and distributional drift. The resulting system achieved an accuracy of 89.12% in fake news detection, outperforming the baseline reward models.
arXiv Detail & Related papers (2025-08-27T04:54:33Z) - On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling [15.769249369390884]
We show that optimal learning rates decay slower than theoretically predicted and networks exhibit both stable training and non-trivial feature learning, even at very large widths. In particular, we show that under cross-entropy (CE) loss, the unstable regime comprises two distinct sub-regimes: a catastrophically unstable regime and a more benign controlled divergence regime. Our empirical evidence suggests that width-scaling considerations are surprisingly useful for predicting empirically maximal stable learning rate exponents.
arXiv Detail & Related papers (2025-05-28T15:40:48Z) - Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between predicted confidence and actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
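In earlier literature (van Rooyen et al., 2015) the unhinged loss is the linear loss 1 - y*f(x); assuming this paper's definition is the same, the tractability of its dynamics comes from its constant gradient:

```python
import numpy as np

def unhinged_loss(y, f):
    # Linear ("unhinged") loss 1 - y * f(x); definition assumed from
    # prior literature. Its gradient w.r.t. f is the constant -y, which
    # is what makes closed-form gradient-descent dynamics tractable.
    return 1.0 - np.asarray(y, dtype=float) * np.asarray(f, dtype=float)
```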
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Uncertainty-guided Boundary Learning for Imbalanced Social Event
Detection [64.4350027428928]
We propose a novel uncertainty-guided class imbalance learning framework for imbalanced social event detection tasks.
Our model significantly improves social event representation and classification tasks in almost all classes, especially those uncertain ones.
arXiv Detail & Related papers (2023-10-30T03:32:04Z) - Class-Imbalanced Graph Learning without Class Rebalancing [62.1368829847041]
Class imbalance is prevalent in real-world node classification tasks and poses great challenges for graph learning models.
In this work, we approach the root cause of class-imbalance bias from a topological paradigm.
We devise a lightweight topological augmentation framework BAT to mitigate the class-imbalance bias without class rebalancing.
arXiv Detail & Related papers (2023-08-27T19:01:29Z) - Predicting and Enhancing the Fairness of DNNs with the Curvature of Perceptual Manifolds [44.79535333220044]
Recent studies have shown that tail classes are not always hard to learn, and model bias has been observed on sample-balanced datasets. In this work, we first establish a geometric perspective for analyzing model fairness and then systematically propose a series of geometric measurements.
arXiv Detail & Related papers (2023-03-22T04:49:23Z) - Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
arXiv Detail & Related papers (2020-10-08T20:25:52Z)
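The forced-linear-system model in the last entry can be sketched as a DMD-with-control-style least-squares fit; the function name and the plain pseudo-inverse formulation are illustrative assumptions, not the paper's stochastically forced ensemble method.

```python
import numpy as np

def fit_forced_linear_system(X, U):
    """Fit x_{t+1} ~ A x_t + B u_t by least squares (DMDc-style sketch).

    X: [n, T] state snapshots; U: [m, T-1] forcing inputs.
    Returns A [n, n] and B [n, m].
    """
    X0, X1 = X[:, :-1], X[:, 1:]
    Z = np.vstack([X0, U])          # stacked regressors, shape [n+m, T-1]
    AB = X1 @ np.linalg.pinv(Z)     # least-squares linear operator [A | B]
    n = X.shape[0]
    return AB[:, :n], AB[:, n:]
```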
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.