Related papers: Scalable Analytic Classifiers with Associative Drift Compensation for Class-Incremental Learning of Vision Transformers

URL: http://arxiv.org/abs/2602.00144v1
Date: Thu, 29 Jan 2026 06:42:20 GMT
Title: Scalable Analytic Classifiers with Associative Drift Compensation for Class-Incremental Learning of Vision Transformers
Authors: Xuan Rao, Mingming Ha, Bo Zhao, Derong Liu, Cesare Alippi
Abstract summary: Class-incremental learning with Vision Transformers (ViTs) faces a major computational bottleneck during the classifier reconstruction phase. Regularized Gaussian Discriminant Analysis (RGDA) provides a Bayes-optimal alternative with accuracy comparable to SGD-based classifiers. We propose Low-Rank Factorized RGDA (LR-RGDA), a scalable classifier that combines RGDA's expressivity with the efficiency of linear classifiers.
Score: 26.771319566121708
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Class-incremental learning (CIL) with Vision Transformers (ViTs) faces a major computational bottleneck during the classifier reconstruction phase, where most existing methods rely on costly iterative stochastic gradient descent (SGD). We observe that analytic Regularized Gaussian Discriminant Analysis (RGDA) provides a Bayes-optimal alternative with accuracy comparable to SGD-based classifiers; however, its quadratic inference complexity limits its use in large-scale CIL scenarios. To overcome this, we propose Low-Rank Factorized RGDA (LR-RGDA), a scalable classifier that combines RGDA's expressivity with the efficiency of linear classifiers. By exploiting the low-rank structure of the covariance via the Woodbury matrix identity, LR-RGDA decomposes the discriminant function into a global affine term refined by a low-rank quadratic perturbation, reducing the inference complexity from $\mathcal{O}(Cd^2)$ to $\mathcal{O}(d^2 + Crd)$, where $C$ is the number of classes, $d$ the feature dimension, and $r \ll d$ the subspace rank. To mitigate representation drift caused by backbone updates, we further introduce the Hopfield-based Distribution Compensator (HopDC), a training-free mechanism that uses modern continuous Hopfield Networks to recalibrate historical class statistics through associative memory dynamics on unlabeled anchors, accompanied by a theoretical bound on the estimation error. Extensive experiments on diverse CIL benchmarks demonstrate that our framework achieves state-of-the-art performance, providing a scalable solution for large-scale class-incremental learning with ViTs. Code: https://github.com/raoxuan98-hash/lr_rgda_hopdc.
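Illustrative sketch (not taken from the paper or its repository): the Woodbury step described in the abstract can be checked numerically under a simplifying assumption, namely that each class covariance has the form Sigma_c = A + U_c U_c^T, with A a shared d x d positive-definite matrix and U_c a d x r factor. The function names, this factorization, and the toy data below are hypothetical; the authors' actual parameterization may differ, and log-determinant and prior terms are omitted. Under this assumption, each query needs one shared d x d product plus one r x d product per class, instead of a full d x d quadratic form per class.

```python
import numpy as np


def mahalanobis_direct(x, means, A, Us):
    """Reference path: solve with the full d x d covariance of every class."""
    out = []
    for mu, U in zip(means, Us):
        sigma = A + U @ U.T              # assumed form: shared part + low-rank part
        diff = x - mu
        out.append(diff @ np.linalg.solve(sigma, diff))
    return np.array(out)


def precompute(means, A, Us):
    """One-time setup: cache the Woodbury pieces for each class."""
    A_inv = np.linalg.inv(A)
    cache = []
    for mu, U in zip(means, Us):
        W = A_inv @ U                                # d x r, equals A^{-1} U
        M_inv = np.linalg.inv(np.eye(U.shape[1]) + U.T @ W)  # r x r
        b = A_inv @ mu                               # d, equals A^{-1} mu
        cache.append((W, M_inv, b, mu @ b, W.T @ mu))
    return A_inv, cache


def mahalanobis_woodbury(x, A_inv, cache):
    """Fast path: shared quadratic term, then a rank-r correction per class."""
    xy = x @ (A_inv @ x)                             # shared across classes, O(d^2)
    out = []
    for W, M_inv, b, mu_b, Wt_mu in cache:
        base = xy - 2.0 * (b @ x) + mu_b             # (x-mu)^T A^{-1} (x-mu), O(d)
        z = W.T @ x - Wt_mu                          # U^T A^{-1} (x-mu), O(r d)
        out.append(base - z @ M_inv @ z)             # Woodbury correction, O(r^2)
    return np.array(out)


# Toy problem: all sizes and data here are arbitrary illustration values.
rng = np.random.default_rng(0)
d, r, C = 16, 3, 5
B = rng.normal(size=(d, d))
A = 0.5 * np.eye(d) + B @ B.T / d                    # symmetric positive definite
means = [rng.normal(size=d) for _ in range(C)]
Us = [rng.normal(size=(d, r)) for _ in range(C)]
x = rng.normal(size=d)

A_inv, cache = precompute(means, A, Us)
fast = mahalanobis_woodbury(x, A_inv, cache)
slow = mahalanobis_direct(x, means, A, Us)
print(np.allclose(fast, slow))
```

The per-class affine pieces (`b` and `mu_b`) could be stacked into a single matrix-vector product, which is one way to read the "global affine term" in the abstract; the loop form is kept here for readability.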

Related papers

Binary Kernel Logistic Regression: a sparsity-inducing formulation and a convergent decomposition training algorithm [0.5680416078423551]
Kernel logistic regression (KLR) is a widely used supervised learning method for binary and multi-class classification. Previous attempts to deal with sparsity in KLR include a method referred to as the Import Vector Machine (IVM). We propose an extension of the training formulation proposed by Keerthi et al., which is able to induce sparsity in the trained model.
arXiv Detail & Related papers (2025-12-22T14:40:30Z)
Evolution Strategies at the Hyperscale [57.75314521465674]
We introduce EGGROLL, an evolution strategies (ES) algorithm designed to scale backprop-free optimization to large population sizes. ES is a set of powerful blackbox optimisation methods that can handle non-differentiable or noisy objectives. EGGROLL overcomes these bottlenecks by generating random matrices $A \in \mathbb{R}^{m \times r}, B \in \mathbb{R}^{n \times r}$ with $r \ll \min(m,n)$ to form a low-rank matrix perturbation $A B^\top$.
arXiv Detail & Related papers (2025-11-20T18:56:05Z)
Scaling Up ROC-Optimizing Support Vector Machines [3.1941554288428198]
The ROC-SVM directly maximizes the area under the ROC curve (AUC) and has become an attractive alternative to conventional binary classification in the presence of class imbalance. We develop a scalable variant of the ROC-SVM that leverages incomplete U-statistics, thereby substantially reducing computational complexity. We extend the framework to nonlinear classification through a low-rank kernel approximation, enabling efficient training in reproducing kernel Hilbert spaces.
arXiv Detail & Related papers (2025-11-07T04:38:25Z)
Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
We introduce the $\textit{catnat}$ function, a function composed of a sequence of hierarchical binary splits. A rich set of experiments shows that the proposed function improves the learning efficiency and yields models characterized by consistently higher test performance.
arXiv Detail & Related papers (2025-09-29T12:55:50Z)
Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on a client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z)
Logarithmic Regret for Online KL-Regularized Reinforcement Learning [51.113248212150964]
KL-regularization plays a pivotal role in improving the efficiency of RL fine-tuning for large language models. Despite its empirical advantage, the theoretical difference between KL-regularized RL and standard RL remains largely under-explored. We propose an optimism-based KL-regularized online contextual bandit algorithm, and provide a novel analysis of its regret.
arXiv Detail & Related papers (2025-02-11T11:11:05Z)
A Coefficient Makes SVRG Effective [51.36251650664215]
Stochastic Variance Reduced Gradient (SVRG) is a theoretically compelling optimization method. In this work, we demonstrate the potential of SVRG in optimizing real-world neural networks.
arXiv Detail & Related papers (2023-11-09T18:47:44Z)
CoLiDE: Concomitant Linear DAG Estimation [12.415463205960156]
We deal with the problem of learning directed acyclic graph (DAG) structure from observational data adhering to a linear structural equation model. We propose a new convex score function for sparsity-aware learning of DAGs.
arXiv Detail & Related papers (2023-10-04T15:32:27Z)
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
Rényi Divergence Deep Mutual Learning [3.682680183777648]
This paper revisits Deep Mutual Learning (DML) as a simple yet effective computing paradigm. We propose using Rényi divergence instead of the KL divergence, which is more flexible and tunable. Our empirical results demonstrate the advantage of combining DML and Rényi divergence, leading to further improvement in model generalization.
arXiv Detail & Related papers (2022-09-13T04:58:35Z)
IB-GAN: A Unified Approach for Multivariate Time Series Classification under Class Imbalance [1.854931308524932]
Non-parametric data augmentation with Generative Adversarial Networks (GANs) offers a promising solution. We propose Imputation Balanced GAN (IB-GAN), a novel method that joins data augmentation and classification in a one-step process via an imputation-balancing approach.
arXiv Detail & Related papers (2021-10-14T15:31:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
