Statistical Theory of Multi-stage Newton Iteration Algorithm for Online Continual Learning
- URL: http://arxiv.org/abs/2508.07419v1
- Date: Sun, 10 Aug 2025 16:32:52 GMT
- Title: Statistical Theory of Multi-stage Newton Iteration Algorithm for Online Continual Learning
- Authors: Xinjia Lu, Chuhan Wang, Qian Zhao, Lixing Zhu, Xuehu Zhu
- Abstract summary: Constrained storage capacity prevents complete retention of historical data, leading to catastrophic forgetting during sequential task training. We propose a novel continual learning framework from a statistical perspective. We develop a Multi-step Newton Iteration algorithm that significantly reduces computational costs in certain scenarios.
- Score: 8.523951376964076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on the critical challenge of handling non-stationary data streams in online continual learning environments, where constrained storage capacity prevents complete retention of historical data, leading to catastrophic forgetting during sequential task training. To more effectively analyze and address the problem of catastrophic forgetting in continual learning, we propose a novel continual learning framework from a statistical perspective. Our approach incorporates random effects across all model parameters and allows the dimension of parameters to diverge to infinity, offering a general formulation for continual learning problems. To efficiently process streaming data, we develop a Multi-step Newton Iteration algorithm that significantly reduces computational costs in certain scenarios by alleviating the burden of matrix inversion. Theoretically, we derive the asymptotic normality of the estimator, enabling subsequent statistical inference. Comprehensive validation through synthetic data experiments and analyses of two real datasets demonstrates the effectiveness of our proposed method.
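The abstract describes the Multi-step Newton Iteration only at a high level, so the sketch below is a minimal illustration of the computational trick it alludes to: factorizing the per-batch Hessian once and reusing that factorization across several Newton steps, so the cost of matrix inversion is amortized (a "chord" Newton scheme). The logistic-regression setting, the function names, and the `inner_steps` parameter are illustrative assumptions, not the authors' actual algorithm, which additionally handles random effects and a diverging parameter dimension.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multistep_newton(theta, batches, inner_steps=3, ridge=1e-6):
    """Illustrative multi-step Newton sketch: factorize the Hessian once per
    incoming batch and reuse the factorization for several Newton steps,
    avoiding a fresh matrix inversion at every update."""
    for X, y in batches:
        p = sigmoid(X @ theta)
        W = p * (1.0 - p)                                   # per-sample weights
        H = (X * W[:, None]).T @ X / len(y)                 # logistic Hessian
        chol = cho_factor(H + ridge * np.eye(len(theta)))   # one factorization
        for _ in range(inner_steps):
            g = X.T @ (sigmoid(X @ theta) - y) / len(y)     # fresh gradient
            theta = theta - cho_solve(chol, g)              # reuse factorization
    return theta

# Toy usage: logistic-regression batches arriving as a stream.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
batches = []
for _ in range(5):
    X = rng.normal(size=(200, 3))
    y = (sigmoid(X @ w_true) > rng.uniform(size=200)).astype(float)
    batches.append((X, y))
theta_hat = multistep_newton(np.zeros(3), batches)
```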
Related papers
- Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning [51.07663354001582]
Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches. We formulate a regularization strategy, termed the Information Maximization (IM) regularizer, for memory-based continual learning methods.
arXiv Detail & Related papers (2025-12-01T15:56:00Z) - Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming [55.848340925419286]
We study online statistical inference for the solutions of quadratic optimization problems with equality and inequality constraints. We develop a stochastic sequential quadratic programming (SSQP) method to solve these problems, where the step direction is computed by sequentially performing a quadratic approximation of the objective and a linear approximation of the constraints. We show that our method achieves global almost sure convergence and that the averaged iterates exhibit local asymptotic normality with an optimal primal-dual limiting covariance matrix in the sense of Hájek and Le Cam.
arXiv Detail & Related papers (2025-11-27T06:16:17Z) - Cross-Learning from Scarce Data via Multi-Task Constrained Optimization [70.90607489166648]
This paper introduces a multi-task cross-learning framework to overcome data scarcity. We formulate this joint estimation as a constrained optimization problem. We show the efficiency of our cross-learning method in applications with real data, including image classification and the propagation of infectious diseases.
arXiv Detail & Related papers (2025-11-17T18:35:59Z) - Using Imperfect Synthetic Data in Downstream Inference Tasks [50.40949503799331]
We introduce a new estimator based on the generalized method of moments. We find that interactions between the moment residuals of synthetic data and those of real data can improve estimates of the target parameter.
arXiv Detail & Related papers (2025-08-08T18:32:52Z) - Global Convergence of Continual Learning on Non-IID Data [51.99584235667152]
We provide a general and comprehensive theoretical analysis of continual learning for regression models. We establish, for the first time, almost sure convergence results for continual learning under a general data condition.
arXiv Detail & Related papers (2025-03-24T10:06:07Z) - MIBP-Cert: Certified Training against Data Perturbations with Mixed-Integer Bilinear Programs [50.41998220099097]
Data errors, corruptions, and poisoning attacks during training pose a major threat to the reliability of modern AI systems. We introduce MIBP-Cert, a novel certification method based on mixed-integer bilinear programming (MIBP). By computing the set of parameters reachable through perturbed or manipulated data, we can predict all possible outcomes and guarantee robustness.
arXiv Detail & Related papers (2024-12-13T14:56:39Z) - Adaptive debiased SGD in high-dimensional GLMs with streaming data [4.704144189806667]
This paper introduces a novel approach to online inference in high-dimensional generalized linear models. Our method operates in a single-pass mode, unlike existing methods that require full dataset access or storage of large-dimensional summary statistics. The core of our methodological innovation lies in an adaptive stochastic gradient descent algorithm tailored for dynamic objective functions, coupled with a novel online debiasing procedure.
arXiv Detail & Related papers (2024-05-28T15:36:48Z) - Online Tensor Inference [0.0]
Traditional offline learning, involving the storage and utilization of all data in each computational iteration, becomes impractical for high-dimensional tensor data.
Existing low-rank tensor methods lack the capability for statistical inference in an online fashion.
Our approach employs Stochastic Gradient Descent (SGD) to enable efficient real-time data processing without extensive memory requirements.
arXiv Detail & Related papers (2023-12-28T16:37:48Z) - Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer the OD matrix structure and guide the numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z) - Byzantine-Resilient Federated Learning at Edge [20.742023657098525]
We present a Byzantine-resilient gradient descent algorithm that can handle heavy-tailed data.
We also propose an algorithm that incorporates costs during the learning process.
arXiv Detail & Related papers (2023-03-18T15:14:16Z) - Smoothed Online Learning for Prediction in Piecewise Affine Systems [43.64498536409903]
This paper builds on the recently developed smoothed online learning framework.
It provides the first algorithms for prediction and simulation in piecewise affine systems.
arXiv Detail & Related papers (2023-01-26T15:54:14Z) - Pessimistic Q-Learning for Offline Reinforcement Learning: Towards
Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z) - Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning [90.59143158534849]
The recent emergence of reinforcement learning has created a demand for robust statistical inference methods.
Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations.
The online bootstrap is a flexible and efficient approach for statistical inference in linear stochastic approximation algorithms, but its efficacy in settings involving Markov noise has yet to be explored (a minimal sketch of the multiplier-bootstrap idea appears after this list).
arXiv Detail & Related papers (2021-08-08T18:26:35Z) - Counterfactual Learning of Stochastic Policies with Continuous Actions [42.903292639112536]
We introduce a modelling strategy based on a joint kernel embedding of contexts and actions. We empirically show that the optimization aspect of counterfactual learning is important. We propose an evaluation protocol for offline policies in real-world logged systems.
arXiv Detail & Related papers (2020-04-22T07:42:30Z)
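As referenced in the online-bootstrap entry above, the general multiplier-bootstrap idea for SGD inference can be sketched as follows: alongside the main SGD iterate, a panel of perturbed iterates is updated with i.i.d. mean-one random weights on each stochastic gradient, and their spread estimates the sampling variability of the estimate. This is a toy illustration for least squares with i.i.d. data, not the paper's algorithm (which targets policy evaluation under Markov noise); all names and parameters below are assumptions.

```python
import numpy as np

def online_bootstrap_sgd(stream, dim, n_boot=200, lr=0.5, seed=0):
    """Toy online multiplier bootstrap for least-squares SGD (illustrative).
    Each bootstrap iterate reuses the incoming observation but scales its
    gradient by an i.i.d. mean-one exponential weight."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    boots = np.zeros((n_boot, dim))
    for t, (x, y) in enumerate(stream, start=1):
        step = lr / np.sqrt(t)                        # decaying step size
        theta -= step * (x @ theta - y) * x           # main SGD update
        w = rng.exponential(1.0, size=n_boot)         # mean-one multipliers
        resid = boots @ x - y                         # per-bootstrap residuals
        boots -= step * (w * resid)[:, None] * x[None, :]
    return theta, boots

# Usage: per-coordinate 95% intervals from the bootstrap spread.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0])
stream = [(x, float(x @ w_true + rng.normal())) for x in rng.normal(size=(5000, 2))]
theta_hat, boots = online_bootstrap_sgd(stream, dim=2)
ci = np.percentile(boots, [2.5, 97.5], axis=0)
```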