One Rank at a Time: Cascading Error Dynamics in Sequential Learning
- URL: http://arxiv.org/abs/2505.22602v1
- Date: Wed, 28 May 2025 17:16:24 GMT
- Title: One Rank at a Time: Cascading Error Dynamics in Sequential Learning
- Authors: Mahtab Alizadeh Vandchali, Fangshuo Liao, Anastasios Kyrillidis
- Abstract summary: We show how errors propagate when learning rank-1 subspaces sequentially. Our contribution is a characterization of the error propagation in this sequential process. We prove that these errors compound in predictable ways, with implications for both algorithmic design and stability guarantees.
- Score: 8.61384097894607
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential learning -- where complex tasks are broken down into simpler, hierarchical components -- has emerged as a paradigm in AI. This paper views sequential learning through the lens of low-rank linear regression, focusing specifically on how errors propagate when learning rank-1 subspaces sequentially. We present an analysis framework that decomposes the learning process into a series of rank-1 estimation problems, where each subsequent estimation depends on the accuracy of previous steps. Our contribution is a characterization of the error propagation in this sequential process, establishing bounds on how errors -- e.g., due to limited computational budgets and finite precision -- affect the overall model accuracy. We prove that these errors compound in predictable ways, with implications for both algorithmic design and stability guarantees.
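A note on the mechanism: the abstract specifies the decomposition into rank-1 estimation problems but not a particular solver, so the sketch below is illustrative only. It assumes a greedy, deflation-style procedure in which each rank-1 component is fit by a few power iterations on the residual left by the earlier components, with injected noise standing in for a limited computational budget or finite precision; the function name `sequential_rank1` and all parameter choices are hypothetical, not taken from the paper.

```python
import numpy as np

def sequential_rank1(Y, r, power_iters=5, step_noise=1e-3, rng=None):
    """Greedily estimate r rank-1 components of Y, one at a time.

    Each component is fit on the residual left by the previous ones, so any
    inaccuracy in an earlier step (here: truncated power iterations plus
    injected noise) is inherited by every later step.
    """
    rng = np.random.default_rng(rng)
    residual = Y.copy()
    components = []
    for _ in range(r):
        # Inexact rank-1 solver: a few power iterations on the current residual.
        u = rng.standard_normal(residual.shape[0])
        for _ in range(power_iters):
            u = residual @ (residual.T @ u)
            u /= np.linalg.norm(u)
        # Perturbation standing in for finite precision / early stopping.
        u += step_noise * rng.standard_normal(u.shape)
        u /= np.linalg.norm(u)
        sigma_v = residual.T @ u                     # given u, minimizes ||residual - u v^T||_F over v
        components.append((u, sigma_v))
        residual = residual - np.outer(u, sigma_v)   # deflate before the next step
    return components

# Toy experiment with well-separated singular values.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((100, 5)))
V, _ = np.linalg.qr(rng.standard_normal((40, 5)))
Y = U @ np.diag([10.0, 8.0, 6.0, 4.0, 2.0]) @ V.T

for k, (u, _) in enumerate(sequential_rank1(Y, r=5, power_iters=3, step_noise=1e-2, rng=1)):
    # Misalignment (1 - |cosine|) between the k-th estimate and the true k-th left singular vector.
    print(f"component {k}: misalignment {1.0 - abs(u @ U[:, k]):.2e}")
```

The toy run only makes the compounding visible: because each component is fit on a residual that already carries the inaccuracies of the earlier deflation steps, an error made early on contaminates every later estimate, which is the kind of propagation the paper bounds.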
Related papers
- Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments [5.5855749614100825]
This paper addresses the hypothesis that leveraging multiple pre-trained models can mitigate the recall reduction that individual models suffer in novel environments. We formulate the challenge of identifying and managing conflicting predictions from various models as a consistency-based abduction problem. Our results validate the use of consistency-based abduction as an effective mechanism to robustly integrate knowledge from multiple imperfect reasoners in challenging, novel scenarios.
arXiv Detail & Related papers (2025-05-25T23:17:47Z) - Analysis of Overparameterization in Continual Learning under a Linear Model [5.5165579223151795]
We study continual learning and catastrophic forgetting from a theoretical perspective in the simple setting of gradient descent. We analytically demonstrate that overparameterization alone can mitigate forgetting in the context of a linear regression model. As part of this work, we establish a non-asymptotic bound on the risk of a single linear regression task, which may be of independent interest to the field of double descent theory.
arXiv Detail & Related papers (2025-02-11T00:15:38Z) - Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We show that calibration error and refinement error are not minimized simultaneously during training. We introduce a new metric for early stopping and hyperparameter tuning that makes it possible to minimize refinement error during training. Our method integrates seamlessly with any architecture and consistently improves performance across diverse classification tasks.
arXiv Detail & Related papers (2025-01-31T15:03:54Z) - Continual learning for surface defect segmentation by subnetwork
creation and selection [55.2480439325792]
We introduce a new continual (or lifelong) learning algorithm that performs segmentation tasks without undergoing catastrophic forgetting.
The method is applied to two different surface defect segmentation problems that are learned incrementally.
Our approach achieves results comparable to joint training, in which all the training data (all defects) are seen simultaneously.
arXiv Detail & Related papers (2023-12-08T15:28:50Z) - In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z) - Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z) - Understanding Incremental Learning of Gradient Descent: A Fine-grained
Analysis of Matrix Sensing [74.2952487120137]
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
arXiv Detail & Related papers (2023-01-27T02:30:51Z) - Online Regularized Learning Algorithm for Functional Data [2.5382095320488673]
This paper considers an online regularized learning algorithm in reproducing kernel Hilbert spaces.
It shows that convergence rates of both prediction error and estimation error with constant step-size are competitive with those in the literature.
arXiv Detail & Related papers (2022-11-24T11:56:10Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Learning-to-learn non-convex piecewise-Lipschitz functions [44.6133187924678]
We analyze the meta-learning of algorithms for piecewise-Lipschitz functions, a non-convex setting with applications to both machine learning and algorithm design.
We propose a practical meta-learning procedure that learns both the initialization and the step-size of the algorithm from multiple online learning tasks.
arXiv Detail & Related papers (2021-08-19T16:22:48Z) - Dictionary and prior learning with unrolled algorithms for unsupervised
inverse problems [12.54744464424354]
We study Dictionary and Prior learning from degraded measurements as a bi-level problem.
We take advantage of unrolled algorithms to solve approximate formulations of Synthesis and Analysis.
arXiv Detail & Related papers (2021-06-11T12:21:26Z)