Dimension-free Regret for Learning Asymmetric Linear Dynamical Systems
- URL: http://arxiv.org/abs/2502.06545v1
- Date: Mon, 10 Feb 2025 15:10:06 GMT
- Title: Dimension-free Regret for Learning Asymmetric Linear Dynamical Systems
- Authors: Annie Marsden, Elad Hazan
- Abstract summary: We introduce a novel method that overcomes this trade-off, achieving dimension-free regret despite the presence of asymmetric matrices. Our method combines spectral filtering with linear predictors and employs Chebyshev polynomials in the complex plane to construct a novel spectral filtering basis. We prove that as long as the transition matrix has eigenvalues with complex component bounded by $1/\mathrm{poly}\log T$, then our method achieves regret $\tilde{O}(T^{9/10})$ when compared to the best linear predictor in hindsight.
- Score: 19.415741153449265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previously, methods for learning marginally stable linear dynamical systems either required the transition matrix to be symmetric or incurred regret bounds that scale polynomially with the system's hidden dimension. In this work, we introduce a novel method that overcomes this trade-off, achieving dimension-free regret despite the presence of asymmetric matrices and marginal stability. Our method combines spectral filtering with linear predictors and employs Chebyshev polynomials in the complex plane to construct a novel spectral filtering basis. This construction guarantees sublinear regret in an online learning framework, without relying on any statistical or generative assumptions. Specifically, we prove that as long as the transition matrix has eigenvalues with complex component bounded by $1/\mathrm{poly} \log T$, then our method achieves regret $\tilde{O}(T^{9/10})$ when compared to the best linear dynamical predictor in hindsight.
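As a rough illustration of the general spectral filtering idea (not the authors' code), the sketch below builds a filter basis from eigenvectors of a Gram matrix weighted by Chebyshev polynomial coefficients and predicts via a linear readout on filtered history. The function names, the Gram-matrix construction, and all parameters are hypothetical; only the broad recipe (fixed eigenvector filters over past inputs plus a learned linear map) follows the spectral filtering literature, and the paper's complex-plane construction is substantially more involved.

```python
import numpy as np

def chebyshev_power_coeffs(n):
    # Power-basis coefficients of the degree-n Chebyshev polynomial T_n.
    return np.polynomial.chebyshev.cheb2poly([0.0] * n + [1.0])

def spectral_filters(T, k, n=8):
    # Illustrative filter basis: Gram matrix of the functions mu^i * T_n(mu)
    # on [0, 1]; its top-k eigenvectors serve as filters. (Classical spectral
    # filtering instead uses the Hankel matrix
    # Z[i, j] = 2 / ((i + j + 2)^3 - (i + j + 2)).)
    c = chebyshev_power_coeffs(n)
    Z = np.zeros((T, T))
    for i in range(T):
        for j in range(T):
            # Entry = integral over [0, 1] of mu^i T_n(mu) * mu^j T_n(mu).
            Z[i, j] = sum(c[a] * c[b] / (i + j + a + b + 1)
                          for a in range(len(c)) for b in range(len(c)))
    _, eigvecs = np.linalg.eigh(Z)          # Z is symmetric PSD
    return eigvecs[:, -k:]                  # (T, k): top-k filters

def predict_next(past_inputs, filters, W):
    # Linear prediction from filtered history: project the T past inputs
    # onto each filter, then apply a learned linear readout W.
    # past_inputs: (T, d_in), filters: (T, k), W: (d_out, k * d_in).
    summary = filters.T @ past_inputs       # (k, d_in)
    return W @ summary.reshape(-1)          # (d_out,)
```

In the online setting, `W` would be updated each round by a no-regret learner such as online gradient descent; the sketch is a simplified single-step predictor under these assumptions, not the paper's actual algorithm.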
Related papers
- Fundamental Limits of Matrix Sensing: Exact Asymptotics, Universality, and Applications [30.659400341011004]
We provide rigorous equations characterizing the Bayes-optimal learning performance from a number of samples. We mathematically establish predictions obtained via non-rigorous methods from statistical physics.
arXiv Detail & Related papers (2025-03-18T10:36:30Z) - The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective [55.15192437680943]
We study the sample complexity of online reinforcement learning for nonlinear dynamical systems with continuous state and action spaces. Our algorithms are likely to be useful in practice, due to their simplicity, the ability to incorporate prior knowledge, and their benign transient behavior.
arXiv Detail & Related papers (2025-01-27T10:01:28Z) - GP-FL: Model-Based Hessian Estimation for Second-Order Over-the-Air Federated Learning [52.295563400314094]
Second-order methods are widely adopted to improve the convergence rate of learning algorithms. This paper introduces a novel second-order FL framework tailored for wireless channels.
arXiv Detail & Related papers (2024-12-05T04:27:41Z) - Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention.
We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS.
We show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent to the linear Transformer.
arXiv Detail & Related papers (2024-10-14T02:41:01Z) - Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z) - In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z) - Asymmetric matrix sensing by gradient descent with small random initialization [0.8611782340880084]
We study the problem of reconstructing a low-rank matrix from a few linear measurements.
Our key contribution is introducing a continuous gradient flow equation that we call the $\textit{perturbed gradient flow}$.
arXiv Detail & Related papers (2023-09-04T20:23:35Z) - The Decimation Scheme for Symmetric Matrix Factorization [0.0]
Matrix factorization is an inference problem that has acquired importance due to its vast range of applications.
We study this extensive-rank problem, extending the alternative 'decimation' procedure that we recently introduced.
We introduce a simple algorithm based on a ground state search that implements decimation and performs matrix factorization.
arXiv Detail & Related papers (2023-07-31T10:53:45Z) - Neural incomplete factorization: learning preconditioners for the conjugate gradient method [2.899792823251184]
We develop a data-driven approach to accelerate the generation of effective preconditioners.
We replace the typically hand-engineered preconditioners by the output of graph neural networks.
Our method generates an incomplete factorization of the matrix and is, therefore, referred to as neural incomplete factorization (NeuralIF).
arXiv Detail & Related papers (2023-05-25T11:45:46Z) - Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding [21.38845517949153]
This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear systems.
We use an infinite-dimensional feature representation to linearly represent the state-action value function and exploit a finite-dimensional truncation for practical implementation.
arXiv Detail & Related papers (2023-04-08T04:23:46Z) - Implicit Balancing and Regularization: Generalization and Convergence Guarantees for Overparameterized Asymmetric Matrix Sensing [28.77440901439686]
A series of recent papers have begun to generalize this role to non-random Positive Semi-Definite (PSD) matrix sensing problems.
In this paper, we show that the trajectory of gradient descent from a small random initialization moves towards solutions that are both globally optimal and generalize well.
arXiv Detail & Related papers (2023-03-24T19:05:52Z) - Oracle-Preserving Latent Flows [58.720142291102135]
We develop a methodology for the simultaneous discovery of multiple nontrivial continuous symmetries across an entire labelled dataset.
The symmetry transformations and the corresponding generators are modeled with fully connected neural networks trained with a specially constructed loss function.
The two new elements in this work are the use of a reduced-dimensionality latent space and the generalization to transformations invariant with respect to high-dimensional oracles.
arXiv Detail & Related papers (2023-02-02T00:13:32Z) - Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z) - Manifold learning-based polynomial chaos expansions for high-dimensional surrogate models [0.0]
We introduce a manifold learning-based method for uncertainty quantification (UQ) in systems describing complex spatiotemporal processes.
The proposed method is able to achieve highly accurate approximations which ultimately lead to the significant acceleration of UQ tasks.
arXiv Detail & Related papers (2021-07-21T00:24:15Z) - Explainable nonlinear modelling of multiple time series with invertible neural networks [7.605814048051735]
A method for nonlinear topology identification is proposed, based on the assumption that a collection of time series are generated in two steps.
The latter mappings are assumed invertible, and are modelled as shallow neural networks, so that their inverse can be numerically evaluated.
This paper explains the steps needed to calculate the gradients by applying implicit differentiation.
arXiv Detail & Related papers (2021-07-01T12:07:09Z) - Online Stochastic Gradient Descent Learns Linear Dynamical Systems from A Single Trajectory [1.52292571922932]
We show that if the unknown weight matrices describing the system are in Brunovsky canonical form, we can efficiently estimate the ground truth weights of the system.
Specifically, by deriving concrete bounds, we show that SGD converges linearly in expectation to any arbitrary small Frobenius norm distance from the ground truth weights.
arXiv Detail & Related papers (2021-02-23T17:48:39Z) - Beyond Procrustes: Balancing-Free Gradient Descent for Asymmetric Low-Rank Matrix Sensing [36.96922859748537]
Low-rank matrix estimation plays a central role in various applications across science and engineering.
Existing approaches rely on adding a metric regularization term to balance the scale of the two matrix factors.
In this paper, we provide a theoretical justification for the performance of balancing-free gradient descent in recovering a low-rank matrix from a small number of linear measurements.
arXiv Detail & Related papers (2021-01-13T15:03:52Z) - Sparse Quantized Spectral Clustering [85.77233010209368]
We exploit tools from random matrix theory to make precise statements about how the eigenspectrum of a matrix changes under such nonlinear transformations.
We show that very little change occurs in the informative eigenstructure even under drastic sparsification/quantization.
arXiv Detail & Related papers (2020-10-03T15:58:07Z) - Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z) - No-Regret Prediction in Marginally Stable Systems [37.178095559618654]
We consider the problem of online prediction in a marginally stable linear dynamical system.
By applying our techniques to learning an autoregressive filter, we also achieve logarithmic regret in the partially observed setting.
arXiv Detail & Related papers (2020-02-06T01:53:34Z) - Regret Minimization in Partially Observable Linear Quadratic Control [91.43582419264763]
We study the problem of regret in partially observable linear quadratic control systems when the model dynamics are unknown a priori.
We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control.
arXiv Detail & Related papers (2020-01-31T22:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.