Learning from many trajectories
- URL: http://arxiv.org/abs/2203.17193v1
- Date: Thu, 31 Mar 2022 17:17:08 GMT
- Title: Learning from many trajectories
- Authors: Stephen Tu and Roy Frostig and Mahdi Soltanolkotabi
- Abstract summary: We study supervised learning from many independent sequences of non-independent covariates.
Our setup sits between learning from independent examples and learning from a single auto-correlated sequence.
A key upshot is that, in domains where trajectories regularly reset, the error rate eventually behaves as if all of the examples were independent altogether.
- Score: 37.265419499679474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We initiate a study of supervised learning from many independent sequences
("trajectories") of non-independent covariates, reflecting tasks in sequence
modeling, control, and reinforcement learning. Conceptually, our
multi-trajectory setup sits between two traditional settings in statistical
learning theory: learning from independent examples and learning from a single
auto-correlated sequence. Our conditions for efficient learning generalize the
former setting--trajectories must be non-degenerate in ways that extend
standard requirements for independent examples. They do not require that
trajectories be ergodic, long, or strictly stable.
For linear least-squares regression, given $n$-dimensional examples produced
by $m$ trajectories, each of length $T$, we observe a notable change in
statistical efficiency as the number of trajectories increases from a few
(namely $m \lesssim n$) to many (namely $m \gtrsim n$). Specifically, we
establish that the worst-case error rate of this problem is $\Theta(n / m T)$
whenever $m \gtrsim n$. Meanwhile, when $m \lesssim n$, we establish a (sharp)
lower bound of $\Omega(n^2 / m^2 T)$ on the worst-case error rate, realized by
a simple, marginally unstable linear dynamical system. A key upshot is that, in
domains where trajectories regularly reset, the error rate eventually behaves
as if all of the examples were independent altogether, drawn from their
marginals. As a corollary of our analysis, we also improve guarantees for the
linear system identification problem.
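As a rough illustration of the multi-trajectory least-squares setup above, the following minimal sketch (not the paper's code) simulates $m$ independent trajectories of a randomly drawn stable linear system $x_{t+1} = A_\star x_t + w_t$ and estimates $A_\star$ by pooled ordinary least squares; the particular system, noise level, dimensions, and trajectory counts are illustrative assumptions. With $T$ fixed, increasing $m$ from a few to many trajectories should visibly shrink the error, loosely echoing the few-versus-many transition described in the abstract.

```python
# Minimal sketch (illustrative assumptions, not the paper's code):
# pooled least squares over m independent trajectories of a linear system.
import numpy as np

rng = np.random.default_rng(0)

def simulate_trajectories(A_star, m, T, noise_std=1.0):
    """Pool covariate/target pairs (x_t, x_{t+1}) over m independent trajectories."""
    n = A_star.shape[0]
    X, Y = [], []
    for _ in range(m):
        x = np.zeros(n)                      # each trajectory resets at the origin
        for _ in range(T):
            x_next = A_star @ x + noise_std * rng.standard_normal(n)
            X.append(x)
            Y.append(x_next)
            x = x_next
    return np.array(X), np.array(Y)

def least_squares_estimate(X, Y):
    """Ordinary least squares: argmin_A sum_t ||y_t - A x_t||^2."""
    A_hat_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A_hat_T.T

n, T = 5, 20
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A_star = 0.9 * Q                             # stable system with spectral norm 0.9

for m in (2, 8, 64, 256):                    # few trajectories (m <~ n) vs. many (m >~ n)
    X, Y = simulate_trajectories(A_star, m, T)
    A_hat = least_squares_estimate(X, Y)
    print(f"m={m:4d}  operator-norm error={np.linalg.norm(A_hat - A_star, 2):.4f}")
```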
Related papers
- Joint Learning of Linear Dynamical Systems under Smoothness Constraints [5.2395896768723045]
We consider the problem of joint learning of multiple linear dynamical systems.
In particular, we show conditions under which the mean-squared error (MSE) converges to zero as $m$ increases.
arXiv Detail & Related papers (2024-06-03T08:29:42Z) - From Spectral Theorem to Statistical Independence with Application to
System Identification [11.98319841778396]
We provide a first quantitative handle on the decay rate of finite powers of the state transition matrix, $\|A^k\|$.
It is shown that when a stable dynamical system has only one distinct eigenvalue and a discrepancy of $n-1$, $\|A^k\|$ retains a dependence on $n$ and the resulting dynamics are inseparable.
We show that the element-wise error is essentially a variant of the well-known Littlewood-Offord problem.
arXiv Detail & Related papers (2023-10-16T15:40:43Z) - Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose the *second-split forgetting time* (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that *mislabeled* examples are forgotten quickly, and seemingly *rare* examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
arXiv Detail & Related papers (2022-10-26T21:03:46Z) - Learning the Dynamics of Autonomous Linear Systems From Multiple
Trajectories [2.2268031040603447]
Existing results on learning rate and consistency of autonomous linear system identification rely on observations of steady state behaviors from a single long trajectory.
We consider the scenario of learning system dynamics based on multiple short trajectories, where there are no easily observed steady state behaviors.
We show that one can adjust the length of the trajectories to achieve a learning rate of $\mathcal{O}(\sqrt{\frac{\log N}{N}})$ for strictly stable systems and a learning rate of $\mathcal{O}(\frac{(\log N)^d}{\sqrt{N}})$ for marginally stable systems.
arXiv Detail & Related papers (2022-03-24T01:29:53Z) - A Statistical Learning View of Simple Kriging [0.0]
We analyze the simple Kriging task from a statistical learning perspective.
The goal is to predict, with minimum quadratic risk, the unknown values a square-integrable random field takes at any unobserved location.
We prove non-asymptotic bounds of order $O_{\mathbb{P}}(1/\sqrt{n})$ for the excess risk of a plug-in predictive rule mimicking the true minimizer.
arXiv Detail & Related papers (2022-02-15T12:46:43Z) - Understanding the Under-Coverage Bias in Uncertainty Estimation [58.03725169462616]
Quantile regression tends to under-cover relative to the desired coverage level in practice (a small empirical coverage check appears at the end of this list).
We prove that quantile regression suffers from an inherent under-coverage bias.
Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error.
arXiv Detail & Related papers (2021-06-10T06:11:55Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k \times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension $k$ and nearly linear in the typically low rank $r$.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - Learning nonlinear dynamical systems from a single trajectory [102.60042167341956]
We introduce algorithms for learning nonlinear dynamical systems of the form $x_{t+1}=\sigma(\Theta^{\star} x_t)+\varepsilon_t$.
We give an algorithm that recovers the weight matrix $\Theta^{\star}$ from a single trajectory with optimal sample complexity and linear running time.
arXiv Detail & Related papers (2020-04-30T10:42:48Z) - Non-asymptotic and Accurate Learning of Nonlinear Dynamical Systems [34.394552166070746]
We study gradient-based algorithms to learn the system dynamics $\theta$ from samples obtained from a single finite trajectory.
Unlike existing work, our bounds are noise sensitive, which allows for learning the ground-truth dynamics with high accuracy and small sample complexity (a toy gradient-descent sketch of this setup appears right after this list).
arXiv Detail & Related papers (2020-02-20T02:36:44Z)
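For the model $x_{t+1}=\sigma(\Theta^{\star} x_t)+\varepsilon_t$ studied in the last two entries, here is a toy gradient-descent sketch of recovering $\Theta^{\star}$ from a single trajectory. The leaky-ReLU activation, noise level, step size, and iteration count are assumptions made for illustration; this is not the cited papers' algorithm.

```python
# Toy sketch (illustrative assumptions, not the cited papers' algorithms):
# recover Theta_star in x_{t+1} = sigma(Theta_star x_t) + eps_t from one trajectory
# by gradient descent on the squared loss, with a leaky-ReLU activation.
import numpy as np

rng = np.random.default_rng(1)
n, T = 4, 2000                              # state dimension, trajectory length
slope = 0.5                                 # leaky-ReLU slope on the negative side

def sigma(z):
    return np.where(z > 0, z, slope * z)

def sigma_prime(z):
    return np.where(z > 0, 1.0, slope)

# Ground-truth weights, scaled so the closed-loop dynamics stay bounded.
Theta_star = rng.standard_normal((n, n))
Theta_star *= 0.8 / np.linalg.norm(Theta_star, 2)

# Roll out a single trajectory driven by process noise.
xs = np.zeros((T + 1, n))
for t in range(T):
    xs[t + 1] = sigma(Theta_star @ xs[t]) + 0.5 * rng.standard_normal(n)
X, Y = xs[:-1], xs[1:]

# Gradient descent on (1/2T) * sum_t ||sigma(Theta x_t) - x_{t+1}||^2.
Theta = np.zeros((n, n))
lr = 0.1
for _ in range(5000):
    Z = X @ Theta.T                         # pre-activations, shape (T, n)
    R = (sigma(Z) - Y) * sigma_prime(Z)     # residuals weighted by activation slope
    Theta -= lr * (R.T @ X) / T

rel_err = np.linalg.norm(Theta - Theta_star) / np.linalg.norm(Theta_star)
print(f"relative estimation error: {rel_err:.3f}")
```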
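The under-coverage phenomenon from "Understanding the Under-Coverage Bias in Uncertainty Estimation" can be probed with a small empirical check. The data-generating process, dimensions, and use of scikit-learn's QuantileRegressor below are assumptions made for illustration, not the paper's experiments.

```python
# Small empirical check (illustrative assumptions, not the paper's experiments):
# fit a linear 90%-quantile regressor and measure its coverage on fresh data.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(2)
n, d, q = 200, 50, 0.9                      # d/n non-negligible: the under-coverage regime

def make_data(n_samples):
    X = rng.standard_normal((n_samples, d))
    y = X @ (np.ones(d) / np.sqrt(d)) + rng.standard_normal(n_samples)
    return X, y

X_train, y_train = make_data(n)
X_test, y_test = make_data(10_000)

model = QuantileRegressor(quantile=q, alpha=0.0, solver="highs")
model.fit(X_train, y_train)

coverage = np.mean(y_test <= model.predict(X_test))
print(f"nominal level: {q:.2f}   empirical test coverage: {coverage:.3f}")
# Expect the empirical coverage to fall below the nominal 0.90 level.
```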