Learning with little mixing
- URL: http://arxiv.org/abs/2206.08269v3
- Date: Thu, 13 Jun 2024 16:34:45 GMT
- Title: Learning with little mixing
- Authors: Ingvar Ziemann, Stephen Tu
- Abstract summary: We study square loss in a realizable time-series framework with martingale difference noise.
Our main result is a fast rate excess risk bound which shows that whenever a trajectory hypercontractivity condition holds, the risk of the least-squares estimator on dependent data matches the iid rate order-wise after a burn-in time.
- Score: 9.574025613149871
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study square loss in a realizable time-series framework with martingale difference noise. Our main result is a fast rate excess risk bound which shows that whenever a trajectory hypercontractivity condition holds, the risk of the least-squares estimator on dependent data matches the iid rate order-wise after a burn-in time. In comparison, many existing results in learning from dependent data have rates where the effective sample size is deflated by a factor of the mixing-time of the underlying process, even after the burn-in time. Furthermore, our results allow the covariate process to exhibit long range correlations which are substantially weaker than geometric ergodicity. We call this phenomenon learning with little mixing, and present several examples for when it occurs: bounded function classes for which the $L^2$ and $L^{2+\epsilon}$ norms are equivalent, ergodic finite state Markov chains, various parametric models, and a broad family of infinite dimensional $\ell^2(\mathbb{N})$ ellipsoids. By instantiating our main result to system identification of nonlinear dynamics with generalized linear model transitions, we obtain a nearly minimax optimal excess risk bound after only a polynomial burn-in time.
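As an illustrative sketch of the setting (not the paper's code; the system, constants, and dimensions below are all hypothetical), one can fit least squares to a single trajectory of a slowly mixing linear system and observe error on the order of the iid rate after a burn-in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: least squares on one dependent trajectory from a
# stable linear system x_{t+1} = A_star x_t + w_t (martingale difference noise).
d, T = 3, 5000
A_star = 0.9 * np.eye(d)  # spectral radius near 1, i.e. slow mixing
X = np.zeros((T + 1, d))
for t in range(T):
    X[t + 1] = X[t] @ A_star.T + rng.normal(scale=0.1, size=d)

# Least-squares estimator over the single trajectory (realizable setting).
covariates, targets = X[:-1], X[1:]
A_hat = np.linalg.lstsq(covariates, targets, rcond=None)[0].T

# After a burn-in, the parameter error behaves like the iid rate ~ sqrt(d/T)
# despite the temporal dependence between samples.
err = np.linalg.norm(A_hat - A_star)
print(err)
```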
Related papers
- MultiPDENet: PDE-embedded Learning with Multi-time-stepping for Accelerated Flow Simulation [48.41289705783405]
We propose a PDE-embedded network with multiscale time stepping (MultiPDENet)
In particular, we design a convolutional filter based on the structure of finite difference with a small number of parameters to optimize.
A Physics Block with a 4th-order Runge-Kutta integrator at the fine time scale is established that embeds the structure of PDEs to guide the prediction.
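The integrator inside the Physics Block is the classical 4th-order Runge-Kutta scheme; a minimal standalone sketch (not the MultiPDENet code, and the example ODE is illustrative):

```python
def rk4_step(f, t, y, h):
    """One classical 4th-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: integrate dy/dt = -y from y(0) = 1 up to t = 1.
y, t, h = 1.0, 0.0, 0.01
for _ in range(100):
    y = rk4_step(lambda t, y: -y, t, y, h)
    t += h
print(y)  # close to exp(-1) ≈ 0.3679
```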
arXiv Detail & Related papers (2025-01-27T12:15:51Z)
- Momentum-based minimization of the Ginzburg-Landau functional on Euclidean spaces and graphs [0.0]
We study the momentum-based minimization of a diffuse perimeter functional on Euclidean spaces and on graphs with applications to semi-supervised classification tasks in machine learning.
We demonstrate empirically that momentum can lead to faster convergence if the time step size is large but not too large.
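A minimal sketch of the momentum (heavy-ball) iteration, here applied to a simple quadratic rather than the Ginzburg-Landau functional (all step sizes and names are illustrative):

```python
import numpy as np

# Heavy-ball (momentum) gradient descent: v_{k+1} = beta v_k - alpha grad(x_k),
# x_{k+1} = x_k + v_{k+1}. Applied to f(x) = (x - 3)^2 for illustration.
def heavy_ball(grad, x0, step=0.1, momentum=0.9, iters=200):
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(iters):
        v = momentum * v - step * grad(x)
        x = x + v
    return x

x_min = heavy_ball(lambda x: 2 * (x - 3.0), x0=[0.0])
print(x_min)  # approaches the minimizer x = 3
```

With these parameters the iterate spirals into the minimizer; too large a step size makes the recursion unstable, matching the "large but not too large" observation above.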
arXiv Detail & Related papers (2024-12-31T11:05:49Z)
- Extreme Value Modelling of Feature Residuals for Anomaly Detection in Dynamic Graphs [14.8066991252587]
Detecting anomalies in a temporal sequence of graphs can be applied to areas such as the detection of accidents in transport networks and cyber attacks in computer networks.
Existing methods for detecting abnormal graphs can suffer from multiple limitations, such as high false positive rates and difficulties with handling variable-sized graphs and non-trivial temporal dynamics.
We propose a technique where temporal dependencies are explicitly modelled via time series analysis of a large set of pertinent graph features, followed by using residuals to remove the dependencies.
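A minimal single-feature sketch of this residual-based idea, assuming an AR(1) fit and a simple z-score threshold (the paper models a large set of graph features and uses extreme value theory on the residuals; everything below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate one temporally dependent graph feature via an AR(1) process,
# then inject an anomaly at t = 300.
x = np.empty(500)
x[0] = 0.0
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=1.0)
x[300] += 10.0

# Fit the AR(1) coefficient by least squares and take residuals,
# which removes the temporal dependence.
phi = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
resid = x[1:] - phi * x[:-1]

# Flag time points whose residual z-score is extreme.
z = (resid - resid.mean()) / resid.std()
anomalies = np.flatnonzero(np.abs(z) > 4) + 1
print(anomalies)  # includes t = 300
```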
arXiv Detail & Related papers (2024-10-08T05:00:53Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We analyze the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
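A minimal sketch of the object under study: the closed-form ridge estimator and an empirical out-of-sample risk, here with independent samples for simplicity (the paper's contribution is the correlated case; all dimensions and constants below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear model y = X w_star + noise.
n, d, lam = 200, 20, 1.0
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d) / np.sqrt(d)
y = X @ w_star + 0.1 * rng.normal(size=n)

# Closed-form ridge estimator: w_hat = (X^T X + lam I)^{-1} X^T y.
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Out-of-sample risk on fresh samples; the paper characterizes how this
# changes when test and training points are correlated.
X_test = rng.normal(size=(1000, d))
y_test = X_test @ w_star + 0.1 * rng.normal(size=1000)
risk = np.mean((X_test @ w_hat - y_test) ** 2)
print(risk)  # close to the noise floor 0.01
```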
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Precise analysis of ridge interpolators under heavy correlations -- a Random Duality Theory view [0.0]
We show that Random Duality Theory (RDT) can be utilized to obtain precise closed-form characterizations of all estimator-related optimizing quantities of interest.
arXiv Detail & Related papers (2024-06-13T14:56:52Z)
- Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and anomaly scorer to detect anomalies.
Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z)
- Polynomial-Time Solutions for ReLU Network Training: A Complexity Classification via Max-Cut and Zonotopes [70.52097560486683]
We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon \leq \sqrt{84/83} - 1 \approx 0.006$, we show that it is NP-hard to find an approximate global optimum of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
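As a quick numerical check of the threshold quoted above (a minimal sketch, nothing paper-specific):

```python
import math

# The hardness threshold sqrt(84/83) - 1 evaluates to roughly 0.006.
eps = math.sqrt(84 / 83) - 1
print(eps)  # ≈ 0.006
```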
arXiv Detail & Related papers (2023-11-18T04:41:07Z)
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show precisely when and where double descent occurs, and that its location is not inherently tied to the interpolation threshold $p=n$.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
- Benign Overfitting in Time Series Linear Model with Over-Parameterization [5.68558935178946]
We develop a theory for the excess risk of the estimator under multiple dependence types.
We show that the convergence rate of risks with short-memory processes is identical to that of cases with independent data.
arXiv Detail & Related papers (2022-04-18T15:26:58Z)
- Sampling Approximately Low-Rank Ising Models: MCMC meets Variational Methods [35.24886589614034]
We consider Ising models on the hypercube with a general quadratic interaction matrix $J$.
Our general result implies the first polynomial-time sampling algorithms for low-rank Ising models.
arXiv Detail & Related papers (2022-02-17T21:43:50Z)
- Bayesian Inference in High-Dimensional Time-Series with the Orthogonal Stochastic Linear Mixing Model [2.7909426811685893]
Many modern time-series datasets contain large numbers of output response variables sampled for prolonged periods of time.
In this paper, we propose a new Markov chain Monte Carlo framework for the analysis of diverse, large-scale time-series datasets.
arXiv Detail & Related papers (2021-06-25T01:12:54Z)
- Consistency of mechanistic causal discovery in continuous-time using Neural ODEs [85.7910042199734]
We consider causal discovery in continuous-time for the study of dynamical systems.
We propose a causal discovery algorithm based on penalized Neural ODEs.
arXiv Detail & Related papers (2021-05-06T08:48:02Z)
- Interpolation and Learning with Scale Dependent Kernels [91.41836461193488]
We study the learning properties of nonparametric ridge-less least squares.
We consider the common case of estimators defined by scale dependent kernels.
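A minimal sketch of ridge-less (minimum-norm) kernel interpolation with a scale-dependent kernel, using a Laplacian kernel for numerical stability (illustrative, not the paper's exact setup; the scale and data are made up):

```python
import numpy as np

# Scale-dependent Laplacian kernel k(x, x') = exp(-|x - x'| / s);
# the scale s governs the smoothness/stability trade-off.
def laplace_kernel(a, b, s):
    return np.exp(-np.abs(a[:, None] - b[None, :]) / s)

x = np.linspace(-1.0, 1.0, 30)
y = np.sin(3.0 * x)

# Ridge-less estimator: solve K alpha = y exactly (no regularization),
# giving the minimum-norm interpolant in the kernel's function space.
s = 0.5
alpha = np.linalg.solve(laplace_kernel(x, x, s), y)

# The interpolant passes through every training point.
y_fit = laplace_kernel(x, x, s) @ alpha
print(np.max(np.abs(y_fit - y)))  # numerically ~0
```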
arXiv Detail & Related papers (2020-06-17T16:43:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.