Rademacher learning rates for iterated random functions
- URL: http://arxiv.org/abs/2506.13946v1
- Date: Mon, 16 Jun 2025 19:36:13 GMT
- Title: Rademacher learning rates for iterated random functions
- Authors: Nikola Sandrić
- Abstract summary: We consider the case where the training dataset is generated by an iterated random function that is not necessarily irreducible or aperiodic. Under the assumption that the governing function is contractive with respect to its first argument, we first establish a uniform convergence result for the corresponding sample error. We then demonstrate the learnability of the approximate empirical risk minimization algorithm and derive its learning rate bound.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing literature on supervised machine learning assumes that the training dataset is drawn from an i.i.d. sample. However, many real-world problems exhibit temporal dependence and strong correlations between the marginal distributions of the data-generating process, suggesting that the i.i.d. assumption is often unrealistic. In such cases, models naturally include time-series processes with mixing properties, as well as irreducible and aperiodic ergodic Markov chains. Moreover, the learning rates typically obtained in these settings are independent of the data distribution, which can lead to restrictive choices of hypothesis classes and suboptimal sample complexities for the learning algorithm. In this article, we consider the case where the training dataset is generated by an iterated random function (i.e., an iteratively defined time-homogeneous Markov chain) that is not necessarily irreducible or aperiodic. Under the assumption that the governing function is contractive with respect to its first argument and subject to certain regularity conditions on the hypothesis class, we first establish a uniform convergence result for the corresponding sample error. We then demonstrate the learnability of the approximate empirical risk minimization algorithm and derive its learning rate bound. Both rates are data-distribution dependent, expressed in terms of the Rademacher complexities of the underlying hypothesis class, allowing them to more accurately reflect the properties of the data-generating distribution.
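As a concrete illustration of this setup (a sketch under assumptions, not code from the paper), the following Python snippet generates a trajectory from a hypothetical iterated random function that is contractive in its first argument, then Monte Carlo-estimates the empirical Rademacher complexity of a toy finite hypothesis class on that trajectory; the map f, the noise law, and the threshold class are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical iterated random function on R:
#   X_{k+1} = f(X_k, theta_{k+1}), with |f(x, t) - f(y, t)| <= a |x - y|, a < 1,
# i.e. f is contractive in its first argument; the chain need not be
# irreducible or aperiodic for this property to hold.
def f(x, theta, a=0.7):
    return a * x + theta

def sample_trajectory(n, x0=0.0):
    """Generate X_1, ..., X_n by iterating f with i.i.d. innovations."""
    xs = np.empty(n)
    x = x0
    for k in range(n):
        x = f(x, rng.normal(scale=0.5))
        xs[k] = x
    return xs

def empirical_rademacher(xs, hypotheses, n_mc=2000):
    """Monte Carlo estimate of the empirical Rademacher complexity
    R_n(H) = E_sigma [ sup_{h in H} (1/n) sum_i sigma_i h(X_i) ]."""
    n = len(xs)
    H = np.stack([h(xs) for h in hypotheses])    # |H| x n matrix of h(X_i)
    sups = np.empty(n_mc)
    for m in range(n_mc):
        sigma = rng.choice([-1.0, 1.0], size=n)  # i.i.d. Rademacher signs
        sups[m] = np.max(H @ sigma) / n
    return sups.mean()

# Toy finite hypothesis class: threshold predictors h_c(x) = 1{x <= c}.
hypotheses = [lambda x, c=c: (x <= c).astype(float) for c in np.linspace(-2, 2, 21)]
xs = sample_trajectory(500)
print("empirical Rademacher complexity ~", empirical_rademacher(xs, hypotheses))
```

Because the contraction factor a has modulus less than one, successive iterates forget the initial condition geometrically fast, which is the kind of property that can substitute for irreducibility or aperiodicity assumptions.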
Related papers
- Flow-Based Non-stationary Temporal Regime Causal Structure Learning [49.77103348208835]
We introduce FANTOM, a unified framework for causal discovery. It handles non-stationary processes along with non-Gaussian and heteroscedastic noises. It simultaneously infers the number of regimes and their corresponding indices, and learns each regime's Directed Acyclic Graph.
arXiv Detail & Related papers (2025-06-20T15:12:43Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross-validation estimator (GCV) fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
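To make the quantities concrete, here is a minimal sketch (with an assumed AR(1) noise model, not the paper's general correlation structure) that fits ridge regression, computes the classical GCV estimate, and compares it against a Monte Carlo out-of-sample risk:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 200, 30, 1.0

# AR(1)-correlated noise across samples (an assumed correlation model)
# violates the i.i.d. setting in which GCV is usually justified.
def ar1_noise(n, rho=0.9):
    eps = np.empty(n)
    eps[0] = rng.normal()
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + np.sqrt(1 - rho ** 2) * rng.normal()
    return eps

X = rng.normal(size=(n, p))
beta = rng.normal(size=p) / np.sqrt(p)
y = X @ beta + ar1_noise(n)

# Ridge fit; S = X (X'X + lam I)^{-1} X' is the hat matrix.
A = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
S = X @ A
y_hat = S @ y

# Generalized cross-validation estimate of the prediction risk.
gcv = np.mean((y - y_hat) ** 2) / (1 - np.trace(S) / n) ** 2

# Monte Carlo out-of-sample risk at independent test points.
X_test = rng.normal(size=(10000, p))
y_test = X_test @ beta + rng.normal(size=10000)
oos = np.mean((y_test - X_test @ (A @ y)) ** 2)

print(f"GCV estimate: {gcv:.3f}   out-of-sample risk: {oos:.3f}")
```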
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Trade-off Between Dependence and Complexity for Nonparametric Learning -- an Empirical Process Approach [10.27974860479791]
In many applications where the data exhibit temporal dependencies, the corresponding empirical processes are much less understood.
We present a general bound on the expected supremum of empirical processes under standard $\beta/\rho$-mixing assumptions.
We show that even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting.
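For reference, one common form of these standard definitions (textbook notation, not quoted from the paper) is:

```latex
% Mixing coefficients for a stationary sequence (X_k).
\[
\beta(k) = \sup_{j \ge 1}\, \mathbb{E}\Big[ \sup_{B \,\in\, \sigma(X_{j+k},\, X_{j+k+1},\, \dots)}
  \big|\, \mathbb{P}\big(B \mid \sigma(X_1, \dots, X_j)\big) - \mathbb{P}(B) \,\big| \Big],
\]
\[
\rho(k) = \sup_{j \ge 1}\; \sup_{f,\, g}\,
  \big|\operatorname{Corr}\big(f(X_1, \dots, X_j),\; g(X_{j+k}, X_{j+k+1}, \dots)\big)\big|,
\]
% and the quantity being bounded is the expected supremum of the empirical process,
\[
\mathbb{E}\Big[ \sup_{f \in \mathcal{F}} \Big| \frac{1}{n} \sum_{i=1}^{n}
  \big( f(X_i) - \mathbb{E} f(X_1) \big) \Big| \Big].
\]
```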
arXiv Detail & Related papers (2024-01-17T05:08:37Z)
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both paradigms: motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
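The generate-then-score loop described here can be sketched generically; in the snippet below the transition policy and the trajectory energy are simple hypothetical stand-ins rather than the paper's learned models.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins for the two learned components: a one-step
# transition policy x_{t+1} ~ pi(. | x_t) and a trajectory-level energy
# (lower energy = more plausible trajectory).
def policy_sample(x_t):
    return 0.9 * x_t + 0.1 * rng.normal()

def trajectory_energy(traj):
    # Penalize increments that deviate from the policy's nominal dynamics.
    increments = traj[1:] - 0.9 * traj[:-1]
    return float(np.sum(increments ** 2))

def generate(x0, horizon):
    """Iterative sampling: roll the policy forward one step at a time."""
    traj = [x0]
    for _ in range(horizon):
        traj.append(policy_sample(traj[-1]))
    return np.array(traj)

# Draw several candidate trajectories and keep the lowest-energy one.
candidates = [generate(0.0, horizon=50) for _ in range(16)]
best = min(candidates, key=trajectory_energy)
print("selected trajectory energy:", trajectory_energy(best))
```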
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
- Probabilistic Learning of Multivariate Time Series with Temporal Irregularity [21.361823581838355]
Real-world time series often suffer from temporal irregularities, including nonuniform intervals and misaligned variables.<n>We propose an end-to-end framework that models temporal irregularities while capturing the joint distribution of variables at arbitrary continuous-time points.
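A common way to encode such data (an assumed representation, not necessarily the paper's) is a merged timestamp grid with per-variable observation masks, as in this sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical irregularly sampled bivariate series: each variable has its
# own nonuniform, mutually misaligned observation times.
t_var1 = np.array([0.0, 0.7, 1.9, 2.2])
t_var2 = np.array([0.3, 1.9, 3.5])

timestamps = np.union1d(t_var1, t_var2)          # merged continuous-time grid
values = np.full((len(timestamps), 2), np.nan)   # NaN marks "not observed"
mask = np.zeros((len(timestamps), 2), dtype=bool)

for j, obs_times in enumerate([t_var1, t_var2]):
    idx = np.searchsorted(timestamps, obs_times)
    values[idx, j] = rng.normal(size=len(obs_times))
    mask[idx, j] = True

# A model of the kind summarized above would consume (timestamps, values,
# mask) and define a joint distribution at arbitrary continuous-time queries.
print(timestamps)
print(mask)
```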
arXiv Detail & Related papers (2023-06-15T14:08:48Z)
- Testing for Overfitting [0.0]
We discuss the overfitting problem and explain why standard concentration results do not hold for evaluation with training data. We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data.
arXiv Detail & Related papers (2023-05-09T22:49:55Z)
- Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous-time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
The estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of $L_2$ and $L_1$ regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
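In generic notation (a sketch of the standard setup, with assumed penalty weights $\lambda_1, \lambda_2$), the MAP estimate of the pairwise couplings $J$ takes the form:

```latex
% Generic MAP objective for pairwise couplings J, given samples x^{(1..M)};
% lambda_1, lambda_2 are assumed regularization weights.
\[
\hat{J} \;=\; \operatorname*{arg\,max}_{J}\;
  \sum_{m=1}^{M} \log p\big(x^{(m)} \mid J\big)
  \;-\; \lambda_2 \|J\|_2^2 \;-\; \lambda_1 \|J\|_1 ,
\]
% where the L2 penalty corresponds to a Gaussian prior on J and the L1
% penalty to a Laplace prior, so MAP inference is penalized likelihood.
```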
arXiv Detail & Related papers (2021-12-02T14:45:16Z)
- Learning from non-irreducible Markov chains [0.0]
We focus on the case when the training dataset is drawn from a Markov chain that is not necessarily irreducible.
We first obtain a uniform convergence result for the corresponding sample error, and then we establish learnability of the approximate sample error minimization algorithm.
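The algorithm itself is simple to state; the sketch below implements a generic approximate sample-error minimizer over a finite hypothesis class (the toy data and threshold classifiers are illustrative, and in the paper's setting the data would instead be a Markov-chain trajectory):

```python
import numpy as np

def approx_erm(hypotheses, loss, data, eps=0.01):
    """Approximate sample-error minimization over a finite class: return a
    hypothesis whose empirical risk is within eps of the minimum."""
    risks = np.array([np.mean([loss(h, z) for z in data]) for h in hypotheses])
    i = int(np.flatnonzero(risks <= risks.min() + eps)[0])
    return hypotheses[i], risks[i]

# Toy usage with 0-1 loss and threshold classifiers; in the paper's setting
# the pairs (x, y) would come from a trajectory of the Markov chain.
rng = np.random.default_rng(4)
data = [(x, float(x > 0.3)) for x in rng.uniform(-1, 1, size=300)]
hypotheses = [lambda x, c=c: float(x > c) for c in np.linspace(-1, 1, 41)]
loss = lambda h, z: float(h(z[0]) != z[1])
h_star, risk = approx_erm(hypotheses, loss, data)
print("approximate ERM empirical risk:", risk)
```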
arXiv Detail & Related papers (2021-10-08T19:00:19Z)
- Contrastive learning of strong-mixing continuous-time stochastic processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
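A generic noise-contrastive sketch of this idea (not the paper's exact construction): train a classifier to distinguish true consecutive pairs from pairs with a shuffled second coordinate, so that the optimal logit recovers the log ratio between the transition density and the stationary marginal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Surrogate strong-mixing process: an OU-like AR(1) chain.
n = 10000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.95 * x[t - 1] + 0.3 * rng.normal()

# Positives: true consecutive pairs. Negatives: second coordinate shuffled,
# i.e. drawn (approximately) from the stationary marginal, independently.
pos = np.column_stack([x[:-1], x[1:]])
neg = np.column_stack([x[:-1], rng.permutation(x[1:])])
pairs = np.vstack([pos, neg])
labels = np.r_[np.ones(len(pos)), np.zeros(len(neg))]

def features(p):
    a, b = p[:, 0], p[:, 1]
    return np.column_stack([a, b, a * b, a ** 2, b ** 2])

clf = LogisticRegression(max_iter=1000).fit(features(pairs), labels)

# For the Bayes-optimal classifier, the logit equals
# log p(x_{t+1} | x_t) - log p(x_{t+1}): the transition kernel up to the
# marginal density, which can be estimated separately.
query = np.array([[0.0, 0.1]])
print("estimated log density ratio at (0.0, 0.1):",
      clf.decision_function(features(query))[0])
```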
arXiv Detail & Related papers (2021-03-03T23:06:47Z)
- Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition [34.235007566913396]
We describe an interpretable, symmetric decomposition of the variance into terms associated with the labels.
We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior.
We also analyze the strikingly rich phenomenology that arises.
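A minimal Monte Carlo version of such a decomposition (for ridge regression with resampled label noise, an illustrative stand-in for the paper's neural-network setting and its finer-grained variance split):

```python
import numpy as np

rng = np.random.default_rng(6)

# Monte Carlo bias-variance decomposition for ridge regression: refit under
# freshly resampled label noise and decompose the risk at fixed test inputs.
n, p, lam, reps = 50, 100, 0.1, 200          # p > n: overparameterized regime
X = rng.normal(size=(n, p)) / np.sqrt(p)
beta = rng.normal(size=p)
X_test = rng.normal(size=(500, p)) / np.sqrt(p)
f_test = X_test @ beta                        # noiseless targets on test inputs

preds = np.empty((reps, len(X_test)))
for r in range(reps):
    y = X @ beta + 0.5 * rng.normal(size=n)   # fresh label noise each repeat
    w = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    preds[r] = X_test @ w

bias2 = np.mean((preds.mean(axis=0) - f_test) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"bias^2 ~ {bias2:.3f}, variance (label term) ~ {variance:.3f}")
```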
arXiv Detail & Related papers (2020-11-04T21:04:02Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
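A linear-regression analogue of this computation (an illustrative sketch, not the paper's methodology): sample many interpolating solutions of an underdetermined least-squares problem and examine the spread of their test errors.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(7)

# With p > n, the interpolation constraint X w = y leaves a (p - n)-dimensional
# affine family of solutions; sample from it and summarize the test errors.
n, p = 40, 80
X = rng.normal(size=(n, p)) / np.sqrt(p)
beta = rng.normal(size=p)
y = X @ beta + 0.1 * rng.normal(size=n)

w_min = np.linalg.lstsq(X, y, rcond=None)[0]  # minimum-norm interpolator
N = null_space(X)                             # orthonormal basis of null(X)

X_test = rng.normal(size=(2000, p)) / np.sqrt(p)
y_test = X_test @ beta + 0.1 * rng.normal(size=2000)

errors = []
for _ in range(500):
    # Every w of this form still satisfies X w = y exactly.
    w = w_min + N @ rng.normal(scale=0.2, size=N.shape[1])
    errors.append(np.mean((X_test @ w - y_test) ** 2))

errors = np.array(errors)
print(f"typical (median) test error ~ {np.median(errors):.3f}, "
      f"worst sampled ~ {errors.max():.3f}")
```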
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all of its content) and is not responsible for any consequences arising from its use.