Learning from non-irreducible Markov chains
- URL: http://arxiv.org/abs/2110.04338v1
- Date: Fri, 8 Oct 2021 19:00:19 GMT
- Title: Learning from non-irreducible Markov chains
- Authors: Nikola Sandri\'c and Stjepan \v{S}ebek
- Abstract summary: We focus on the case when the training data set is drawn from a not necessarily irreducible Markov chain.
We first obtain a uniform convergence result for the corresponding sample error, and then we conclude learnability of the approximate sample error minimization algorithm.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Most of the existing literature on supervised learning problems focuses on
the case when the training data set is drawn from an i.i.d. sample. However,
many practical supervised learning problems are characterized by temporal
dependence and strong correlation between the marginals of the data-generating
process, suggesting that the i.i.d. assumption is not always justified. This
problem has been already considered in the context of Markov chains satisfying
the Doeblin condition. This condition, among other things, implies that the
chain is not singular in its behavior, i.e. it is irreducible. In this article,
we focus on the case when the training data set is drawn from a not necessarily
irreducible Markov chain. Under the assumption that the chain is uniformly
ergodic with respect to the $\mathrm{L}^1$-Wasserstein distance, and certain
regularity assumptions on the hypothesis class and the state space of the
chain, we first obtain a uniform convergence result for the corresponding
sample error, and then we conclude learnability of the approximate sample error
minimization algorithm and find its generalization bounds. At the end, a
relative uniform convergence result for the sample error is also discussed.
Related papers
- Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting [64.0722630873758]
We consider rather general and broad class of Markov chains, Ito chains, that look like Euler-Maryama discretization of some Differential Equation.
We prove the bound in $W_2$-distance between the laws of our Ito chain and differential equation.
arXiv Detail & Related papers (2023-10-09T18:38:56Z) - Probabilistic Learning of Multivariate Time Series with Temporal
Irregularity [25.91078012394032]
temporal irregularities, including nonuniform time intervals and component misalignment.
We develop a conditional flow representation to non-parametrically represent the data distribution, which is typically non-Gaussian.
The broad applicability and superiority of the proposed solution are confirmed by comparing it with existing approaches through ablation studies and testing on real-world datasets.
arXiv Detail & Related papers (2023-06-15T14:08:48Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability [57.502573663108535]
We study the problem of learning an adversarially robust predictor to test time attacks in the semi-supervised PAC model.
We show that there is a significant benefit in semi-supervised robust learning even in the worst-case distribution-free model.
arXiv Detail & Related papers (2022-02-11T03:01:45Z) - Covariate Shift in High-Dimensional Random Feature Regression [44.13449065077103]
Covariate shift is a significant obstacle in the development of robust machine learning models.
We present a theoretical understanding in context of modern machine learning.
arXiv Detail & Related papers (2021-11-16T05:23:28Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Contrastive learning of strong-mixing continuous-time stochastic
processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
arXiv Detail & Related papers (2021-03-03T23:06:47Z) - Robust Unsupervised Learning via L-Statistic Minimization [38.49191945141759]
We present a general approach to this problem focusing on unsupervised learning.
The key assumption is that the perturbing distribution is characterized by larger losses relative to a given class of admissible models.
We prove uniform convergence bounds with respect to the proposed criterion for several popular models in unsupervised learning.
arXiv Detail & Related papers (2020-12-14T10:36:06Z) - Contrastive Divergence Learning is a Time Reversal Adversarial Game [32.46369991490501]
Contrastive divergence (CD) learning is a classical method for fitting unnormalized statistical models to data samples.
We show that CD is an adversarial learning procedure, where a discriminator attempts to classify whether a Markov chain generated from the model has been time-reversed.
Our derivation settles well with previous observations, which have concluded that CD's update steps cannot be expressed as the gradients of any fixed objective function.
arXiv Detail & Related papers (2020-12-06T15:54:05Z) - Estimating means of bounded random variables by betting [39.98103047898974]
This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations.
We present a general approach for deriving concentration bounds, that can be seen as a generalization and improvement of the celebrated Chernoff method.
We show how to extend these ideas to sampling without replacement, another heavily studied problem.
arXiv Detail & Related papers (2020-10-19T17:22:03Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.