Trade-off Between Dependence and Complexity for Nonparametric Learning
-- an Empirical Process Approach
- URL: http://arxiv.org/abs/2401.08978v1
- Date: Wed, 17 Jan 2024 05:08:37 GMT
- Title: Trade-off Between Dependence and Complexity for Nonparametric Learning
-- an Empirical Process Approach
- Authors: Nabarun Deb and Debarghya Mukherjee
- Abstract summary: In many applications where the data exhibit temporal dependencies, the corresponding empirical processes are much less understood.
We present a general bound on the expected supremum of empirical processes under standard $beta/rho$-mixing assumptions.
We show that even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting.
- Score: 10.27974860479791
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Empirical process theory for i.i.d. observations has emerged as a ubiquitous
tool for understanding the generalization properties of various statistical
problems. However, in many applications where the data exhibit temporal
dependencies (e.g., in finance, medical imaging, weather forecasting etc.), the
corresponding empirical processes are much less understood. Motivated by this
observation, we present a general bound on the expected supremum of empirical
processes under standard $\beta/\rho$-mixing assumptions. Unlike most prior
work, our results cover both the long and the short-range regimes of
dependence. Our main result shows that a non-trivial trade-off between the
complexity of the underlying function class and the dependence among the
observations characterizes the learning rate in a large class of nonparametric
problems. This trade-off reveals a new phenomenon, namely that even under
long-range dependence, it is possible to attain the same rates as in the i.i.d.
setting, provided the underlying function class is complex enough. We
demonstrate the practical implications of our findings by analyzing various
statistical estimators in both fixed and growing dimensions. Our main examples
include a comprehensive case study of generalization error bounds in
nonparametric regression over smoothness classes in fixed as well as growing
dimension using neural nets, shape-restricted multivariate convex regression,
estimating the optimal transport (Wasserstein) distance between two probability
distributions, and classification under the Mammen-Tsybakov margin condition --
all under appropriate mixing assumptions. In the process, we also develop
bounds on $L_r$ ($1\le r\le 2$)-localized empirical processes with dependent
observations, which we then leverage to get faster rates for (a) tuning-free
adaptation, and (b) set-structured learning problems.
Related papers
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical
Learning [68.76846801719095]
We show that double descent appears exactly when and where it occurs, and that its location is not inherently tied to the threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Mutual Information Estimation via $f$-Divergence and Data Derangements [6.43826005042477]
We propose a novel class of discrimi mutual information estimators based on the variational representation of the $f$-divergence.
The proposed estimator is flexible since it exhibits an excellent bias/ variance trade-off.
arXiv Detail & Related papers (2023-05-31T16:54:25Z) - Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics
for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman minimization divergence.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Optimal regularizations for data generation with probabilistic graphical
models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of L 2 and L 1 regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z) - Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN)
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
arXiv Detail & Related papers (2021-10-07T17:42:44Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data.
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations.
We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
empirical optimization is central to modern machine learning, but its role in its success is still unclear.
We show that it commonly arises in parameters of discrete multiplicative noise due to variance.
A detailed analysis is conducted in which we describe on key factors, including recent step size, and data, all exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.