On the Performance of Empirical Risk Minimization with Smoothed Data
- URL: http://arxiv.org/abs/2402.14987v1
- Date: Thu, 22 Feb 2024 21:55:41 GMT
- Title: On the Performance of Empirical Risk Minimization with Smoothed Data
- Authors: Adam Block, Alexander Rakhlin, and Abhishek Shetty
- Abstract summary: We show that Empirical Risk Minimization (ERM) achieves sublinear error whenever a class is learnable with iid data.
- Score: 59.3428024282545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In order to circumvent statistical and computational hardness results in
sequential decision-making, recent work has considered smoothed online
learning, where the distribution of data at each time is assumed to have
bounded likelihood ratio with respect to a base measure when conditioned on
the history. While previous works have demonstrated the benefits of smoothness,
they have either assumed that the base measure is known to the learner or have
presented computationally inefficient algorithms applying only in special
cases. This work investigates the more general setting where the base measure
is \emph{unknown} to the learner, focusing in particular on the performance of
Empirical Risk Minimization (ERM) with square loss when the data are
well-specified and smooth. We show that in this setting, ERM is able to achieve
sublinear error whenever a class is learnable with iid data; in particular, ERM
achieves error scaling as $\tilde O( \sqrt{\mathrm{comp}(\mathcal F)\cdot T}
)$, where $\mathrm{comp}(\mathcal F)$ is the statistical complexity of learning
$\mathcal F$ with iid data. In so doing, we prove a novel norm comparison bound
for smoothed data that comprises the first sharp norm comparison for dependent
data applying to arbitrary, nonlinear function classes. We complement these
results with a lower bound indicating that our analysis of ERM is essentially
tight, establishing a separation in the performance of ERM between smoothed and
iid data.
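To make the setting concrete, the following is a minimal simulation sketch, not the paper's construction: a linear class stands in for $\mathcal F$ (so $\mathrm{comp}(\mathcal F)$ is on the order of $d$), the adversary samples each $x_t$ uniformly from an adaptively placed sub-box of $[-1,1]^d$ whose volume makes the likelihood ratio with respect to the uniform base measure exactly $1/\sigma$, and ERM is least squares on the history. All constants are illustrative.
```python
import numpy as np

rng = np.random.default_rng(0)
T, d, sigma = 1000, 5, 0.25                    # horizon, dimension, smoothness
theta_star = rng.normal(size=d) / np.sqrt(d)   # well-specified linear target

# A uniform draw from a sub-box of side `side` inside [-1, 1]^d has density
# 1 / side**d, i.e. likelihood ratio exactly 1/sigma w.r.t. the uniform base.
side = 2 * sigma ** (1 / d)

xs, ys, cum_err = [], [], 0.0
for t in range(T):
    # sigma-smooth "adversary": slides the sub-box toward the corner of the
    # domain aligned with theta_star, then samples uniformly inside it.
    center = np.sign(theta_star) * (1 - side / 2) * rng.uniform(0.5, 1.0, size=d)
    x_t = center + rng.uniform(-side / 2, side / 2, size=d)
    y_t = x_t @ theta_star + 0.1 * rng.normal()        # well-specified label

    if t > 0:
        # ERM with square loss over the linear class = least squares on history
        theta_hat = np.linalg.lstsq(np.array(xs), np.array(ys), rcond=None)[0]
        cum_err += (x_t @ (theta_hat - theta_star)) ** 2
    xs.append(x_t)
    ys.append(y_t)

print(f"cumulative excess square loss after T={T}: {cum_err:.1f}")
print(f"sqrt(d * T) benchmark: {np.sqrt(d * T):.1f}")
```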
Related papers
- Performance of Cross-Validated Targeted Maximum Likelihood Estimation [0.0]
We investigate the performance of CVTMLE compared to TMLE in various settings.
CVTMLE vastly improves confidence interval coverage without adversely affecting bias.
We show through simulations that CVTMLE is much less sensitive to the choice of the super learner library.
arXiv Detail & Related papers (2024-09-17T15:15:03Z)
- Asymptotic Characterisation of Robust Empirical Risk Minimisation Performance in the Presence of Outliers [18.455890316339595]
We study robust linear regression in high dimensions, where both the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n/d$, under a data model that includes outliers.
We provide exact asymptotics for the performance of empirical risk minimisation (ERM) with $\ell_2$-regularised $\ell_2$, $\ell_1$, and Huber losses; a minimal Huber-loss sketch appears after this list.
arXiv Detail & Related papers (2023-05-30T12:18:39Z)
- On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu \| \mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound to other divergences such as the $\phi$-divergence and the Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- RIFLE: Imputation and Robust Inference from Low Order Marginals [10.082738539201804]
We develop a statistical inference framework for regression and classification in the presence of missing data without imputation.
Our framework, RIFLE, estimates low-order moments of the underlying data distribution with corresponding confidence intervals to learn a distributionally robust model.
Our experiments demonstrate that RIFLE outperforms other benchmark algorithms when the percentage of missing values is high and/or when the number of data points is relatively small.
arXiv Detail & Related papers (2021-09-01T23:17:30Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference models on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class; a minimal sketch of such weighting appears after this list.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate; a toy replay sketch appears after this list.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
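For the robust ERM entry above, here is a minimal sketch, assuming a toy contaminated linear model: scikit-learn's HuberRegressor performs $\ell_2$-regularised Huber-loss ERM, compared against square-loss ridge on the same data. The contamination scheme and all constants are illustrative, not from the paper.
```python
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.default_rng(1)
n, d, eps = 400, 20, 0.1          # alpha = n/d = 20; eps = outlier fraction
theta = rng.normal(size=d) / np.sqrt(d)

X = rng.normal(size=(n, d))
y = X @ theta + 0.5 * rng.normal(size=n)
outliers = rng.random(n) < eps
y[outliers] += 10.0 * rng.normal(size=outliers.sum())   # heavy contamination

# l2-regularised Huber-loss ERM vs. plain l2-regularised square-loss ERM
huber = HuberRegressor(epsilon=1.35, alpha=1e-3).fit(X, y)
ridge = Ridge(alpha=1e-3).fit(X, y)

print("Huber estimation error:", np.linalg.norm(huber.coef_ - theta))
print("Ridge estimation error:", np.linalg.norm(ridge.coef_ - theta))
```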
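For the adaptively collected data entry, a minimal sketch of importance-sampling-weighted ERM: data are logged by an adaptive two-armed policy, and each sample is reweighted by its inverse propensity so the weighted square loss estimates the population risk. The collection rule and the propensity floor are illustrative assumptions, not the paper's algorithm.
```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
mu = np.array([1.0, -0.5])            # true mean outcome of each of two arms

# Adaptive collection: the propensity of arm 1 drifts with the running
# estimates, so the logged data are NOT iid. We record each propensity e_t.
arms, outcomes, props = [], [], []
est, counts = np.zeros(2), np.ones(2)
for t in range(n):
    e1 = np.clip(1 / (1 + np.exp(est[0] - est[1])), 0.05, 0.95)  # floor at 5%
    a = int(rng.random() < e1)        # pick arm 1 with probability e1
    y = mu[a] + rng.normal()
    arms.append(a); outcomes.append(y); props.append(e1 if a else 1 - e1)
    counts[a] += 1; est[a] += (y - est[a]) / counts[a]

arms, outcomes, props = map(np.asarray, (arms, outcomes, props))

# Importance-weighted ERM: reweight each sample by 1/propensity so that the
# weighted square-loss minimiser is consistent despite the adaptive logging.
for a in (0, 1):
    mask = arms == a
    w = 1.0 / props[mask]
    erm = np.sum(w * outcomes[mask]) / np.sum(w)
    print(f"arm {a}: naive mean {outcomes[mask].mean():+.3f}, "
          f"weighted ERM {erm:+.3f}, truth {mu[a]:+.3f}")
```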
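For the Markovian least squares entry, a toy sketch of the experience replay idea: covariates follow a slowly mixing AR(1) chain, and SGD either streams the chain in order or samples uniformly from a buffer of recent observations to break the temporal correlation. The chain, buffer size, and step size are illustrative assumptions; the size of the improvement depends on these constants.
```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5000, 4
theta = rng.normal(size=d) / np.sqrt(d)

# Markovian covariates: an AR(1) chain, so consecutive samples are correlated
# and the mixing time grows like 1 / (1 - rho).
rho = 0.95
X = np.zeros((n, d))
for t in range(1, n):
    X[t] = rho * X[t - 1] + np.sqrt(1 - rho**2) * rng.normal(size=d)
y = X @ theta + 0.1 * rng.normal(size=n)

def sgd(replay: bool, lr=0.01, passes=4, buffer=2000):
    th = np.zeros(d)
    for _ in range(passes):
        for s in range(n):
            # with replay, draw uniformly from the last `buffer` observations
            t = rng.integers(max(0, s - buffer), s + 1) if replay else s
            th -= lr * (X[t] @ th - y[t]) * X[t]   # square-loss gradient step
    return np.linalg.norm(th - theta)

print("streaming SGD error :", sgd(replay=False))
print("replay SGD error    :", sgd(replay=True))
```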
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.