Super fast rates in structured prediction
- URL: http://arxiv.org/abs/2102.00760v1
- Date: Mon, 1 Feb 2021 10:50:04 GMT
- Title: Super fast rates in structured prediction
- Authors: Vivien Cabannes and Alessandro Rudi and Francis Bach
- Abstract summary: We show how we can leverage the fact that discrete problems are essentially predicting a discrete output when continuous problems are predicting a continuous value.
We first illustrate it for predictors based on nearest neighbors, generalizing rates known for binary classification to any discrete problem within the framework of structured prediction.
We then consider kernel ridge regression where we improve known rates in $n-1/4$ to arbitrarily fast rates, depending on a parameter characterizing the hardness of the problem.
- Score: 88.99819200562784
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discrete supervised learning problems such as classification are often
tackled by introducing a continuous surrogate problem akin to regression.
Bounding the original error, between estimate and solution, by the surrogate
error endows discrete problems with convergence rates already shown for
continuous instances. Yet, current approaches do not leverage the fact that
discrete problems are essentially predicting a discrete output when continuous
problems are predicting a continuous value. In this paper, we tackle this issue
for general structured prediction problems, opening the way to "super fast"
rates, that is, convergence rates for the excess risk faster than $n^{-1}$,
where $n$ is the number of observations, with even exponential rates with the
strongest assumptions. We first illustrate it for predictors based on nearest
neighbors, generalizing rates known for binary classification to any discrete
problem within the framework of structured prediction. We then consider kernel
ridge regression where we improve known rates in $n^{-1/4}$ to arbitrarily fast
rates, depending on a parameter characterizing the hardness of the problem,
thus allowing, under smoothness assumptions, to bypass the curse of
dimensionality.
Related papers
- Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration underrho conditions.
Our analysis reduces the dependence on the strong growth constant from $$ to $sqrt$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $mathcalO(K-(alpha - 1)/alpha)$, when the gradients have finite moments of order.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$times$ faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective
and Improved Bounds [12.699376765058137]
gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning.
It is only very recently that SGD with sampling without replacement -- shuffled SGD -- has been analyzed.
We prove fine-grained complexity bounds that depend on the data matrix and are never worse than what is predicted by the existing bounds.
arXiv Detail & Related papers (2023-06-21T18:14:44Z) - On the tightness of information-theoretic bounds on generalization error
of learning algorithms [5.081241420920605]
We first show that the square root does not necessarily imply a slow rate, and a fast rate result can still be obtained using this bound.
We identify the critical conditions needed for the fast rate generalization error, which we call the $(eta,c)$-central condition.
arXiv Detail & Related papers (2023-03-26T08:59:05Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent may gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates
and Model Misspecification [16.0817847880416]
We offer a unified framework that includes a broad variety of linear prediction problems on a Hilbert space.
We show that for misspecification level $epsilon$, these estimators achieve an error rate of $O(maxleft|mathcalO|1/2n-1/2, |mathcalI|1/2n-1 right+epsilon)$, matching the best-known rates in literature.
arXiv Detail & Related papers (2022-01-06T08:51:08Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinitehorizon discounted decision process (MDP)
We show that given any estimate for the optimal quality function $Q*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.