Towards Optimal Problem Dependent Generalization Error Bounds in
Statistical Learning Theory
- URL: http://arxiv.org/abs/2011.06186v4
- Date: Thu, 24 Dec 2020 03:10:54 GMT
- Title: Towards Optimal Problem Dependent Generalization Error Bounds in
Statistical Learning Theory
- Authors: Yunbei Xu, Assaf Zeevi
- Abstract summary: We study problem-dependent rates that scale near-optimally with the variance, the effective loss, or the gradient norms evaluated at the "best hypothesis."
We introduce a principled framework dubbed "uniform localized convergence"
We show that our framework resolves several fundamental limitations of existing uniform convergence and localization analysis approaches.
- Score: 11.840747467007963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study problem-dependent rates, i.e., generalization errors that scale
near-optimally with the variance, the effective loss, or the gradient norms
evaluated at the "best hypothesis." We introduce a principled framework dubbed
"uniform localized convergence," and characterize sharp problem-dependent rates
for central statistical learning problems. From a methodological viewpoint, our
framework resolves several fundamental limitations of existing uniform
convergence and localization analysis approaches. It also provides improvements
and some level of unification in the study of localized complexities, one-sided
uniform inequalities, and sample-based iterative algorithms. In the so-called
"slow rate" regime, we provides the first (moment-penalized) estimator that
achieves the optimal variance-dependent rate for general "rich" classes; we
also establish improved loss-dependent rate for standard empirical risk
minimization. In the "fast rate" regime, we establish finite-sample
problem-dependent bounds that are comparable to precise asymptotics. In
addition, we show that iterative algorithms like gradient descent and
first-order Expectation-Maximization can achieve optimal generalization error
in several representative problems across the areas of non-convex learning,
stochastic optimization, and learning with missing data.
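As a rough illustration of what the "uniform localized convergence" property asks for (the notation below is assumed for exposition and is not quoted from the paper): given a function class G, a "measurement" functional T that records the scale of each hypothesis (e.g., its variance or effective loss), and a surrogate function ψ, one works on a single high-probability event of the form

```latex
% Schematic statement (assumed notation; for illustration only):
% with probability at least 1 - \delta,
\bigl| (P - P_n)\, g \bigr| \;\le\; \psi\bigl( T(g) \vee r_0 ;\, \delta \bigr)
\qquad \text{for all } g \in \mathcal{G},
```

so that every hypothesis pays only for its own localized scale T(g) rather than for the worst-case complexity of the whole class; classical uniform convergence corresponds to replacing T(g) ∨ r_0 by a single global radius.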
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
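For orientation only, here is a minimal single-worker sketch of error feedback with a normalized update; the top-k compressor, step rule, and function names are illustrative assumptions, not the algorithm proposed in the paper.

```python
import numpy as np

def top_k(v, k):
    """Illustrative compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_normalized_step(x, grad_fn, err, lr=0.1, k=10):
    """One error-feedback step: compress (gradient + residual), keep the compression
    error for later correction, and apply a normalized update direction."""
    g = grad_fn(x)
    msg = top_k(g + err, k)          # compressed message that would be transmitted
    err = err + g - msg              # residual error carried over to the next step
    norm = np.linalg.norm(msg)
    if norm > 0:
        x = x - lr * msg / norm      # normalized step, insensitive to gradient scale
    return x, err
```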
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the stability and generalization of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different topologies on the generalization of D-SGDA.
We also balance the generalization error with the optimization error to obtain the optimal population risk in the convex-concave setting.
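A minimal numpy sketch of decentralized gradient descent ascent on a toy strongly-convex-strongly-concave saddle problem; the ring mixing matrix, objective, and step size are illustrative assumptions rather than the setting analyzed in the paper.

```python
import numpy as np

n, lr, steps = 4, 0.05, 500
# Doubly stochastic mixing matrix for a ring topology (illustrative).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

a = np.random.randn(n)                # node-local coupling in f_i(x, y) = x^2/2 + a_i*x*y - y^2/2
x, y = np.random.randn(n), np.random.randn(n)

for _ in range(steps):
    gx = x + a * y                    # local gradient in x at node i
    gy = a * x - y                    # local gradient in y at node i
    x = W @ x - lr * gx               # gossip-average neighbors, then descend in x
    y = W @ y + lr * gy               # gossip-average neighbors, then ascend in y
# All nodes approach the saddle point (0, 0) of the averaged objective.
```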
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Communication-Efficient Gradient Descent-Accent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates [28.700663352789395]
We provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs)
Our approach is based on a general key assumption on the estimates that allows us to propose and analyze several novel local training algorithms.
We present the first local descent-ascent algorithms with provable improved communication complexity for solving distributed variational inequalities on heterogeneous data.
arXiv Detail & Related papers (2023-06-08T10:58:46Z) - Best Subset Selection in Reduced Rank Regression [1.4699455652461724]
We show that our algorithm achieves the reduced rank estimation with significant probability.
Numerical studies and an application to cancer studies demonstrate its effectiveness and scalability.
arXiv Detail & Related papers (2022-11-29T02:51:15Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem in machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Stability and Generalization of Stochastic Optimization with Nonconvex
and Nonsmooth Problems [34.68590236021379]
This paper introduces systematic algorithmic stability measures and relates them to the gap between empirical and population gradients.
We show how these measures can be applied to achieve implicit regularization through iteration numbers and step sizes in gradient descent.
arXiv Detail & Related papers (2022-06-14T18:14:30Z) - First-Order Algorithms for Nonlinear Generalized Nash Equilibrium
Problems [88.58409977434269]
We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs)
Our contribution is to provide two simple first-order algorithmic frameworks based on the quadratic penalty method and the augmented Lagrangian method.
We provide nonasymptotic theoretical guarantees for these algorithms.
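As a reference point, here is a generic quadratic-penalty scheme with first-order inner updates; this is a standard construction sketched under assumed notation, not the NGNEP-specific algorithm of the paper.

```python
import numpy as np

def quadratic_penalty(grad_f, c, jac_c, x0, rho0=1.0, rho_factor=10.0,
                      outer_iters=4, inner_iters=500):
    """Minimize f(x) subject to c(x) = 0 by running gradient descent on the
    penalized objective f(x) + (rho/2) * ||c(x)||^2 for an increasing penalty rho."""
    x, rho = np.asarray(x0, dtype=float), rho0
    for _ in range(outer_iters):
        lr = 0.5 / (1.0 + rho)                       # shrink the step as the penalty grows
        for _ in range(inner_iters):
            g = grad_f(x) + rho * jac_c(x).T @ c(x)  # gradient of the penalized objective
            x = x - lr * g
        rho *= rho_factor
    return x

# Toy usage: minimize ||x||^2 subject to x[0] + x[1] = 1 (solution near [0.5, 0.5]).
sol = quadratic_penalty(
    grad_f=lambda x: 2.0 * x,
    c=lambda x: np.array([x[0] + x[1] - 1.0]),
    jac_c=lambda x: np.array([[1.0, 1.0]]),
    x0=np.zeros(2),
)
```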
arXiv Detail & Related papers (2022-04-07T00:11:05Z) - Instance-Dependent Confidence and Early Stopping for Reinforcement
Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain ex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z) - Accelerated and instance-optimal policy evaluation with linear function
approximation [17.995515643150657]
Existing algorithms fail to match at least one of the lower bounds established for this problem.
We develop an accelerated, variance-reduced fast temporal difference algorithm that simultaneously matches both lower bounds and attains a strong notion of instance-optimality.
arXiv Detail & Related papers (2021-12-24T17:21:04Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Sparse recovery by reduced variance stochastic approximation [5.672132510411465]
We discuss the application of iterative quadratic optimization routines to the problem of sparse signal recovery from noisy observations.
We show how one can straightforwardly enhance reliability of the corresponding solution by using Median-of-Means like techniques.
arXiv Detail & Related papers (2020-06-11T12:31:20Z)