Functions with average smoothness: structure, algorithms, and learning
- URL: http://arxiv.org/abs/2007.06283v2
- Date: Sun, 8 Nov 2020 09:35:23 GMT
- Title: Functions with average smoothness: structure, algorithms, and learning
- Authors: Yair Ashlagi, Lee-Ad Gottlieb, Aryeh Kontorovich
- Abstract summary: We define a local slope at each point and gauge the function complexity as the average of these values.
Since the mean can be dramatically smaller than the maximum, this complexity measure can yield considerably sharper generalization bounds.
We discover a surprisingly rich combinatorial and analytic structure in the function class we define.
- Score: 12.362670630646804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We initiate a program of average smoothness analysis for efficiently learning
real-valued functions on metric spaces. Rather than using the Lipschitz
constant as the regularizer, we define a local slope at each point and gauge
the function complexity as the average of these values. Since the mean can be
dramatically smaller than the maximum, this complexity measure can yield
considerably sharper generalization bounds -- assuming that these admit a
refinement where the Lipschitz constant is replaced by our average of local
slopes.
Our first major contribution is to obtain just such distribution-sensitive
bounds. This required overcoming a number of technical challenges, perhaps the
most formidable of which was bounding the {\em empirical} covering numbers,
which can be much worse-behaved than the ambient ones. Our combinatorial
results are accompanied by efficient algorithms for smoothing the labels of the
random sample, as well as guarantees that the extension from the sample to the
whole space will continue to be, with high probability, smooth on average.
Along the way we discover a surprisingly rich combinatorial and analytic
structure in the function class we define.
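The abstract's central object, the average of local slopes, has a simple empirical analogue on a finite sample. The sketch below is illustrative only and is not the paper's construction: it computes, for each sample point, the slope $\sup_{j \neq i} |f(x_i) - f(x_j)| / d(x_i, x_j)$ over the sample under the Euclidean metric, and contrasts the mean of these slopes with their maximum (the Lipschitz-style quantity). All names and the example data are assumptions.

```python
import numpy as np

def local_slopes(X, y):
    """Empirical local slope at each sample point (Euclidean metric).

    A finite-sample analogue of the pointwise slope
    Lambda_f(x) = sup_{x' != x} |f(x) - f(x')| / d(x, x').
    """
    diffs = np.abs(y[:, None] - y[None, :])                  # |f(x_i) - f(x_j)|
    dists = np.linalg.norm(X[:, None] - X[None, :], axis=-1) # d(x_i, x_j)
    np.fill_diagonal(dists, np.inf)                          # exclude j == i
    return (diffs / dists).max(axis=1)

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
f = np.sin(2 * np.pi * X[:, 0])  # smooth target function
f[0] += 5.0                      # a single steep point inflates the maximum slope

slopes = local_slopes(X, f)
print(f"max slope (Lipschitz-style): {slopes.max():.2f}")
print(f"average slope:               {slopes.mean():.2f}")
```

A single steep point drives the maximum slope far above the average, which is exactly the regime where a bound in terms of the average, rather than the Lipschitz constant, can be much sharper.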
Related papers
- Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective [17.5796206457693]
We study nonsmooth optimization problems under bounded local subgradient variation.
The resulting class of objective functions encapsulates the previously studied classes defined via local smoothness.
arXiv Detail & Related papers (2024-03-24T22:42:40Z) - On Computationally Efficient Learning of Exponential Family Distributions [33.229944519289795]
We focus on the setting where the support as well as the natural parameters are appropriately bounded.
Our method achieves the order-optimal sample complexity of $O(\log(k)/\alpha^2)$ when tailored for node-wise-sparse random fields.
arXiv Detail & Related papers (2023-09-12T17:25:32Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - Learning Globally Smooth Functions on Manifolds [94.22412028413102]
Learning smooth functions is generally challenging, except in simple cases such as learning linear or kernel models.
This work proposes to overcome these obstacles by combining techniques from semi-infinite constrained learning and manifold regularization.
We prove that, under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct.
arXiv Detail & Related papers (2022-10-01T15:45:35Z) - Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the algorithm so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of the optimal bound in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z) - Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds whose dependence on the confidence level is either negative-power or logarithmic.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions.
We show that the first algorithm, which is a generalization of [GhaRuswan20] to the $T$-level case, can achieve a sample complexity of $\mathcal{O}(1/\epsilon^6)$.
This is the first time that an online algorithm designed for the (un)constrained multi-level setting obtains the same sample complexity under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z) - Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z) - Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of a stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity bounds for constant and variable stepsize iteration policies, showing that, for minibatch size $N$, $\epsilon$-suboptimality in expected quadratic distance to the optimal solution is attained after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations.
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.