Halting Time is Predictable for Large Models: A Universality Property
and Average-case Analysis
- URL: http://arxiv.org/abs/2006.04299v3
- Date: Sat, 2 Oct 2021 21:29:25 GMT
- Title: Halting Time is Predictable for Large Models: A Universality Property
and Average-case Analysis
- Authors: Courtney Paquette, Bart van Merriënboer, Elliot Paquette, Fabian
Pedregosa
- Abstract summary: Compared to worst-case analysis, average-case analysis is more representative of the typical behavior of an algorithm, but it can depend on the probability distribution of the inputs.
We show that this dependence disappears for a class of large-scale problems trained with first-order methods: the halting time is independent of the input distribution.
Numerical simulations suggest this universality property holds for a more general class of algorithms and problems.
- Score: 14.863027146736696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Average-case analysis computes the complexity of an algorithm averaged over
all possible inputs. Compared to worst-case analysis, it is more representative
of the typical behavior of an algorithm, but remains largely unexplored in
optimization. One difficulty is that the analysis can depend on the probability
distribution of the inputs to the model. However, we show that this is not the
case for a class of large-scale problems trained with first-order methods
including random least squares and one-hidden layer neural networks with random
weights. In fact, the halting time exhibits a universality property: it is
independent of the probability distribution. With this barrier for average-case
analysis removed, we provide the first explicit average-case convergence rates
showing a tighter complexity not captured by traditional worst-case analysis.
Finally, numerical simulations suggest this universality property holds for a
more general class of algorithms and problems.
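A minimal numerical sketch of the universality claim (an illustrative setup, not the paper's experiments): gradient descent on a random least-squares problem halts after essentially the same number of iterations whether the matrix entries are Gaussian or Rademacher.

```python
# Halting-time universality sketch: compare the iterations gradient descent
# needs on 0.5*||A x - b||^2 when the entries of A come from two different
# distributions. Problem sizes and tolerances are illustrative assumptions.
import numpy as np

def halting_time(entry_sampler, n=400, d=200, tol=1e-6, max_iter=20000, seed=0):
    rng = np.random.default_rng(seed)
    A = entry_sampler(rng, (n, d)) / np.sqrt(n)   # normalize so the spectrum is O(1)
    x_star = rng.standard_normal(d) / np.sqrt(d)
    b = A @ x_star
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the gradient
    x = np.zeros(d)
    for t in range(1, max_iter + 1):
        grad = A.T @ (A @ x - b)
        if np.linalg.norm(grad) < tol:            # halting criterion: small gradient
            return t
        x -= grad / L
    return max_iter

gaussian   = lambda rng, shape: rng.standard_normal(shape)
rademacher = lambda rng, shape: rng.choice([-1.0, 1.0], size=shape)

for name, sampler in [("Gaussian", gaussian), ("Rademacher", rademacher)]:
    times = [halting_time(sampler, seed=s) for s in range(20)]
    print(f"{name:10s} halting time: mean={np.mean(times):.1f}, std={np.std(times):.1f}")
```

Under this setup the two histograms of halting times concentrate around nearly the same value, which is the behavior the universality property predicts.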
Related papers
- AlphaBeta is not as good as you think: a new probabilistic model to better analyze deterministic game-solving algorithms [2.430562648940846]
We introduce a new probabilistic model that incrementally constructs game-trees using a fixed level-wise conditional distribution. Our framework sheds new light on classical game-solving algorithms, offering rigorous evidence and analytical tools.
arXiv Detail & Related papers (2025-06-27T08:07:17Z) - Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models [70.07661254213181]
We propose two principled algorithms for the test-time compute of large language models.
We prove theoretically that the failure probability of one algorithm decays to zero exponentially as its test-time compute grows.
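For intuition only (a simplified setting, not the paper's specific algorithms): if each of $N$ independent attempts succeeds with probability $p > 0$ and a successful attempt can be recognized when it occurs, the failure probability is $(1-p)^N \le e^{-pN}$, which decays exponentially in the test-time compute $N$.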
arXiv Detail & Related papers (2024-11-29T05:29:47Z) - A Note on High-Probability Analysis of Algorithms with Exponential,
Sub-Gaussian, and General Light Tails [34.465259431606405]
We show that the analysis of such an algorithm can be reduced, in a black-box manner and with only a small loss in logarithmic factors, to an analysis under bounded randomization.
This approach simultaneously applies to any light-tailed randomization, including exponential, sub-Gaussian, and more general fast-decaying distributions.
arXiv Detail & Related papers (2024-03-05T11:38:20Z) - A Provably Accurate Randomized Sampling Algorithm for Logistic Regression [2.7930955543692817]
We present a simple, randomized sampling-based algorithm for the logistic regression problem.
We prove that accurate approximations can be achieved with a sample whose size is much smaller than the total number of observations.
Overall, our work sheds light on the potential of using randomized sampling approaches to efficiently approximate the estimated probabilities in logistic regression.
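As a hedged illustration of the general idea (uniform subsampling here is a simplifying assumption; the paper's sampling scheme may differ), one can fit logistic regression on a small random subset and compare its estimated probabilities with the full-data fit:

```python
# Sketch: logistic regression on a uniform subsample vs. on the full data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 50_000, 10
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta))).astype(int)

# Effectively unregularized fits (large C) for a cleaner comparison.
full = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
idx = rng.choice(n, size=2_500, replace=False)        # keep ~5% of the observations
sub = LogisticRegression(C=1e6, max_iter=1000).fit(X[idx], y[idx])

err = np.abs(full.predict_proba(X)[:, 1] - sub.predict_proba(X)[:, 1])
print(f"max |p_full - p_sub| = {err.max():.3f}, mean = {err.mean():.4f}")
```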
arXiv Detail & Related papers (2024-02-26T06:20:28Z) - Bivariate Estimation-of-Distribution Algorithms Can Find an Exponential
Number of Optima [12.009357100208353]
We propose the test function EqualBlocksOneMax (EBOM) to support the study of how optimization algorithms handle large sets of optima.
We show that EBOM behaves very similarly to a theoretically ideal model for EBOM, which samples each of the exponentially many optima with the same maximal probability.
arXiv Detail & Related papers (2023-10-06T06:32:07Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach to optimizing the negative log-likelihood of unnormalized models.
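For context, a minimal sketch of the classical NCE objective on a toy 1-D unnormalized Gaussian (a textbook setup, not the compositional method proposed in this paper):

```python
# NCE for the unnormalized model p_theta(x) ∝ exp(-(x - mu)^2 / 2 + c),
# where c plays the role of a learned log-normalizer.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=5000)       # true mean = 2
noise = rng.normal(loc=0.0, scale=3.0, size=5000)      # noise distribution q
log_q = lambda x: norm.logpdf(x, loc=0.0, scale=3.0)

def nce_loss(params):
    mu, c = params
    # log-density ratio between the (unnormalized) model and the noise
    g_data = (-0.5 * (data - mu) ** 2 + c) - log_q(data)
    g_noise = (-0.5 * (noise - mu) ** 2 + c) - log_q(noise)
    # negative of E_data[log sigmoid(g)] + E_noise[log(1 - sigmoid(g))]
    return np.mean(np.logaddexp(0.0, -g_data)) + np.mean(np.logaddexp(0.0, g_noise))

mu_hat, c_hat = minimize(nce_loss, x0=np.array([0.0, 0.0])).x
print(f"estimated mean = {mu_hat:.3f}, estimated log-normalizer = {c_hat:.3f}")
print(f"true log-normalizer = {-0.5 * np.log(2 * np.pi):.3f}")
```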
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions [42.6763105645717]
Given a small number of corrupted samples, the goal is to efficiently compute a hypothesis that accurately approximates $\mu$ with high probability.
Our algorithm achieves the optimal error using a number of samples scaling logarithmically with the ambient dimension.
Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition of a positive semi-definite matrix satisfying certain sparsity properties.
arXiv Detail & Related papers (2022-11-29T16:13:50Z) - A Novel Convergence Analysis for Algorithms of the Adam Family [105.22760323075008]
We present a generic proof of convergence for a family of Adam-style methods including Adam, AMSGrad, Adabound, etc.
Our analysis is so simple and generic that it can be leveraged to establish the convergence for solving a broader family of non-convex compositional optimization problems.
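For reference, a minimal sketch of the standard Adam update, one member of the Adam family covered by such analyses (the default hyperparameters are assumptions, not values from the paper):

```python
# One step of the Adam update rule, applied to a toy quadratic objective.
import numpy as np

def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad            # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias corrections
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# usage: minimize f(x) = ||x||^2 / 2, whose gradient is x itself
x, m, v = np.ones(5), np.zeros(5), np.zeros(5)
for t in range(1, 2001):
    x, m, v = adam_step(x, grad=x, m=m, v=v, t=t)
print(np.linalg.norm(x))  # should be small
```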
arXiv Detail & Related papers (2021-12-07T02:47:58Z) - Understanding Double Descent Requires a Fine-Grained Bias-Variance
Decomposition [34.235007566913396]
We describe an interpretable, symmetric decomposition of the variance into terms associated with the labels.
We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior.
We also analyze the strikingly rich phenomenology that arises.
arXiv Detail & Related papers (2020-11-04T21:04:02Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - SGD with shuffling: optimal rates without component convexity and large
epoch requirements [60.65928290219793]
We consider the RandomShuffle (shuffle at the beginning of each epoch) and SingleShuffle (shuffle only once) schemes.
We establish minimax optimal convergence rates of these algorithms up to poly-log factor gaps.
We further sharpen the tight convergence results for RandomShuffle by removing the drawbacks common to all prior arts.
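A minimal sketch of the two shuffling schemes on a least-squares objective (the problem and step size are illustrative assumptions, not the paper's setting):

```python
# RandomShuffle reshuffles the component order every epoch;
# SingleShuffle fixes one random order up front and reuses it.
import numpy as np

def shuffled_sgd(A, b, epochs, lr, reshuffle_every_epoch, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    order = rng.permutation(n)                   # SingleShuffle: fixed once
    for _ in range(epochs):
        if reshuffle_every_epoch:                # RandomShuffle: new order each epoch
            order = rng.permutation(n)
        for i in order:
            x -= lr * A[i] * (A[i] @ x - b[i])   # gradient of 0.5*(a_i^T x - b_i)^2
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 20))
x_star = rng.standard_normal(20)
b = A @ x_star
for name, flag in [("RandomShuffle", True), ("SingleShuffle", False)]:
    x = shuffled_sgd(A, b, epochs=50, lr=0.01, reshuffle_every_epoch=flag)
    print(f"{name}: ||x - x*|| = {np.linalg.norm(x - x_star):.2e}")
```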
arXiv Detail & Related papers (2020-06-12T05:00:44Z) - Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
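For context, a minimal sketch of TD(0) policy evaluation on a small randomly generated Markov reward process (the chain and step-size schedule are illustrative assumptions, not the instances analyzed in the paper):

```python
# TD(0) policy evaluation with samples from a generative model,
# compared against the exact value function v = (I - gamma*P)^{-1} r.
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 5, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)   # random transition matrix
r = rng.standard_normal(n_states)                      # state rewards
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

V = np.zeros(n_states)
for t in range(1, 100_001):
    s = rng.integers(n_states)                         # generative model: query any state
    s_next = rng.choice(n_states, p=P[s])              # sample one transition from it
    alpha = 1.0 / (1 + t / 1000)                       # decaying step size
    V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])  # TD(0) update
print(f"ell_inf error: {np.max(np.abs(V - v_true)):.4f}")
```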