How Free is Parameter-Free Stochastic Optimization?
- URL: http://arxiv.org/abs/2402.03126v3
- Date: Sun, 20 Oct 2024 11:36:13 GMT
- Title: How Free is Parameter-Free Stochastic Optimization?
- Authors: Amit Attia, Tomer Koren
- Abstract summary: We study the problem of parameter-free stochastic optimization, inquiring whether, and under what conditions, fully parameter-free methods exist.
Existing methods can only be considered ``partially'' parameter-free, as they require some non-trivial knowledge of the true problem parameters.
We demonstrate that a simple hyperparameter search technique results in a fully parameter-free method that outperforms more sophisticated state-of-the-art algorithms.
- Score: 29.174036532175855
- License:
- Abstract: We study the problem of parameter-free stochastic optimization, inquiring whether, and under what conditions, do fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods, without requiring significant knowledge of the true problem parameters. Existing parameter-free methods can only be considered ``partially'' parameter-free, as they require some non-trivial knowledge of the true problem parameters, such as a bound on the stochastic gradient norms, a bound on the distance to a minimizer, etc. In the non-convex setting, we demonstrate that a simple hyperparameter search technique results in a fully parameter-free method that outperforms more sophisticated state-of-the-art algorithms. We also provide a similar result in the convex setting with access to noisy function values under mild noise assumptions. Finally, assuming only access to stochastic gradients, we establish a lower bound that renders fully parameter-free stochastic convex optimization infeasible, and provide a method which is (partially) parameter-free up to the limit indicated by our lower bound.
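The abstract describes the non-convex result only at a high level ("a simple hyperparameter search technique"). The sketch below is one illustrative reading of that idea, not the authors' procedure: run SGD over a logarithmic grid of candidate step sizes and keep the run with the lowest observed loss. The grid range, toy objective, and selection rule are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's exact method): tune SGD's step size by
# grid search over a logarithmic range and keep the run with the lowest loss.
import numpy as np

def sgd(grad_fn, x0, eta, num_steps, rng):
    """Plain SGD with a fixed step size eta."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - eta * grad_fn(x, rng)
    return x

def grid_search_sgd(loss_fn, grad_fn, x0, num_steps, seed=0,
                    etas=tuple(np.logspace(-4, 1, 11))):
    """Run SGD once per candidate step size; keep the iterate with lowest loss."""
    best_x, best_loss = None, np.inf
    for eta in etas:
        rng = np.random.default_rng(seed)   # same randomness for each candidate
        x = sgd(grad_fn, x0, eta, num_steps, rng)
        loss = loss_fn(x)
        if loss < best_loss:
            best_x, best_loss = x, loss
    return best_x, best_loss

# Toy usage: noisy gradients of f(x) = ||x||^2 / 2.
loss_fn = lambda x: 0.5 * float(np.dot(x, x))
grad_fn = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_best, f_best = grid_search_sgd(loss_fn, grad_fn, np.ones(5), num_steps=500)
print(f_best)
```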
Related papers
- Parameter-free Clipped Gradient Descent Meets Polyak [29.764853985834403]
Gradient descent and its variants are de facto standard algorithms for training machine learning models.
We propose Inexact Polyak Stepsize, which converges to the optimal solution without any hyperparameter tuning (a sketch of the classical Polyak stepsize it builds on appears after this list).
We numerically validate our convergence results on a synthetic function and demonstrate the effectiveness of the proposed methods.
arXiv Detail & Related papers (2024-05-23T19:29:38Z) - Online Constraint Tightening in Stochastic Model Predictive Control: A Regression Approach [49.056933332667114]
No analytical solutions exist for chance-constrained optimal control problems.
We propose a data-driven approach for learning the constraint-tightening parameters online during control.
Our approach yields constraint-tightening parameters that tightly satisfy the chance constraints.
arXiv Detail & Related papers (2023-10-04T16:22:02Z) - SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance [33.593203156666746]
We study SGD with AdaGrad stepsizes, a popular adaptive (self-tuning) method for first-order stochastic optimization.
We establish sharp rates of convergence in both the low-noise and high-noise regimes (a sketch of the AdaGrad-norm stepsize appears after this list).
arXiv Detail & Related papers (2023-02-17T09:46:08Z) - On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Although our theory grounds the effectiveness of sparsity, how to choose the tunable parameters remains an open problem.
arXiv Detail & Related papers (2022-11-28T17:41:48Z) - Automated differential equation solver based on the parametric approximation optimization [77.34726150561087]
The article presents a method that obtains a solution by optimizing a parameterized approximation.
This allows a wide class of equations to be solved in an automated manner without changing the algorithm's parameters.
arXiv Detail & Related papers (2022-05-11T10:06:47Z) - Making SGD Parameter-Free [28.088227276891885]
Our algorithm is conceptually simple, has high-probability guarantees, and is partially adaptive to unknown gradient norms, smoothness, and strong convexity.
At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a priori bounds on SGD iterates.
arXiv Detail & Related papers (2022-05-04T16:29:38Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters (a sketch of the STORM-style momentum estimator appears after this list).
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Efficient Hyperparameter Tuning with Dynamic Accuracy Derivative-Free Optimization [0.27074235008521236]
We apply a recent dynamic accuracy derivative-free optimization method to hyperparameter tuning.
This method allows inexact evaluations of the learning problem while retaining convergence guarantees.
We demonstrate its robustness and efficiency compared to a fixed accuracy approach.
arXiv Detail & Related papers (2020-11-06T00:59:51Z) - Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z) - Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multi-task square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
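Background for the "Parameter-free Clipped Gradient Descent Meets Polyak" entry above: a minimal sketch of the classical Polyak stepsize, eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2. It requires the optimal value f*, which is precisely the knowledge the paper's Inexact Polyak Stepsize aims to remove, so this is context rather than the proposed method.

```python
# Background sketch: classical Polyak stepsize for gradient descent.
# Needs the optimal value f_star, which the "Inexact Polyak Stepsize"
# paper aims to avoid.
import numpy as np

def polyak_gd(f, grad, x0, f_star, num_steps=100, eps=1e-12):
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad(x)
        eta = (f(x) - f_star) / (np.dot(g, g) + eps)  # Polyak stepsize
        x = x - eta * g
    return x

# Toy usage on f(x) = ||x||^2 / 2 with f* = 0.
x = polyak_gd(lambda x: 0.5 * float(np.dot(x, x)), lambda x: x,
              x0=np.ones(3), f_star=0.0)
print(x)
```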
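For the "SGD with AdaGrad Stepsizes" entry above: a minimal sketch of SGD with the scalar (AdaGrad-norm) stepsize eta_t = eta / sqrt(b0^2 + sum of squared gradient norms). The constants eta and b0 are illustrative placeholders; the entry's contribution is the high-probability convergence analysis, not this update rule.

```python
# Sketch of SGD with the scalar AdaGrad ("AdaGrad-norm") stepsize:
# eta_t = eta / sqrt(b0^2 + sum_{s<=t} ||g_s||^2).
import numpy as np

def adagrad_norm_sgd(grad_fn, x0, num_steps, eta=1.0, b0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    accum = b0 ** 2                      # running sum of squared gradient norms
    for _ in range(num_steps):
        g = grad_fn(x, rng)
        accum += float(np.dot(g, g))
        x = x - (eta / np.sqrt(accum)) * g
    return x

# Toy usage with noisy gradients of f(x) = ||x||^2 / 2.
x = adagrad_norm_sgd(lambda x, rng: x + 0.1 * rng.standard_normal(x.shape),
                     x0=np.ones(5), num_steps=1000)
print(x)
```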
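For the "STORM+" entry above: a minimal sketch of a STORM-style variance-reduced momentum estimator with fixed momentum and step-size parameters. STORM+ sets these adaptively without tuning, which this sketch does not attempt; the fixed values of eta and a are assumptions for illustration.

```python
# Sketch of the STORM-style variance-reduced momentum estimator:
# d_t = g(x_t, xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}, xi_t)),
# where the same sample xi_t is evaluated at both x_t and x_{t-1}.
import numpy as np

def storm_like_sgd(grad_fn, x0, num_steps, eta=0.1, a=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)
    xi = int(rng.integers(10**9))        # sample index / seed for this step
    d = grad_fn(x_prev, xi)              # d_0: plain stochastic gradient
    x = x_prev - eta * d
    for _ in range(num_steps - 1):
        xi = int(rng.integers(10**9))
        d = grad_fn(x, xi) + (1 - a) * (d - grad_fn(x_prev, xi))
        x_prev, x = x, x - eta * d
    return x

# Toy usage: stochastic gradient of f(x) = ||x||^2 / 2; the noise depends only
# on xi, so it cancels exactly in the correction term above.
def grad_fn(x, xi):
    noise = np.random.default_rng(xi).standard_normal(x.shape)
    return x + 0.1 * noise

print(storm_like_sgd(grad_fn, np.ones(4), num_steps=500))
```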