Regret Bounds without Lipschitz Continuity: Online Learning with
Relative-Lipschitz Losses
- URL: http://arxiv.org/abs/2010.12033v2
- Date: Mon, 28 Dec 2020 22:02:54 GMT
- Title: Regret Bounds without Lipschitz Continuity: Online Learning with
Relative-Lipschitz Losses
- Authors: Yihan Zhou, Victor S. Portella, Mark Schmidt, Nicholas J. A. Harvey
- Abstract summary: In online convex optimization (OCO), Lipschitz continuity of the functions is commonly assumed in order to obtain sublinear regret.
In this work, we consider OCO for relative Lipschitz and relative strongly convex functions.
We show regret bounds for follow-the-regularized-leader algorithms and a variant of online mirror descent.
- Score: 19.554559253046225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In online convex optimization (OCO), Lipschitz continuity of the functions is
commonly assumed in order to obtain sublinear regret. Moreover, many algorithms
have only logarithmic regret when these functions are also strongly convex.
Recently, researchers in convex optimization proposed the notions of
"relative Lipschitz continuity" and "relative strong convexity", both of which
generalize their classical counterparts. It has been shown
that subgradient methods in the relative setting have performance analogous to
their performance in the classical setting.
In this work, we consider OCO for relative Lipschitz and relative strongly
convex functions. We extend the known regret bounds for classical OCO
algorithms to the relative setting. Specifically, we show regret bounds for
follow-the-regularized-leader (FTRL) algorithms and a variant of online mirror
descent. Due to the generality of these methods, these results yield regret
bounds for a wide variety of OCO algorithms. Furthermore, we extend these
results to algorithms with extra regularization, such as regularized dual
averaging.
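For orientation, the following minimal LaTeX sketch spells out one common form of the two relative notions (following the relative-continuity literature; the reference function $R$, the set $\mathcal{X}$, and the exact constants are assumptions here, and the paper's precise definitions may differ in detail), together with a generic mirror-descent-style update of the kind the regret bounds concern.
```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% Sketch only: R is a differentiable convex reference function on a convex
% set X, with Bregman divergence
%   B_R(y, x) = R(y) - R(x) - <grad R(x), y - x>.
% The exact conditions used in the paper may differ in details.

A convex loss $f$ is $L$-Lipschitz continuous \emph{relative to} $R$ if, for
all $x, y \in \mathcal{X}$ and every subgradient $g \in \partial f(x)$,
\[
  \langle g,\, x - y \rangle \;\le\; L \sqrt{2\, B_R(y, x)},
\]
and $f$ is $\mu$-strongly convex relative to $R$ if
\[
  f(y) \;\ge\; f(x) + \langle g,\, y - x \rangle + \mu\, B_R(y, x).
\]

% A generic online-mirror-descent step on the round-$t$ loss $f_t$, with
% step size $\eta_t$ and subgradient $g_t \in \partial f_t(x_t)$:
\[
  x_{t+1} \;=\; \operatorname*{arg\,min}_{x \in \mathcal{X}}
  \bigl\{ \eta_t \langle g_t, x \rangle + B_R(x, x_t) \bigr\}.
\]

\end{document}
```
The point of the relative conditions is that $\sqrt{2 B_R(y, x)}$ takes the place of the norm $\|x - y\|$ in the classical Lipschitz bound, so losses whose subgradients grow too fast for the Euclidean analysis can still be controlled by choosing a suitable regularizer $R$.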
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective [17.5796206457693]
We study nonsmooth optimization problems under bounded local subgradient variation.
The resulting class of objective functions encapsulates the classes of objectives traditionally studied in optimization.
arXiv Detail & Related papers (2024-03-24T22:42:40Z) - Second Order Methods for Bandit Optimization and Control [34.51425758864638]
We show that our algorithm achieves optimal regret bounds for a large class of convex functions, which we call $\kappa$-convex.
We also investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory.
arXiv Detail & Related papers (2024-02-14T04:03:38Z) - Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the FW algorithm for stochastic finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Convergence rate of the (1+1)-evolution strategy on locally strongly
convex functions with Lipschitz continuous gradient and their monotonic
transformations [20.666734673282498]
Evolution strategies (ES) form one of the promising classes of algorithms for black-box continuous optimization.
In this study, an upper bound and a lower bound of the rate of linear convergence of the (1+1)-ES on locally $L$-strongly convex functions with $U$-Lipschitz continuous gradient are derived.
arXiv Detail & Related papers (2022-09-26T07:16:50Z) - Generalization in Supervised Learning Through Riemannian Contraction [4.3604518673788135]
In a supervised learning setting, we show that if an optimizer is contracting in some Riemannian metric with rate $\lambda > 0$, it is uniformly algorithmically stable.
The results hold for stochastic and deterministic gradient descent, in both continuous and discrete time.
They can be shown to be optimal in certain linear settings, such as gradient descent over convex or strongly convex loss surfaces.
arXiv Detail & Related papers (2022-01-17T23:08:47Z) - Cyclic Coordinate Dual Averaging with Extrapolation [22.234715500748074]
We introduce a new block coordinate method that applies to the general class of variational inequality (VI) problems with monotone operators.
The resulting convergence bounds match the optimal convergence bounds of full gradient methods.
For $m$ coordinate blocks, the gradient Lipschitz constant appearing in our bounds is never larger than a factor of $\sqrt{m}$ times the traditional Euclidean Lipschitz constant.
arXiv Detail & Related papers (2021-02-26T00:28:58Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Exactly Computing the Local Lipschitz Constant of ReLU Networks [98.43114280459271]
The local Lipschitz constant of a neural network is a useful metric for robustness, generalization, and fairness evaluation.
We show strong inapproximability results for estimating Lipschitz constants of ReLU networks, and give an algorithm that exactly computes the local Lipschitz constant.
We leverage this algorithm to evaluate the tightness of competing Lipschitz estimators and the effects of regularized training on the Lipschitz constant.
arXiv Detail & Related papers (2020-03-02T22:15:54Z)