Non-local Optimization: Imposing Structure on Optimization Problems by
Relaxation
- URL: http://arxiv.org/abs/2011.06064v3
- Date: Sat, 24 Jul 2021 13:20:12 GMT
- Title: Non-local Optimization: Imposing Structure on Optimization Problems by
Relaxation
- Authors: Nils M\"uller and Tobias Glasmachers
- Abstract summary: In evolutionary computation and reinforcement learning, the optimization of a function $f: \Omega \to \mathbb{R}$ is often addressed through optimizing a so-called relaxation $\theta \in \Theta \mapsto \mathbb{E}_\theta(f)$ of $f$.
We investigate the structure of such relaxations by means of measure theory and Fourier analysis, enabling us to shed light on the success of many associated optimization methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In stochastic optimization, particularly in evolutionary computation and
reinforcement learning, the optimization of a function $f: \Omega \to
\mathbb{R}$ is often addressed through optimizing a so-called relaxation
$\theta \in \Theta \mapsto \mathbb{E}_\theta(f)$ of $f$, where $\Theta$
represents the parameters of a family of probability measures on $\Omega$. We
investigate the structure of such relaxations by means of measure theory and
Fourier analysis, enabling us to shed light on the success of many associated
stochastic optimization methods. The main structural traits we derive and that
allow fast and reliable optimization of relaxations are the consistency of
optimal values of $f$, Lipschitzness of gradients, and convexity. We emphasize
settings where $f$ itself is not differentiable or convex, e.g., in the
presence of (stochastic) disturbance.
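To make the relaxation idea concrete, the following is a minimal Python sketch (illustrative, not the authors' method): it assumes a diagonal Gaussian family with parameters $\theta = (\mu, \log\sigma)$ on $\Omega = \mathbb{R}^d$ and runs gradient descent on the relaxation $\theta \mapsto \mathbb{E}_\theta(f)$, estimating its gradient with the score-function identity $\nabla_\theta \mathbb{E}_\theta(f) = \mathbb{E}_\theta[f(x)\,\nabla_\theta \log p_\theta(x)]$, so $f$ itself never needs to be differentiable. The test objective and all hyperparameters are illustrative assumptions.

import numpy as np

# Minimal sketch (not the paper's algorithm): minimize the Gaussian relaxation
# E_theta(f) with theta = (mu, log_sigma), using the score-function estimator
#   grad_theta E_theta(f) = E_theta[ f(x) * grad_theta log p_theta(x) ].

def f(x):
    # Hypothetical non-differentiable, noisy objective, used only for illustration.
    return np.sum(np.abs(x)) + 0.1 * np.random.randn()

def relaxed_minimize(f, dim=5, steps=300, samples=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=dim)        # mean of the search distribution
    log_sigma = np.zeros(dim)        # log standard deviations (diagonal covariance)

    for _ in range(steps):
        sigma = np.exp(log_sigma)
        eps = rng.standard_normal((samples, dim))
        xs = mu + sigma * eps                      # samples from N(mu, diag(sigma^2))
        fs = np.array([f(x) for x in xs])
        fs = (fs - fs.mean()) / (fs.std() + 1e-8)  # baseline/normalization for variance reduction

        # Monte Carlo score-function gradients of E_theta(f), using
        #   grad_mu        log p = (x - mu) / sigma^2 = eps / sigma
        #   grad_log_sigma log p = eps^2 - 1
        grad_mu = (fs[:, None] * eps / sigma).mean(axis=0)
        grad_log_sigma = (fs[:, None] * (eps ** 2 - 1.0)).mean(axis=0)

        mu -= lr * grad_mu             # descend on the relaxation, not on f directly
        log_sigma -= lr * grad_log_sigma

    return mu

if __name__ == "__main__":
    print("approximate minimizer:", relaxed_minimize(f))

The update coincides with a basic evolution-strategies step; the relaxation is differentiable in $\theta$ even though $f$ is neither differentiable nor convex, which is the structural point emphasized in the abstract.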
Related papers
- A Novel Unified Parametric Assumption for Nonconvex Optimization [53.943470475510196]
Non-convex optimization is central to machine learning, but the general framework of non-convexity yields convergence guarantees that are too pessimistic compared to practice.
We introduce a novel unified parametric assumption for non-convex optimization algorithms.
arXiv Detail & Related papers (2025-02-17T21:25:31Z) - Optimal Rates for Robust Stochastic Convex Optimization [12.620782629498812]
We develop novel algorithms that achieve minimax-optimal excess risk under the $\epsilon$-contamination model.
Our algorithms do not require restrictive assumptions, and can handle nonsmooth but Lipschitz population loss functions.
arXiv Detail & Related papers (2024-12-15T00:52:08Z) - Extended convexity and smoothness and their applications in deep learning [5.281849820329249]
This paper introduces an optimization framework aimed at providing a theoretical foundation for a class of composite optimization problems, particularly those in deep learning.
We analyze descent methods for objective functions satisfying $\mathcal{H}(\Phi)$-smoothness, a generalization of Lipschitz smoothness.
arXiv Detail & Related papers (2024-10-08T08:40:07Z) - Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Random Function Descent [0.0]
We show that a 'stochastic Taylor approximation' recovers gradient descent and is scalable in high dimensions.
arXiv Detail & Related papers (2023-05-02T12:53:50Z) - Manifold Free Riemannian Optimization [4.484251538832438]
A principled framework for solving optimization problems with a smooth manifold $\mathcal{M}$ is proposed.
We use a noiseless sample set of the cost function $(x_i, y_i) \in \mathcal{M} \times \mathbb{R}$ and the intrinsic dimension of the manifold $\mathcal{M}$.
arXiv Detail & Related papers (2022-09-07T16:19:06Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov's acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Non-Convex Optimization with Certificates and Fast Rates Through Kernel
Sums of Squares [68.8204255655161]
We consider potentially non-convex optimization problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)