Sign Operator for Coping with Heavy-Tailed Noise: High Probability Convergence Bounds with Extensions to Distributed Optimization and Comparison Oracle
- URL: http://arxiv.org/abs/2502.07923v1
- Date: Tue, 11 Feb 2025 19:54:11 GMT
- Title: Sign Operator for Coping with Heavy-Tailed Noise: High Probability Convergence Bounds with Extensions to Distributed Optimization and Comparison Oracle
- Authors: Nikita Kornilov, Philip Zmushko, Andrei Semenov, Alexander Gasnikov, Alexander Beznosikov
- Abstract summary: We show that SignSGD achieves optimal sample complexity $\tilde{O}\left(\varepsilon^{-\frac{3\kappa - 2}{\kappa - 1}}\right)$ with high probability for attaining an average gradient norm accuracy of $\varepsilon$.
We also explore the application of the sign operator in zeroth-order optimization with an oracle that can only compare function values at two different points.
- Score: 77.3806516979843
- License:
- Abstract: The growing popularity of AI optimization problems involving severely corrupted data has increased the demand for methods capable of handling heavy-tailed noise, i.e., noise with a bounded $\kappa$-th moment, $\kappa \in (1,2]$. For the widely used clipping technique, effectiveness heavily depends on the careful tuning of clipping levels throughout training. In this paper, we demonstrate that using only the sign of the input, without introducing additional hyperparameters, is sufficient to cope with heavy-tailed noise effectively. For smooth non-convex functions, we prove that SignSGD achieves optimal sample complexity $\tilde{O}\left(\varepsilon^{-\frac{3\kappa - 2}{\kappa - 1}}\right)$ with high probability for attaining an average gradient norm accuracy of $\varepsilon$. Under the assumption of symmetric noise, we use SignSGD with Majority Voting to extend this bound to distributed optimization, or to reduce the sample complexity to $\tilde{O}(\varepsilon^{-4})$ in the case of a single worker with arbitrary parameters. Furthermore, we explore the application of the sign operator in zeroth-order optimization with an oracle that can only compare function values at two different points. We propose a novel method, MajorityVote-CompsSGD, and provide the first known high-probability bound $\tilde{O}(\varepsilon^{-6})$ on the number of comparisons under the symmetric noise assumption. Our theoretical findings are supported by the superior performance of sign-based methods in training Large Language Models.
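The update described in the abstract is simple enough to sketch. The snippet below is a minimal illustration of SignSGD with Majority Voting, not the authors' reference implementation: each worker transmits only the coordinate-wise sign of its stochastic gradient, the server takes a majority vote, and the model moves a fixed step along the voted direction. The quadratic objective, Student-t noise model, step size, and worker count are illustrative assumptions.

```python
import numpy as np

def sign_sgd_step(x, grad_estimates, lr):
    """One SignSGD-with-Majority-Voting step (illustrative sketch).

    grad_estimates: array of shape (num_workers, dim) with stochastic gradients.
    Each worker contributes only the element-wise sign of its gradient; the
    server aggregates by coordinate-wise majority vote and takes a fixed step.
    """
    votes = np.sign(grad_estimates)        # per-worker signs: no clipping level to tune
    majority = np.sign(votes.sum(axis=0))  # coordinate-wise majority vote (ties give 0)
    return x - lr * majority

# Toy usage: minimize f(x) = 0.5 * ||x||^2 under heavy-tailed gradient noise.
rng = np.random.default_rng(0)
dim, num_workers, lr = 10, 5, 1e-2
x = rng.normal(size=dim)
for _ in range(2000):
    # Student-t noise with 1.5 degrees of freedom has infinite variance but a
    # bounded kappa-th moment for kappa < 1.5, i.e. the heavy-tailed regime.
    noise = rng.standard_t(df=1.5, size=(num_workers, dim))
    grads = x[None, :] + noise             # the true gradient of f is x
    x = sign_sgd_step(x, grads, lr)
print(np.linalg.norm(x))                   # should end up much smaller than the initial norm
```

Note that no clipping threshold appears anywhere in the update; this is the point emphasized by the paper, since the sign operator alone handles the heavy-tailed noise.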
Related papers
- Stochastic Halpern iteration in normed spaces and applications to reinforcement learning [0.30693357740321775]
We show that if the underlying oracle is uniformly bounded, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$.
We propose new synchronous algorithms for average reward and discounted reward Markov decision processes.
arXiv Detail & Related papers (2024-03-19T01:07:35Z) - On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods [17.14725907264431]
This paper studies the Partial Optimal Transport (POT) problem between two unbalanced measures with at most $n$ supports.
We propose a novel rounding algorithm for POT, and then provide a feasible Sinkhorn procedure with a revised complexity of $\widetilde{\mathcal{O}}(n^2/\varepsilon^4)$.
arXiv Detail & Related papers (2023-12-21T15:56:09Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z) - Accelerating Inexact HyperGradient Descent for Bilevel Optimization [84.00488779515206]
We present a method for solving general non-strongly-concave bilevel optimization problems.
Our results also improve upon the existing complexity for finding second-order stationary points in non-strongly-concave problems.
arXiv Detail & Related papers (2023-06-30T20:36:44Z) - A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available.
We propose a Fully First-order Approximation (F2SA) method, and study its non-asymptotic convergence properties.
We demonstrate even superior practical performance of the proposed method over existing second-order based approaches on MNIST data-hypercleaning experiments.
arXiv Detail & Related papers (2023-01-26T05:34:21Z) - An Efficient Stochastic Algorithm for Decentralized Nonconvex-Strongly-Concave Minimax Optimization [25.00475462213752]
We propose the Decentralized Recursive gradient descent ascent Method (DREAM).
Concretely, it requires $\mathcal{O}\left(\min\left(\kappa^3\epsilon^{-3}, \kappa^2\sqrt{N}\epsilon^{-2}\right)\right)$ stochastic first-order oracle (SFO) calls and $\tilde{\mathcal{O}}(\kappa^2\epsilon^{-2})$ communication rounds.
Our numerical experiments validate the superiority of DREAM over previous methods.
arXiv Detail & Related papers (2022-12-05T16:09:39Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations [54.42518331209581]
We present an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$), and extend the analysis to finding $(\epsilon,\gamma)$-approximate second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)