Symmetry Teleportation for Accelerated Optimization
- URL: http://arxiv.org/abs/2205.10637v1
- Date: Sat, 21 May 2022 16:39:21 GMT
- Title: Symmetry Teleportation for Accelerated Optimization
- Authors: Bo Zhao, Nima Dehmamy, Robin Walters, Rose Yu
- Abstract summary: We study a different approach, symmetry teleportation, that allows the parameters to travel a large distance on the loss level set.
We derive the loss-invariant group actions for test functions and multi-layer neural networks, and prove a necessary condition of when teleportation improves convergence rate.
Experimentally, we show that teleportation improves the convergence speed of gradient descent and AdaGrad for several optimization problems including test functions, multi-layer regressions, and MNIST classification.
- Score: 21.989906418276906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing gradient-based optimization methods update the parameters locally,
in a direction that minimizes the loss function. We study a different approach,
symmetry teleportation, that allows the parameters to travel a large distance
on the loss level set, in order to improve the convergence speed in subsequent
steps. Teleportation exploits parameter space symmetries of the optimization
problem and transforms parameters while keeping the loss invariant. We derive
the loss-invariant group actions for test functions and multi-layer neural
networks, and prove a necessary condition of when teleportation improves
convergence rate. We also show that our algorithm is closely related to second
order methods. Experimentally, we show that teleportation improves the
convergence speed of gradient descent and AdaGrad for several optimization
problems including test functions, multi-layer regressions, and MNIST
classification.
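As an illustration of the method described in the abstract, here is a minimal sketch (not the authors' code) of symmetry teleportation for a two-layer linear regression, one of the settings the abstract mentions. The loss is invariant under the group action W1 -> g W1, W2 -> W2 g^{-1} for any invertible g; the sketch ascends the squared gradient norm over g (using a simple finite-difference estimate, a choice made here only for brevity) and then resumes ordinary gradient descent. All dimensions, step sizes, and the teleportation schedule are illustrative assumptions.

```python
# Minimal sketch of symmetry teleportation (illustrative, not the authors' code).
# Model: two-layer linear regression, L(W1, W2) = mean((W2 @ W1 @ X - Y)**2).
# The loss is invariant under W1 -> g @ W1, W2 -> W2 @ inv(g) for invertible g,
# so we can "teleport" along this group orbit to a point with a larger gradient
# norm before resuming plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 5, 4, 20                            # input dim, hidden dim, samples
X, Y = rng.normal(size=(d, n)), rng.normal(size=(1, n))
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(1, h))

def loss_and_grads(W1, W2):
    R = W2 @ W1 @ X - Y                       # residuals, shape (1, n)
    return (R**2).mean(), (2 / n) * W2.T @ R @ X.T, (2 / n) * R @ (W1 @ X).T

def grad_norm_sq(W1, W2):
    _, g1, g2 = loss_and_grads(W1, W2)
    return (g1**2).sum() + (g2**2).sum()

def teleport(W1, W2, steps=10, lr_g=1e-2, eps=1e-5):
    """Ascend ||grad L||^2 over the symmetry group; the loss itself is unchanged."""
    for _ in range(steps):
        # Parametrize g near the identity and estimate the ascent direction on
        # its entries by finite differences (kept deliberately simple here).
        base, M = grad_norm_sq(W1, W2), np.zeros((h, h))
        for i in range(h):
            for j in range(h):
                g = np.eye(h)
                g[i, j] += eps
                M[i, j] = (grad_norm_sq(g @ W1, W2 @ np.linalg.inv(g)) - base) / eps
        g = np.eye(h) + lr_g * M / (np.linalg.norm(M) + 1e-12)
        W1, W2 = g @ W1, W2 @ np.linalg.inv(g)  # loss-invariant group action
    return W1, W2

lr = 1e-2
for t in range(300):
    if t % 100 == 0:                          # teleport every so often
        W1, W2 = teleport(W1, W2)
    _, g1, g2 = loss_and_grads(W1, W2)
    W1, W2 = W1 - lr * g1, W2 - lr * g2
print("final loss:", loss_and_grads(W1, W2)[0])
```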
Related papers
- Teleportation With Null Space Gradient Projection for Optimization Acceleration [31.641252776379957]
We introduce an algorithm that projects the gradient of the teleportation objective function onto the input null space.
Our approach is readily generalizable from CNNs to transformers, and potentially other advanced architectures (a minimal illustrative sketch of the null-space projection idea appears after this list).
arXiv Detail & Related papers (2025-02-17T02:27:16Z) - Learning Algorithm Hyperparameters for Fast Parametric Convex Optimization [2.0403774954994858]
We introduce a machine-learning framework to learn the hyperparameter sequence of first-order methods.
We show how to learn the hyperparameters for several popular algorithms.
We showcase the effectiveness of our method with many examples, including ones from control, signal processing, and machine learning.
arXiv Detail & Related papers (2024-11-24T04:58:36Z) - Level Set Teleportation: An Optimization Perspective [21.84775414778289]
We study level set teleportation, an optimization routine which tries to accelerate gradient descent (GD)
While teleportation intuitively speeds up GD via bigger steps, current work lacks a convergence theory for convex functions.
This is in sharp contrast to the standard (strongly) convex setting, where teleportation neither improves nor worsens convergence.
arXiv Detail & Related papers (2024-03-05T23:16:13Z) - Over-the-Air Computation Aided Federated Learning with the Aggregation
of Normalized Gradient [12.692064367193934]
Over-the-air computation is a communication-efficient solution for federated learning (FL)
In such a system, each mobile device iteratively updates the gradient of its private loss function and then transmits it.
To circumvent this problem, we propose to normalize the local gradient before amplifying it.
arXiv Detail & Related papers (2023-08-17T16:15:47Z) - Improving Convergence and Generalization Using Parameter Symmetries [34.863101622189944]
We show that teleporting to minima with different curvatures improves generalization, which suggests a connection between the curvature of the minimum and generalization ability.
Our results showcase the versatility of teleportation and demonstrate the potential of incorporating symmetry in optimization.
arXiv Detail & Related papers (2023-05-22T18:35:42Z) - Optimization using Parallel Gradient Evaluations on Multiple Parameters [51.64614793990665]
We propose a first-order method for convex optimization, where gradients from multiple parameters can be used during each step of gradient descent.
Our method uses gradients from multiple parameters in synergy to update these parameters together towards the optima.
arXiv Detail & Related papers (2023-02-06T23:39:13Z) - TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax
Optimization [24.784754071913255]
Adaptive methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner.
Current convergence analyses of gradient descent ascent for nonconvex minimax problems require careful tuning of parameters.
arXiv Detail & Related papers (2022-10-31T17:05:36Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasing or large momentum parameter for the first-order moment is sufficient for adaptive scaling.
We also give insights for increasing the momentum in a stagewise manner in accordance with stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster in finite time, but only up to an error.
Rather than fixing the minibatch and step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence.
In this paper, we propose an algorithm for solving nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
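For the related entry "Teleportation With Null Space Gradient Projection" above, the following minimal sketch (an illustration under stated assumptions, not that paper's implementation) shows the basic idea of projecting a weight-space update onto the null space of a linear layer's input activations: any update in that null space leaves the layer's outputs on those inputs, and hence the loss, unchanged. The variable names and the pseudoinverse-based projector are choices made here for illustration.

```python
# Illustrative sketch (not the paper's code): project a weight-space update for
# a linear layer onto the null space of its input activations, so that the
# layer's outputs on those inputs -- and therefore the loss -- do not change.
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, n = 3, 8, 5                      # n < d_in, so X has a nontrivial null space
W = rng.normal(size=(d_out, d_in))            # layer weights
X = rng.normal(size=(d_in, n))                # input activations (columns are samples)
G = rng.normal(size=(d_out, d_in))            # some update direction, e.g. a teleportation gradient

# Orthogonal projector onto the orthogonal complement of the column space of X:
# any matrix M with M @ X == 0 satisfies (W + M) @ X == W @ X.
P_null = np.eye(d_in) - X @ np.linalg.pinv(X)
G_proj = G @ P_null                           # projected update

print(np.allclose((W + G_proj) @ X, W @ X))   # True: outputs on X are preserved
```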
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.