Improving Convergence and Generalization Using Parameter Symmetries
- URL: http://arxiv.org/abs/2305.13404v3
- Date: Sat, 13 Apr 2024 18:28:52 GMT
- Title: Improving Convergence and Generalization Using Parameter Symmetries
- Authors: Bo Zhao, Robert M. Gower, Robin Walters, Rose Yu
- Abstract summary: We show that teleporting to minima with different curvatures improves generalization, which suggests a connection between the curvature of the minimum and generalization ability.
Our results showcase the versatility of teleportation and demonstrate the potential of incorporating symmetry in optimization.
- Score: 34.863101622189944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many neural networks, different values of the parameters may result in the same loss value. Parameter space symmetries are loss-invariant transformations that change the model parameters. Teleportation applies such transformations to accelerate optimization. However, the exact mechanism behind this algorithm's success is not well understood. In this paper, we show that teleportation not only speeds up optimization in the short term, but also gives an overall faster time to convergence. Additionally, teleporting to minima with different curvatures improves generalization, which suggests a connection between the curvature of the minimum and generalization ability. Finally, we show that integrating teleportation into a wide range of optimization algorithms and into optimization-based meta-learning improves convergence. Our results showcase the versatility of teleportation and demonstrate the potential of incorporating symmetry in optimization.
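To make the symmetry concrete, here is a minimal numpy sketch for a two-layer linear network, where the group action W1 -> g W1, W2 -> W2 g^{-1} leaves the loss unchanged. The candidate-search `teleport` routine is a hypothetical stand-in for the paper's optimization over the group, not the authors' exact algorithm.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Toy two-layer linear network f(x) = W2 @ W1 @ x with squared loss.
# The loss is invariant under W1 -> g @ W1, W2 -> W2 @ inv(g) for invertible g.
X = rng.normal(size=(5, 20))        # 5 input features, 20 samples
Y = rng.normal(size=(3, 20))        # 3 outputs
W1 = 0.5 * rng.normal(size=(8, 5))  # hidden width h = 8
W2 = 0.5 * rng.normal(size=(3, 8))

def loss(W1, W2):
    R = W2 @ W1 @ X - Y
    return 0.5 * np.sum(R * R)

def grad_norm_sq(W1, W2):
    R = W2 @ W1 @ X - Y
    g1, g2 = W2.T @ R @ X.T, R @ (W1 @ X).T   # dL/dW1, dL/dW2
    return np.sum(g1 * g1) + np.sum(g2 * g2)

def teleport(W1, W2, n_candidates=50, scale=0.5):
    """Hypothetical teleport: sample group elements g = expm(t*A) near the
    identity and keep the pair (g @ W1, W2 @ inv(g)) with the largest
    gradient norm; the loss itself is unchanged by construction."""
    best = (grad_norm_sq(W1, W2), W1, W2)
    for _ in range(n_candidates):
        A = rng.normal(size=(8, 8))
        g = expm(scale * A / np.linalg.norm(A))
        cand = (g @ W1, W2 @ np.linalg.inv(g))
        score = grad_norm_sq(*cand)
        if score > best[0]:
            best = (score, *cand)
    return best[1], best[2]

W1t, W2t = teleport(W1, W2)
assert np.isclose(loss(W1, W2), loss(W1t, W2t))            # same loss level set
print(grad_norm_sq(W1, W2), "->", grad_norm_sq(W1t, W2t))  # typically larger
```

Gradient descent resumed from the teleported point starts with a larger gradient on the same level set, which is the mechanism behind the speedup studied in the paper.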
Related papers
- Improving Learning to Optimize Using Parameter Symmetries [16.76912881772023]
We analyze a learning-to-optimize (L2O) algorithm that exploits parameter space symmetry to enhance efficiency.
Our results highlight the potential of leveraging neural network parameter space symmetry to advance meta-optimization.
arXiv Detail & Related papers (2025-04-21T19:03:23Z)
- Transferring linearly fixed QAOA angles: performance and real device results [0.0]
We investigate a simplified approach that combines linear parameterization with parameter transfer, reducing the parameter space to just four dimensions regardless of the number of layers.
We compare this combined approach with standard QAOA and other parameter setting strategies such as INTERP and FOURIER, which require computationally demanding incremental layer-by-layer optimization.
Our experiments extend from classical simulation to actual quantum hardware implementation on IBM's Eagle processor, demonstrating the approach's viability on current NISQ devices.
arXiv Detail & Related papers (2025-04-17T04:17:51Z)
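As a rough illustration of the four-parameter idea above, the sketch below builds per-layer QAOA angles from two linear ramps; the exact slope/intercept convention is an assumption for illustration, not necessarily the paper's.

```python
# Linear QAOA parameterization: for any depth p, the 2p angles
# (gamma_1..gamma_p, beta_1..beta_p) come from just 4 numbers.
def linear_qaoa_angles(p, d_gamma, c_gamma, d_beta, c_beta):
    ts = [i / max(p - 1, 1) for i in range(p)]   # layer fraction in [0, 1]
    gammas = [c_gamma + d_gamma * t for t in ts]
    betas = [c_beta + d_beta * t for t in ts]
    return gammas, betas

# Transfer: the 4 parameters tuned at small depth are reused at larger depth.
print(linear_qaoa_angles(3, 0.8, 0.1, -0.5, 0.6))
print(linear_qaoa_angles(10, 0.8, 0.1, -0.5, 0.6))   # same 4 parameters
```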
- Teleportation With Null Space Gradient Projection for Optimization Acceleration [31.641252776379957]
We introduce an algorithm that projects the gradient of the teleportation objective function onto the input null space.
Our approach is readily generalizable from CNNs to transformers, and potentially other advanced architectures.
arXiv Detail & Related papers (2025-02-17T02:27:16Z)
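A minimal numpy sketch of the projection idea for a single linear layer (a simplifying assumption; the paper applies it across CNN and transformer layers): parameter updates confined to the left null space of the layer's inputs leave the outputs on the data unchanged, so the move stays on the loss level set.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 6))       # layer inputs: 10 features, 6 samples
W = rng.normal(size=(4, 10))       # layer weights
G = rng.normal(size=(4, 10))       # gradient of some teleportation objective

# Orthonormal basis N of the left null space of X (directions the data never excites).
U, S, _ = np.linalg.svd(X, full_matrices=True)
rank = int(np.sum(S > 1e-10))
N = U[:, rank:]

G_proj = G @ (N @ N.T)             # project each row of G onto the null space

# Updating W along G_proj leaves the layer's outputs on the data unchanged:
assert np.allclose((W + 0.1 * G_proj) @ X, W @ X)
```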
- Symmetry-informed transferability of optimal parameters in the Quantum Approximate Optimization Algorithm [0.0]
We show how to translate an arbitrary set of optimal parameters into an adequate domain using the symmetries.
We extend these results to general classical optimization problems described by Ising Hamiltonians and to the Hamiltonian variational ansatz for relevant physical models.
arXiv Detail & Related papers (2024-07-05T13:37:53Z)
- Gradient-free neural topology optimization [0.0]
Gradient-free algorithms require many more iterations to converge than gradient-based algorithms.
This has made them unviable for topology optimization due to the high computational cost per iteration and high dimensionality of these problems.
We propose a pre-trained neural reparameterization strategy that leads to at least one order of magnitude decrease in iteration count when optimizing the designs in latent space.
arXiv Detail & Related papers (2024-03-07T23:00:49Z)
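A minimal sketch of the reparameterization idea above, with a fixed random "decoder" and a toy objective standing in for the pretrained network and the physics solver (both hypothetical placeholders): the gradient-free search runs in 16 latent dimensions instead of 1000 design dimensions.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(1000, 16))      # fixed random "decoder" weights (stand-in)

def decode(z):
    """Hypothetical pretrained reparameterization: latent code -> design."""
    return np.tanh(A @ z)

def objective(design):
    """Hypothetical stand-in for the expensive physics evaluation."""
    return np.sum((design - 0.5) ** 2)

# (1+1) evolution strategy in latent space: searching 16 dimensions instead
# of 1000 is what cuts the iteration count for gradient-free methods.
z = np.zeros(16)
f_best = objective(decode(z))
for _ in range(200):
    z_new = z + 0.5 * rng.normal(size=16)
    f_new = objective(decode(z_new))
    if f_new < f_best:
        z, f_best = z_new, f_new
print("best objective:", f_best)
```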
- Reducing measurement costs by recycling the Hessian in adaptive variational quantum algorithms [0.0]
We propose an improved quasi-Newton optimization protocol specifically tailored to adaptive VQAs.
We implement a quasi-Newton algorithm where an approximation to the inverse Hessian matrix is continuously built and grown across the iterations of an adaptive VQA.
arXiv Detail & Related papers (2024-01-10T14:08:04Z)
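A minimal sketch of the recycling idea: the BFGS inverse-Hessian update is standard, while the `grow` step, padding the approximation when the adaptive ansatz gains a parameter instead of restarting from scratch, is one reading of "continuously built and grown across the iterations".

```python
import numpy as np

def bfgs_update(H, s, y):
    """Standard BFGS update of the inverse-Hessian approximation H,
    given parameter step s and gradient difference y."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def grow(H):
    """Recycle H across adaptive-VQA iterations: when one new parameter is
    appended to the ansatz, pad H with an identity row/column rather than
    discarding the curvature information gathered so far."""
    n = H.shape[0]
    H_new = np.eye(n + 1)
    H_new[:n, :n] = H
    return H_new
```

Because each Hessian entry would otherwise cost extra circuit measurements to re-estimate, carrying H forward is what reduces the measurement budget.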
- Symmetry Teleportation for Accelerated Optimization [21.989906418276906]
We study a different approach, symmetry teleportation, that allows the parameters to travel a large distance on the loss level set.
We derive the loss-invariant group actions for test functions and multi-layer neural networks, and prove a necessary condition for teleportation to improve the convergence rate.
Experimentally, we show that teleportation improves the convergence speed of gradient descent and AdaGrad for several optimization problems including test functions, multi-layer regressions, and MNIST classification.
arXiv Detail & Related papers (2022-05-21T16:39:21Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
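To give a flavor of the symplectic viewpoint, here is a conformal-symplectic "dissipative leapfrog" integrator for the unconstrained heavy-ball ODE, a simplified stand-in for the paper's constrained Dirac formulation rather than its actual algorithm.

```python
import numpy as np

def dissipative_leapfrog(grad, x, steps=500, h=0.05, gamma=1.0):
    """Integrate x'' = -grad(x) - gamma * x' with a splitting scheme that
    alternates exact momentum decay with symplectic leapfrog kicks/drifts."""
    p = np.zeros_like(x)
    for _ in range(steps):
        p = np.exp(-gamma * h / 2) * p   # dissipate momentum (half step)
        p = p - (h / 2) * grad(x)        # half kick
        x = x + h * p                    # drift
        p = p - (h / 2) * grad(x)        # half kick
        p = np.exp(-gamma * h / 2) * p   # dissipate momentum (half step)
    return x

x_min = dissipative_leapfrog(lambda x: 2 * x, np.array([3.0, -2.0]))
print(x_min)   # approaches the minimizer of ||x||^2 at the origin
```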
- Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasingly large momentum parameter for the first-order moment is sufficient for adaptive scaling.
We also give insights into increasing the momentum in a stagewise manner, in accordance with a stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
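A small sketch of the stagewise recipe; the schedule constants below are illustrative assumptions, not the paper's: the first-moment momentum parameter grows toward 1 as the step size is cut at each stage.

```python
# Hypothetical stagewise schedule: each stage halves the step size and
# pushes the first-moment momentum parameter beta1 closer to 1.
def stagewise_schedule(stage, lr0=0.1, beta0=0.9):
    lr = lr0 * 0.5 ** stage
    beta1 = 1.0 - (1.0 - beta0) * 0.5 ** stage   # 0.9, 0.95, 0.975, ...
    return lr, beta1

for s in range(4):
    print(s, stagewise_schedule(s))
```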
- Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a parameter-efficient circuit training (PECT) strategy for the ansatze used in variational quantum algorithms.
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
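A toy sketch of the sequence-of-subproblems idea: most parameters stay frozen while a small active subset is optimized, then the active set moves on. The cost function and the fixed partition below are placeholders, not PECT's actual selection scheme.

```python
import numpy as np

rng = np.random.default_rng(3)

def energy(theta):                  # placeholder for a VQA cost evaluation
    return np.sum(np.sin(theta) ** 2)

theta = rng.uniform(0, 2 * np.pi, size=12)
for active in np.array_split(np.arange(12), 4):   # sequence of sub-problems
    for _ in range(100):                          # descent on active block only
        g = np.zeros_like(theta)
        g[active] = np.sin(2 * theta[active])     # analytic gradient of sin^2
        theta -= 0.1 * g
print("final energy:", energy(theta))
```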
- Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization [73.38702974136102]
Various parameter restart schemes have been proposed for accelerated algorithms to improve their practical convergence rates.
In this paper, we propose an accelerated proximal gradient algorithm with flexible restart for solving nonconvex and nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z)
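A compact sketch of accelerated proximal gradient with a function-value restart rule, one of the classic schemes that flexible-restart analyses generalize; the Lasso problem and the specific rule here are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def apg_restart(A, b, lam, steps=500):
    """FISTA for min 0.5*||Ax - b||^2 + lam*||x||_1, with momentum reset
    whenever the objective increases (function-value restart)."""
    eta = 1.0 / np.linalg.norm(A, 2) ** 2            # 1 / Lipschitz constant
    x = y = np.zeros(A.shape[1])
    t, f_prev = 1.0, np.inf
    for _ in range(steps):
        x_new = soft_threshold(y - eta * A.T @ (A @ y - b), eta * lam)
        f = 0.5 * np.sum((A @ x_new - b) ** 2) + lam * np.sum(np.abs(x_new))
        if f > f_prev:                               # restart the momentum
            t, y = 1.0, x
            continue
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t, f_prev = x_new, t_new, f
    return x

rng = np.random.default_rng(4)
A = rng.normal(size=(30, 50)); b = rng.normal(size=30)
x_hat = apg_restart(A, b, lam=0.1)
print("nonzeros:", int(np.sum(x_hat != 0)))
```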
- Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
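For concreteness, the pivotal property refers to objectives like the square-root Lasso, a standard formulation sketched below, whose theoretically optimal regularization parameter does not scale with the unknown noise level.

```python
import numpy as np

def sqrt_lasso_objective(X, y, beta, lam):
    """Square-root Lasso: replacing the squared loss ||y - X b||^2 / (2n)
    with ||y - X b|| / sqrt(n) makes the optimal lam noise-independent."""
    n = len(y)
    return np.linalg.norm(y - X @ beta) / np.sqrt(n) + lam * np.sum(np.abs(beta))
```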
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.