Related papers: Faster Adaptive Optimization via Expected Gradient Outer Product Reparameterization

Faster Adaptive Optimization via Expected Gradient Outer Product Reparameterization

URL: http://arxiv.org/abs/2502.01594v1
Date: Mon, 03 Feb 2025 18:26:35 GMT
Title: Faster Adaptive Optimization via Expected Gradient Outer Product Reparameterization
Authors: Adela DePavia, Vasileios Charisopoulos, Rebecca Willett,
Abstract summary: We show that for a broad class of functions, the sensitivity of adaptive algorithms to choice-of-basis is influenced by the decay of the EGOP matrix spectrum.
Score: 11.394969272703014
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adaptive optimization algorithms -- such as Adagrad, Adam, and their variants -- have found widespread use in machine learning, signal processing and many other settings. Several methods in this family are not rotationally equivariant, meaning that simple reparameterizations (i.e. change of basis) can drastically affect their convergence. However, their sensitivity to the choice of parameterization has not been systematically studied; it is not clear how to identify a "favorable" change of basis in which these methods perform best. In this paper we propose a reparameterization method and demonstrate both theoretically and empirically its potential to improve their convergence behavior. Our method is an orthonormal transformation based on the expected gradient outer product (EGOP) matrix, which can be approximated using either full-batch or stochastic gradient oracles. We show that for a broad class of functions, the sensitivity of adaptive algorithms to choice-of-basis is influenced by the decay of the EGOP matrix spectrum. We illustrate the potential impact of EGOP reparameterization by presenting empirical evidence and theoretical arguments that common machine learning tasks with "natural" data exhibit EGOP spectral decay.

Related papers

Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios [54.58186816693791]
environments constantly change over time and space, posing significant challenges for object detectors trained based on a closed-set assumption.<n>We propose a new mechanism, converting the fine-tuning process to a specific- parameter generation.<n>In particular, we first design a dual-path LoRA-based domain-aware adapter that disentangles features into domain-invariant and domain-specific components.
arXiv Detail & Related papers (2025-06-30T17:14:12Z)
Architect Your Landscape Approach (AYLA) for Optimizations in Deep Learning [0.0]
Gradient Descent (DSG) and its variants, such as ADAM, are foundational to deep learning optimization. This paper introduces AYLA, a novel optimization technique that enhances adaptability and efficiency rates.
arXiv Detail & Related papers (2025-04-02T16:31:39Z)
Adaptive Conformal Inference by Betting [51.272991377903274]
We consider the problem of adaptive conformal inference without any assumptions about the data generating process.<n>Existing approaches for adaptive conformal inference are based on optimizing the pinball loss using variants of online gradient descent.<n>We propose a different approach for adaptive conformal inference that leverages parameter-free online convex optimization techniques.
arXiv Detail & Related papers (2024-12-26T18:42:08Z)
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning. These problems are often formalized as Bi-Level optimizations (BLO) We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values. We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO) Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality. This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives. It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
Joint State Estimation and Noise Identification Based on Variational Optimization [8.536356569523127]
A novel adaptive Kalman filter method based on conjugate-computation variational inference, referred to as CVIAKF, is proposed. The effectiveness of CVIAKF is validated through synthetic and real-world datasets of maneuvering target tracking.
arXiv Detail & Related papers (2023-12-15T07:47:03Z)
Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO) We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z)
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes. We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
An Adaptive Alternating-direction-method-based Nonnegative Latent Factor Model [2.857044909410376]
An alternating-direction-method-based nonnegative latent factor model can perform efficient representation learning to a high-dimensional and incomplete (HDI) matrix. This paper proposes an Adaptive Alternating-direction-method-based Nonnegative Latent Factor model, whose hyper- parameter adaptation is implemented following the principle of particle swarm optimization. Empirical studies on nonnegative HDI matrices generated by industrial applications indicate that A2NLF outperforms several state-of-the-art models in terms of computational and storage efficiency, as well as maintains highly competitive estimation accuracy for an HDI matrix's missing data
arXiv Detail & Related papers (2022-04-11T03:04:26Z)
Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose textit-Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order descent methods. Our approach modifies the objective function by adding a regularization term, and casts the joint process parameters.
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
Learning Invariant Representations using Inverse Contrastive Loss [34.93395633215398]
We introduce a class of losses for learning representations that are invariant to some extraneous variable of interest. We show that if the extraneous variable is binary, then optimizing ICL is equivalent to optimizing a regularized MMD divergence.
arXiv Detail & Related papers (2021-02-16T18:29:28Z)
Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variisy hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate Scale Quantum devices. We propose a strategy for such ansatze used in variational quantum algorithms, which we call "Efficient Circuit Training" (PECT) Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
Group Equivariant Deep Reinforcement Learning [4.997686360064921]
We propose the use of Equivariant CNNs to train RL agents and study their inductive bias for transformation equivariant Q-value approximation. We demonstrate that equivariant architectures can dramatically enhance the performance and sample efficiency of RL agents in a highly symmetric environment.
arXiv Detail & Related papers (2020-07-01T02:38:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.