SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
        - URL: http://arxiv.org/abs/2502.18153v1
 - Date: Tue, 25 Feb 2025 12:35:05 GMT
 - Title: SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
 - Authors: Dahun Shin, Dongyeop Lee, Jinseok Chung, Namhoon Lee
 - Abstract summary: Sassha is a novel second-order method designed to enhance generalization by explicitly reducing the sharpness of the solution. We provide a comprehensive set of analyses including convergence, robustness, stability, efficiency, and cost.
 - Score: 5.523554757946985
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract: Approximate second-order optimization methods often exhibit poorer generalization than first-order approaches. In this work, we examine this issue through the lens of the loss landscape and find that existing second-order methods tend to converge to sharper minima than SGD. In response, we propose Sassha, a novel second-order method designed to enhance generalization by explicitly reducing the sharpness of the solution, while stabilizing the computation of approximate Hessians along the optimization trajectory. This sharpness minimization scheme is also crafted to accommodate lazy Hessian updates, securing efficiency alongside flatness. To validate its effectiveness, we conduct a wide range of standard deep learning experiments in which Sassha achieves generalization performance comparable to, and mostly better than, other methods. We provide a comprehensive set of analyses covering convergence, robustness, stability, efficiency, and cost.
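The structure described in the abstract can be illustrated with a minimal numpy sketch: a SAM-style ascent step to find a nearby higher-loss point, a gradient evaluated there, and a diagonal Hessian preconditioner that is only refreshed every few iterations (the "lazy" updates). This is an illustrative reading of the abstract, not the paper's algorithm; the hyperparameters (lr, rho, hessian_update_freq) and the square-root diagonal preconditioner are assumptions.

```python
import numpy as np

# Toy data for a two-class logistic regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true + 0.1 * rng.normal(size=200) > 0).astype(float)

def loss_grad(w):
    """Logistic loss and its gradient."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def hessian_diag(w):
    """Diagonal of the logistic-loss Hessian (exact for this toy model)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    s = p * (1 - p)
    return (X ** 2 * s[:, None]).mean(axis=0)

# Illustrative placeholder values, not the paper's settings.
lr, rho, eps, hessian_update_freq = 0.5, 0.05, 1e-8, 10

w = np.zeros(5)
d = np.ones(5)  # lazily-updated diagonal Hessian estimate
for t in range(200):
    if t % hessian_update_freq == 0:          # "lazy" Hessian refresh
        d = np.abs(hessian_diag(w)) + eps     # keep the preconditioner positive
    _, g = loss_grad(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)  # SAM-style ascent step
    _, g_adv = loss_grad(w_adv)               # gradient at the perturbed point
    w -= lr * g_adv / np.sqrt(d)              # preconditioned descent step

print("final loss:", loss_grad(w)[0])
```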
 
       
      
        Related papers
        - Refining Adaptive Zeroth-Order Optimization at Ease [24.327161891577727]
This paper introduces Refined Adaptive Zeroth-Order Optimization (R-AdaZO). We first show the untapped variance-reduction effect of the first moment estimate on ZO gradient estimation. We then refine the second moment estimate based on these variance-reduced gradient estimates to better capture the geometry of the optimization landscape.
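As a rough illustration of this idea (not the paper's algorithm), the sketch below builds an Adam-like zeroth-order optimizer in numpy where the second moment is accumulated from the smoothed first-moment estimate rather than from the raw two-point ZO gradient; the toy objective and all constants are placeholders.

```python
import numpy as np

def f(x):
    """Black-box objective (toy); only function values are available."""
    return 0.5 * np.sum(x ** 2) + np.sum(np.sin(x))

rng = np.random.default_rng(0)
dim, mu, lr, beta1, beta2, eps = 10, 1e-3, 0.1, 0.9, 0.999, 1e-8

x = rng.normal(size=dim)
m = np.zeros(dim)  # first moment, also a variance-reduced gradient estimate
v = np.zeros(dim)  # second moment, built from the smoothed estimate
for t in range(1, 501):
    u = rng.normal(size=dim)
    g_hat = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u  # two-point ZO estimate
    m = beta1 * m + (1 - beta1) * g_hat
    # Paraphrased key idea: feed the smoothed, lower-variance estimate into the
    # second moment instead of the raw, noisy ZO estimate.
    v = beta2 * v + (1 - beta2) * m ** 2
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
    x -= lr * m_hat / (np.sqrt(v_hat) + eps)

print("final value:", f(x))
```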
arXiv  Detail & Related papers  (2025-02-03T03:10:44Z) - Bayesian Optimization for Non-Convex Two-Stage Stochastic Optimization Problems [2.9016548477524156]
We formulate a knowledge-gradient-based acquisition function to jointly optimize the first-stage variables, establish a guarantee of consistency, and provide an approximation. We demonstrate empirical results comparable to an alternative formulation, which alternates between the two variable types.
arXiv  Detail & Related papers  (2024-08-30T16:26:31Z) - Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning [6.383513606898132]
Second-order information is valuable for many applications but challenging to compute.
We introduce HesScale, an improvement over BL89, which adds negligible extra computation.
On small networks, we find that this approximation is of higher quality than all alternatives, even those with theoretical guarantees such as unbiasedness, while being much cheaper to compute.
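HesScale itself is a specific backpropagation-based rule that is not reproduced here; as a generic point of reference for what a diagonal Hessian approximation computes, the sketch below uses a Hutchinson-style estimator with finite-difference Hessian-vector products on a toy objective. It is unbiased but noisier and more expensive than the deterministic schemes the paper discusses.

```python
import numpy as np

def grad_f(x):
    """Gradient of the toy objective f(x) = sum(x**4)/4 + sum(sin(x))."""
    return x ** 3 + np.cos(x)

def hvp(x, z, h=1e-5):
    """Finite-difference Hessian-vector product (no explicit Hessian needed)."""
    return (grad_f(x + h * z) - grad_f(x - h * z)) / (2 * h)

def hutchinson_diag(x, n_samples=200, rng=None):
    """Unbiased estimate of diag(H) via E[z * (H z)] with Rademacher z."""
    rng = rng or np.random.default_rng(0)
    est = np.zeros_like(x)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=x.shape)
        est += z * hvp(x, z)
    return est / n_samples

x = np.linspace(-1.0, 1.0, 5)
print("estimated diag(H):", hutchinson_diag(x))
print("exact diag(H):    ", 3 * x ** 2 - np.sin(x))
```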
arXiv  Detail & Related papers  (2024-06-05T13:53:20Z) - One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a perspective of dualization that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based settings.
arXiv  Detail & Related papers  (2024-05-29T22:12:52Z) - Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization [55.115992622028685]
Previous work suggests that first-order methods would need to trade off convergence rate (gradient complexity) for better reproducibility.
We demonstrate that both optimal complexity and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems.
arXiv  Detail & Related papers  (2023-10-26T19:56:52Z) - Gradient constrained sharpness-aware prompt learning for vision-language models [99.74832984957025]
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs).
By analyzing the loss landscapes of the state-of-the-art method and vanilla Sharpness-aware Minimization (SAM) based method, we conclude that the trade-off performance correlates to both loss value and loss sharpness.
We propose a novel SAM-based method for prompt learning, denoted as Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp).
arXiv  Detail & Related papers  (2023-09-14T17:13:54Z) - An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms [19.41328109094503]
We propose and analyze several novel adaptive variants of the popular SAGA algorithm.
We establish their convergence guarantees under general settings.
We improve the analysis of SAGA to support non-Euclidean norms.
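For context, the base SAGA update that these adaptive variants build on is sketched below on a toy least-squares problem (plain Euclidean norm, fixed step size; none of the paper's adaptive or non-Euclidean machinery).

```python
import numpy as np

# Finite-sum least squares: f(w) = (1/n) * sum_i 0.5 * (x_i @ w - y_i)**2
rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

def grad_i(w, i):
    """Gradient of the i-th summand."""
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
table = np.array([grad_i(w, i) for i in range(n)])  # stored per-example gradients
avg = table.mean(axis=0)
lr = 0.01  # fixed step size, chosen conservatively for this toy problem
for _ in range(5 * n):
    i = rng.integers(n)
    g_new = grad_i(w, i)
    # SAGA step: unbiased, variance-reduced gradient estimate.
    w -= lr * (g_new - table[i] + avg)
    avg += (g_new - table[i]) / n   # keep the running average consistent
    table[i] = g_new

print("final objective:", 0.5 * np.mean((X @ w - y) ** 2))
```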
arXiv  Detail & Related papers  (2022-04-28T09:43:07Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks.
arXiv  Detail & Related papers  (2021-09-23T17:38:24Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
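The paper's specific stepsize rules are not reproduced here; the sketch below only shows the gradient-clipping operation the summary refers to, applied to plain SGD on a toy objective with heavy-tailed (Student-t) gradient noise. The step size and clipping radius are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(w):
    """Gradient of f(w) = 0.5*||w||^2 corrupted by heavy-tailed Student-t noise."""
    return w + rng.standard_t(df=2, size=w.shape)

def clip(g, lam):
    """Project the gradient onto a Euclidean ball of radius lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

w = np.full(10, 5.0)
lr, lam = 0.05, 2.0   # illustrative values, not the paper's stepsize rules
for _ in range(2000):
    w -= lr * clip(noisy_grad(w), lam)

print("distance to optimum:", np.linalg.norm(w))
```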
arXiv  Detail & Related papers  (2021-06-10T17:54:21Z) - Variance Regularization for Accelerating Stochastic Optimization [14.545770519120898]
We propose a universal principle which reduces the random error accumulation by exploiting statistic information hidden in mini-batch gradients.
This is achieved by regularizing the learning rate according to mini-batch variances.
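A loose sketch of the stated principle follows: shrink the step size when the mini-batch gradient variance is large. The scaling rule lr = base_lr / (1 + c * var) and the constants are placeholders, not the paper's regularization scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 1000, 20, 32
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
base_lr, c = 0.05, 1.0   # placeholder constants
for _ in range(500):
    idx = rng.choice(n, batch, replace=False)
    per_example = (X[idx] @ w - y[idx])[:, None] * X[idx]  # per-example gradients
    g = per_example.mean(axis=0)
    var = per_example.var(axis=0).mean()   # mini-batch gradient variance
    lr = base_lr / (1.0 + c * var)         # damp the step when variance is high
    w -= lr * g

print("final mse:", np.mean((X @ w - y) ** 2))
```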
arXiv  Detail & Related papers  (2020-08-13T15:34:01Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
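The paper's contribution concerns choosing the sketch size adaptively via the effective dimension; the sketch below shows only the basic Gaussian sketch-and-solve baseline for ridge regression, with an arbitrary sketch size, as a reference point for what a randomized embedding does to the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, m = 5000, 50, 1.0, 400   # m = sketch size (illustrative, not adaptive)
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Exact ridge solution for reference.
x_exact = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

# Gaussian sketch-and-solve: compress the rows of A and b before solving.
S = rng.normal(size=(m, n)) / np.sqrt(m)
SA, Sb = S @ A, S @ b
x_sketch = np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

print("relative error:", np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```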
arXiv  Detail & Related papers  (2020-06-10T15:00:09Z) - The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses. Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
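A minimal illustration of this setting follows: a subgradient method with a Nesterov-style extrapolation step on a small $\ell_1$-regularized hinge-loss problem. The diminishing step size and the extrapolation schedule are simple textbook defaults, not the ones derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 300, 20, 0.01
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))

def subgrad(w):
    """A subgradient of the l1-regularized average hinge loss."""
    margin = y * (X @ w)
    active = (margin < 1).astype(float)
    return -(active * y) @ X / n + lam * np.sign(w)

def objective(w):
    return np.mean(np.maximum(0.0, 1 - y * (X @ w))) + lam * np.sum(np.abs(w))

w = np.zeros(d)
w_prev = w.copy()
for t in range(1, 1001):
    beta = (t - 1) / (t + 2)                  # Nesterov-style extrapolation weight
    z = w + beta * (w - w_prev)               # extrapolated point
    w_prev = w
    w = z - (1.0 / np.sqrt(t)) * subgrad(z)   # subgradient step at the extrapolated point

print("final objective:", objective(w))
```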
arXiv  Detail & Related papers  (2020-06-08T03:35:41Z) 
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     