Zeroth-Order Sharpness-Aware Learning with Exponential Tilting
- URL: http://arxiv.org/abs/2510.16157v1
- Date: Fri, 17 Oct 2025 19:01:34 GMT
- Title: Zeroth-Order Sharpness-Aware Learning with Exponential Tilting
- Authors: Xuchen Gong, Tian Li
- Abstract summary: We explore new zeroth-order algorithms to solve a soft sharpness objective parameterized by a tilting parameter $t$. We provide precise characterizations of the sharpness notions of the tilted SAM framework. Our approach can be used as a gradient-free and memory-efficient alternative to SAM variants.
- Score: 5.409688800035885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classic zeroth-order optimization approaches typically optimize for a smoothed version of the original function, i.e., the expected objective under randomly perturbed model parameters. This can be interpreted as encouraging the loss values in the perturbation set to be small on average. Popular sharpness-aware minimization (SAM) objectives, however, typically focus on the largest loss within the neighborhood to arrive at flat minima more effectively. In this work, we connect zeroth-order optimization (and its corresponding objectives) with SAM approaches explicitly, through an exponential tilting objective that provides a smooth transition between the average- and the max-loss formulations. We explore new zeroth-order algorithms to solve a soft SAM objective parameterized by a tilting parameter $t$. We provide precise characterizations of the sharpness notions of the tilted SAM framework. Practically, our approach can be used as a gradient-free and memory-efficient alternative to SAM variants, and it achieves better generalization compared to vanilla zeroth-order baselines on a wide range of downstream tasks, including classification, multiple choice QA, and language generation.
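To make the abstract's interpolation concrete: the tilted objective described above can be written as $L_t(w) = \frac{1}{t}\log \mathbb{E}_{\epsilon}\!\left[e^{t\,L(w+\epsilon)}\right]$, which recovers the average perturbed loss as $t \to 0$ and the worst-case (max) loss in the neighborhood as $t \to \infty$. Below is a minimal, self-contained Python sketch of how such a tilted objective could be optimized with only function evaluations, using standard two-point finite differences and softmax reweighting of the sampled perturbations. It is an illustration under stated assumptions (the quadratic `loss`, the hyperparameters `t`, `sigma`, and the update rule are all placeholders), not the paper's exact algorithm.

```python
import numpy as np

def loss(w):
    # Hypothetical quadratic loss used only to illustrate the estimator;
    # replace with any black-box objective over a parameter vector.
    return 0.5 * np.sum(w ** 2)

def tilted_zo_gradient(loss_fn, w, t=1.0, sigma=0.01, num_perturbations=16, rng=None):
    """Illustrative zeroth-order estimator for a tilted (soft-max) objective.

    Targets (1/t) * log E_eps[ exp(t * L(w + sigma*eps)) ], which interpolates
    between the average loss (t -> 0) and the worst loss in the perturbation
    neighborhood (t -> infinity). This is a hedged sketch, not the authors'
    published algorithm.
    """
    rng = np.random.default_rng() if rng is None else rng
    dim = w.shape[0]
    losses, grads = [], []
    for _ in range(num_perturbations):
        u = rng.standard_normal(dim)
        l_plus = loss_fn(w + sigma * u)
        l_minus = loss_fn(w - sigma * u)
        losses.append(l_plus)
        # Standard two-point directional estimate of the gradient along u.
        grads.append((l_plus - l_minus) / (2.0 * sigma) * u)
    losses = np.array(losses)
    # Softmax weights implement the exponential tilting; subtracting the max
    # keeps the exponentials numerically stable.
    weights = np.exp(t * (losses - losses.max()))
    weights /= weights.sum()
    return sum(wgt * g for wgt, g in zip(weights, grads))

# Usage: one gradient-free descent step on the tilted objective.
w = np.ones(10)
g = tilted_zo_gradient(loss, w, t=5.0)
w -= 0.1 * g
```

Larger $t$ concentrates the softmax weights on the worst sampled perturbations, pushing the update toward SAM-like worst-case behavior, while $t$ near zero recovers the averaged zeroth-order smoothing described in the abstract.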
Related papers
- Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection [4.808936079900314]
We propose Hi-ZFO (Hierarchical Zeroth- and First-Order optimization) to synergize FO gradients with ZO estimation. We show that Hi-ZFO consistently achieves superior performance while significantly reducing the training time.
arXiv Detail & Related papers (2026-01-09T03:20:54Z) - Zeroth-Order Optimization Finds Flat Minima [51.41529512093436]
We show that zeroth-order optimization with the standard two-point estimator favors solutions with a small trace of the Hessian. We further provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions.
arXiv Detail & Related papers (2025-06-05T17:59:09Z) - Scalable Min-Max Optimization via Primal-Dual Exact Pareto Optimization [66.51747366239299]
We propose a smooth variant of the min-max problem based on the augmented Lagrangian. The proposed algorithm scales better with the number of objectives than subgradient-based strategies.
arXiv Detail & Related papers (2025-03-16T11:05:51Z) - LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM [13.180761892449736]
We study robust parameter-efficient fine-tuning (PEFT) techniques for large language models (LLMs). We present a new, highly computationally efficient framework called AdaZo-SAM, combining Adam and Sharpness-Aware Minimization (SAM). We also design a low-rank gradient optimization method named LORENZA, a memory-efficient version of AdaZo-SAM.
arXiv Detail & Related papers (2025-02-26T21:30:34Z) - Tilted Sharpness-Aware Minimization [24.689230137012174]
Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models. We show that TSAM is smoother than SAM and thus easier to optimize.
arXiv Detail & Related papers (2024-10-30T02:49:48Z) - Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [63.10833446782114]
As language models grow in size, memory demands for backpropagation increase. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative. In this paper, we propose subspace zeroth-order optimization to address the challenges posed by high-dimensional perturbations.
arXiv Detail & Related papers (2024-10-11T17:01:43Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z) - Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL)
We first prove that a gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z) - Sharpness-Aware Minimization for Efficiently Improving Generalization [36.87818971067698]
We introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness.
Sharpness-Aware Minimization (SAM) seeks parameters that lie in neighborhoods having uniformly low loss.
We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets.
arXiv Detail & Related papers (2020-10-03T19:02:10Z)