Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs
- URL: http://arxiv.org/abs/2510.00419v1
- Date: Wed, 01 Oct 2025 02:01:07 GMT
- Title: Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs
- Authors: Kairun Zhang, Haoyu Li, Yanjun Zhao, Yifan Sun, Huan Zhang
- Abstract summary: We propose ZO Fine-tuner, a learning-based zeroth-order optimizer for large language models. It automatically learns efficient perturbation strategies through a compact and memory-efficient design. Experiments show that ZO Fine-tuner outperforms prior zeroth-order baselines in 82.1% of task-model combinations.
- Score: 22.39397810186991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zeroth-order optimizers have recently emerged as a practical approach for fine-tuning large language models (LLMs), significantly reducing GPU memory consumption compared to traditional first-order methods. Yet, existing zeroth-order methods rely on hand-crafted, static sampling strategies that are not adaptable to model-specific structures. To address this, we propose ZO Fine-tuner, a learning-based zeroth-order optimizer for LLMs that automatically learns efficient perturbation strategies through a compact and memory-efficient design. Crucially, our approach is motivated by the observation that only a small number of foundation models and their derivatives are widely adopted in practice. Therefore, learning the optimizer once for a given LLM and reusing it across diverse downstream tasks is both feasible and highly desirable. Accordingly, ZO Fine-tuner is designed to scale learning to learn (L2L) to the foundation-model era by supporting one-time training per LLM with minimal overhead. Experiments on 4 LLMs and 7 datasets show that ZO Fine-tuner outperforms prior zeroth-order baselines in 82.1% of task-model combinations, thereby demonstrating strong performance and scalability for efficient LLM fine-tuning. Our code is available at https://github.com/ASTRAL-Group/ZO_Fine_tuner.git.
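For orientation, the kind of hand-crafted, static zeroth-order update that the abstract contrasts with can be sketched as below. This is a generic two-point (SPSA-style) estimator with a fixed Gaussian perturbation, not ZO Fine-tuner's learned perturbation strategy. The toy quadratic `loss`, the helper names `zo_sgd_step` and `run`, and all hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import random


def loss(theta):
    # Toy quadratic objective standing in for a fine-tuning loss (assumption).
    return sum((t - 3.0) ** 2 for t in theta)


def zo_sgd_step(theta, loss_fn, lr, eps, seed):
    """One two-point zeroth-order SGD step using only loss evaluations."""
    # Static Gaussian perturbation, regenerated from a seed so that a
    # memory-conscious implementation would not need to keep z around.
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in theta]

    # Two forward evaluations at theta + eps*z and theta - eps*z.
    loss_plus = loss_fn([t + eps * zi for t, zi in zip(theta, z)])
    loss_minus = loss_fn([t - eps * zi for t, zi in zip(theta, z)])

    # Scalar estimate of the directional derivative along z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)

    # Move against the estimated gradient direction projected_grad * z.
    return [t - lr * projected_grad * zi for t, zi in zip(theta, z)]


def run(steps=2000, dim=5, lr=0.01, eps=1e-3):
    # Hypothetical driver: start from zeros and apply repeated ZO steps.
    theta = [0.0] * dim
    for step in range(steps):
        theta = zo_sgd_step(theta, loss, lr, eps, seed=step)
    return theta


theta = run()
initial_loss = loss([0.0] * 5)
final_loss = loss(theta)
```

No backward pass is used anywhere in the sketch, which is where the memory saving of zeroth-order methods comes from. The perturbation distribution here is fixed in advance, and it is this component that a learned optimizer would aim to adapt.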
Related papers
- Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches [0.0]
We explore strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a pre-trained causal LLM and fine-tuning on the task, and (2) instruction-tuning the LLM in a prompt->response format for classification.
arXiv Detail & Related papers (2025-12-14T13:02:06Z) - Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning [16.095629872564874]
Reinforcement learning is arguably the most prominent fine-tuning method. Evolution strategies (ES) once showed comparable performance to RL on models with a few million parameters. ES can search efficiently over billions of parameters and outperform existing RL fine-tuning methods in multiple respects.
arXiv Detail & Related papers (2025-09-29T07:19:34Z) - Shadow-FT: Tuning Instruct Model via Training on Paired Base Model [67.20706292627106]
Large language models (LLMs) consistently benefit from further fine-tuning on various tasks. We propose a novel Shadow-FT framework to tune the Instruct models by leveraging the corresponding Base models. Our proposed Shadow-FT introduces no additional parameters, is easy to implement, and significantly improves performance.
arXiv Detail & Related papers (2025-05-19T05:16:21Z) - Visualising Policy-Reward Interplay to Inform Zeroth-Order Preference Optimisation of Large Language Models [0.36326779753373206]
Zeroth-Order (ZO) optimisation uses function evaluations instead of gradients, reducing memory usage, but suffers from slow convergence in high-dimensional models. We introduce ZOPrO, a novel ZO algorithm designed for Preference optimisation in Large Language Models. We demonstrate that our method consistently enhances reward signals while achieving convergence times comparable to first-order methods.
arXiv Detail & Related papers (2025-03-05T12:49:48Z) - Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training [44.48966200270378]
Fine-tuning pre-trained Large Language Models (LLMs) for downstream tasks using First-Order (FO) optimizers presents significant computational challenges. We propose a bilevel optimization framework that complements ZO methods with PEFT to mitigate sensitivity to hard prompts. Our Bilevel ZOFO method employs a double-loop optimization strategy, where only the gradient of the PEFT model and the forward pass of the base model are required.
arXiv Detail & Related papers (2025-02-05T20:47:44Z) - When Do LLMs Help With Node Classification? A Comprehensive Analysis [21.120619437937382]
We develop a comprehensive testbed for node classification using Large Language Models (LLMs). It includes 10 homophilic datasets, 4 heterophilic datasets, 8 LLM-based algorithms, 8 classic baselines, and 3 learning paradigms. Our findings uncover 8 insights, e.g., (1) LLM-based methods can significantly outperform traditional methods in a semi-supervised setting, while the advantage is marginal in a supervised setting.
arXiv Detail & Related papers (2025-02-02T15:56:05Z) - Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [63.10833446782114]
As language models grow in size, memory demands for backpropagation increase. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative. In this paper, we propose Subspace Zero-order optimization to address the challenges posed by high-dimensional perturbations.
arXiv Detail & Related papers (2024-10-11T17:01:43Z) - Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning.
Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
arXiv Detail & Related papers (2024-02-18T14:08:48Z) - Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation [4.310519298899164]
In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks.
For the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks.
For the few-shot setting, we find that adding demonstration examples substantially helps instructed LLMs perform better.
arXiv Detail & Related papers (2023-08-02T15:54:22Z) - Fine-Tuning Language Models with Just Forward Passes [92.04219196752007]
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a large amount of memory.
We propose a memory-efficient zeroth-order optimizer (MeZO) that operates in-place, thereby fine-tuning LMs with the same memory footprint as inference.
arXiv Detail & Related papers (2023-05-27T02:28:10Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.