Related papers: Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

URL: http://arxiv.org/abs/2603.01692v1
Date: Mon, 02 Mar 2026 10:22:47 GMT
Title: Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search
Authors: Yifei Zhang, Xu Yang, Xiao Yang, Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang, Mingrui Xu, Weiqing Liu, Jiang Bian,
Abstract summary: We introduce textscGome, an MLE agent that operationalizes gradient-based optimization.<n>textscGome achieves a state-of-the-art 35.1% any-medal rate on MLE-Bench with a restricted 12-hour budget on a single V100 GPU.
Score: 29.662680998899294
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: LLM-based agents for machine learning engineering (MLE) predominantly rely on tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. As LLM reasoning capabilities improve, exhaustive enumeration becomes increasingly inefficient compared to directed updates, analogous to how accurate gradients enable efficient descent over random search. We introduce \textsc{Gome}, an MLE agent that operationalizes gradient-based optimization. \textsc{Gome} maps structured diagnostic reasoning to gradient computation, success memory to momentum, and multi-trace execution to distributed optimization. Under a closed-world protocol that isolates architectural effects from external knowledge, \textsc{Gome} achieves a state-of-the-art 35.1\% any-medal rate on MLE-Bench with a restricted 12-hour budget on a single V100 GPU. Scaling experiments across 10 models reveal a critical crossover: with weaker models, tree search retains advantages by compensating for unreliable reasoning through exhaustive exploration; as reasoning capability strengthens, gradient-based optimization progressively outperforms, with the gap widening at frontier-tier models. Given the rapid advancement of reasoning-oriented LLMs, this positions gradient-based optimization as an increasingly favorable paradigm. We release our codebase and GPT-5 traces.

Related papers

$\ abla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space [71.23672814629448]
$nabla$-Reasoner is an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop.<n>$nabla$-Reasoner achieves over 20% accuracy improvement on a challenging mathematical reasoning benchmark.
arXiv Detail & Related papers (2026-03-05T08:42:54Z)
Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection [4.808936079900314]
We propose textbfHi-ZFO (textbfHierarchical textbfZeroth- and textbfFirst-textbfOrder optimization) to synergize FO gradients with ZO estimation.<n>We show that Hi-ZFO consistently achieves superior performance while significantly reducing the training time.
arXiv Detail & Related papers (2026-01-09T03:20:54Z)
Backward-Friendly Optimization: Training Large Language Models with Approximate Gradients under Memory Constraints [14.20716202034732]
Full fine-tuning of Large Language Models (LLMs) is notoriously memory-intensive.<n>We introduce GradLite, a backward-friendly solution that relaxes the requirement of exact gradients.<n>We show that GradLite maintains unbiased estimates with bounded variance, ensuring convergence rates comparable to Adam.
arXiv Detail & Related papers (2025-10-26T00:50:12Z)
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models [68.96619605651155]
Large reasoning models (LRMs) may drastically increase the output length due to overthinking.<n>We propose a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns.<n>Our method achieves up to a 12% accuracy improvement and reducing token usage from approximately 5,000 to 3,000 tokens.
arXiv Detail & Related papers (2025-05-27T20:59:29Z)
Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum. We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [63.10833446782114]
As language models grow in size, memory demands for backpropagation increase.<n>Zeroth-order (ZO) optimization methods offer a memory-efficient alternative.<n>In this paper, we propose Subspace Zero-order optimization to address the challenges posed by posed by high dimensionality perturbations.
arXiv Detail & Related papers (2024-10-11T17:01:43Z)
LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning [69.95292905263393]
We show that gradient-based and high-level LLMs can effectively collaborate a combined optimization framework.<n>In this paper, we show that these complementary to each other and can effectively collaborate a combined optimization framework.
arXiv Detail & Related papers (2024-05-30T06:24:14Z)
SHOT: Suppressing the Hessian along the Optimization Trajectory for Gradient-Based Meta-Learning [28.26143547479141]
We introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory) SHOT does not increase the computational complexity of the baseline model much. We confirm our hypothesis empirically and demonstrate that SHOT outperforms the corresponding baseline.
arXiv Detail & Related papers (2023-10-04T11:43:08Z)
Tom: Leveraging trend of the observed gradients for faster convergence [0.0]
Tom is a novel variant of Adam that takes into account the trend observed for the gradients in the landscape in the loss traversed by the neural network. Tom outperforms Adagrad, Adadelta, RMSProp and Adam in terms of both accuracy and has a faster convergence.
arXiv Detail & Related papers (2021-09-07T20:19:40Z)
A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction, and solution update. We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
arXiv Detail & Related papers (2020-06-11T06:50:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.