Fugu-MT Paper Translation (Abstract): LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Paper overview: LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

arxiv url: http://arxiv.org/abs/2605.08083v1
Date: Fri, 08 May 2026 17:59:40 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:39.265157
Title: LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
Title (reference translation): Improving LLMs: Agentic Discovery for Test-Time Scaling
Authors: Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu, Runpeng Dai, Ruibo Chen, Chenxi Liu, Tianyi Xiong, Xidong Wu, Hongming Zhang, Heng Huang
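Abstract summary: Test-time scaling (TTS) has become an effective approach for improving the performance of large language models. Existing TTS strategies are largely hand-crafted: researchers design reasoning patterns and tune them by intuition, leaving much of the compute-allocation space unexplored. The paper proposes AutoTTS, an environment-driven framework that changes what researchers design, from individual TTS heuristics to environments in which TTS strategies can be discovered automatically.
Reference score (independently computed attention level): 63.679448814185456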
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments where TTS strategies can be discovered automatically. The key to AutoTTS lies in environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width-depth TTS as controller synthesis over pre-collected reasoning trajectories and probe signals, where controllers decide when to branch, continue, probe, prune, or stop and can be evaluated cheaply without repeated LLM calls. We further introduce beta parameterization to make the search tractable, and fine-grained execution trace feedback to improve discovery efficiency by helping the agent diagnose why a TTS program fails. Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy-cost tradeoff over strong manually designed baselines. The discovered strategies generalize to held-out benchmarks and model scales, while the entire discovery costs only $39.9 and 160 minutes. Our data and code will be open-sourced at https://github.com/zhengkid/AutoTTS.
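Abstract (reference translation): Test-time scaling (TTS) has become an effective way to improve the performance of large language models by allocating additional computation during inference. Existing TTS strategies, however, are mostly hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the compute-allocation space unexplored. The paper proposes AutoTTS, an environment-driven framework that changes what researchers design, from individual TTS heuristics to environments in which TTS strategies can be discovered automatically. The key to AutoTTS is environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, width-depth TTS is formulated as controller synthesis over pre-collected reasoning trajectories and probe signals, where the controller decides when to branch, continue, probe, prune, or stop, and can be evaluated cheaply without repeated LLM calls. The paper also introduces beta parameterization to make the search tractable, and fine-grained execution-trace feedback that improves discovery efficiency by helping the agent diagnose why a TTS program fails. Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy-cost tradeoff over strong manually designed baselines. The discovered strategies generalize to held-out benchmarks and model scales, and the entire discovery process cost only $39.9 and 160 minutes. The data and code will be open-sourced at https://github.com/zhengkid/AutoTTS.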
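
The idea of evaluating a width-depth controller against pre-collected trajectories, with no LLM calls, can be illustrated with a small replay loop. This is a minimal sketch and not the authors' AutoTTS code: the `Trajectory` format, the `Controller` fields (`width`, `probe_every`, `stop_share`, `prune`, `probe_tokens`), and the voting rule are all hypothetical choices made for illustration. Beta parameterization and execution-trace feedback are not modeled here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """One pre-collected reasoning trace (hypothetical format)."""
    step_tokens: List[int]    # tokens consumed by each reasoning step
    probe_answers: List[str]  # answer a probe would report after each step


@dataclass
class Controller:
    """Toy width-depth policy; every field is an illustrative assumption."""
    width: int = 4            # branch: how many trajectories to open
    probe_every: int = 2      # probe: read intermediate answers every k steps
    stop_share: float = 0.75  # stop: share of live branches that must agree
    prune: bool = True        # prune: drop branches that disagree with the leader
    probe_tokens: int = 1     # assumed token cost of one probe on one branch


def replay(ctrl: Controller, pool: List[Trajectory]) -> Tuple[str, int]:
    """Replay the controller on a non-empty pool; return (answer, token cost)."""
    live = list(range(min(ctrl.width, len(pool))))  # branch
    max_depth = max(len(pool[i].step_tokens) for i in live)
    cost = 0
    for depth in range(1, max_depth + 1):
        # continue: advance every live branch by one recorded step
        for i in live:
            if depth <= len(pool[i].step_tokens):
                cost += pool[i].step_tokens[depth - 1]
        if depth % ctrl.probe_every != 0:
            continue
        # probe: look up the recorded intermediate answer of each live branch
        cost += ctrl.probe_tokens * len(live)
        answers = {
            i: pool[i].probe_answers[min(depth, len(pool[i].probe_answers)) - 1]
            for i in live
        }
        top, votes = Counter(answers.values()).most_common(1)[0]
        if votes / len(live) >= ctrl.stop_share:
            return top, cost  # stop: enough live branches agree
        if ctrl.prune and votes > 1:
            live = [i for i in live if answers[i] == top]  # prune
    # No early stop: majority vote over the final recorded answers
    # (treated as free in this toy cost model).
    final = Counter(pool[i].probe_answers[-1] for i in live)
    return final.most_common(1)[0][0], cost
```

Because `replay` only reads recorded steps and probe answers, many candidate controllers can be scored on the same pool at negligible cost, which is the property the abstract attributes to the discovery environment. A search procedure would compare the returned answers and token costs across controllers to trace out an accuracy-cost tradeoff.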

Related papers