Fugu-MT Paper Translation (Abstract): ExecTune: Effective Steering of Black-Box LLMs with Guide Models

Paper summary: ExecTune: Effective Steering of Black-Box LLMs with Guide Models

arxiv url: http://arxiv.org/abs/2604.09741v1
Date: Thu, 09 Apr 2026 23:27:46 GMT
Status: Translation complete
System update date: 2026-04-14 20:13:15.644547
Title: ExecTune: Effective Steering of Black-Box LLMs with Guide Models
Authors: Vijay Lingam, Aditya Golatkar, Anwesan Pal, Ben Vo, Narayanan Sadagopan, Alessandro Achille, Jun Huan, Anoop Deoras, Stefano Soatto
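Abstract summary: We study a class of systems called Guide-Core Policies (GCoP), in which a guide model generates a structured strategy that is executed by a black-box core model. We formalize GCoP under a cost-sensitive utility objective and show that end-to-end performance is governed by guide-averaged executability. We propose ExecTune, a principled training recipe that combines teacher-guided acceptance sampling, supervised fine-tuning, and structure-aware reinforcement learning.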
Reference score (independently computed attention score): 81.45879384560016
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: For large language models deployed through black-box APIs, recurring inference costs often exceed one-time training costs. This motivates composed agentic systems that amortize expensive reasoning into reusable intermediate representations. We study a broad class of such systems, termed Guide-Core Policies (GCoP), in which a guide model generates a structured strategy that is executed by a black-box core model. This abstraction subsumes base, supervised, and advisor-style approaches, which differ primarily in how the guide is trained. We formalize GCoP under a cost-sensitive utility objective and show that end-to-end performance is governed by guide-averaged executability: the probability that a strategy generated by the guide can be faithfully executed by the core. Our analysis shows that existing GCoP instantiations often fail to optimize executability under deployment constraints, resulting in brittle strategies and inefficient computation. Motivated by these insights, we propose ExecTune, a principled training recipe that combines teacher-guided acceptance sampling, supervised fine-tuning, and structure-aware reinforcement learning to directly optimize syntactic validity, execution success, and cost efficiency. Across mathematical reasoning and code-generation benchmarks, GCoP with ExecTune improves accuracy by up to 9.2% over prior state-of-the-art baselines while reducing inference cost by up to 22.4%. It enables Claude Haiku 3.5 to outperform Sonnet 3.5 on both math and code tasks, and to come within 1.7% absolute accuracy of Sonnet 4 at 38% lower cost. Beyond efficiency, GCoP also supports modular adaptation by updating the guide without retraining the core.
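The guide-core setup and the executability quantity described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so the utility used here (accuracy minus a weighted mean cost), the `Outcome` record, and the `toy_guide` / `toy_core` stand-ins are all hypothetical assumptions. Guide-averaged executability is estimated simply as the fraction of guide-generated strategies that the core manages to execute.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Outcome:
    """Result of one guide -> core episode (hypothetical record type)."""
    executed: bool  # did the core faithfully execute the strategy?
    correct: bool   # was the final answer correct?
    cost: float     # inference cost in arbitrary units


def evaluate(
    tasks: Iterable[int],
    guide: Callable[[int], List[str]],
    core: Callable[[int, List[str]], Outcome],
    cost_weight: float = 0.0,
) -> Dict[str, float]:
    """Run guide then core on each task and average the outcomes."""
    outcomes = [core(task, guide(task)) for task in tasks]
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one task")
    executability = sum(o.executed for o in outcomes) / n
    accuracy = sum(o.correct for o in outcomes) / n
    mean_cost = sum(o.cost for o in outcomes) / n
    # Assumed utility form: accuracy penalized by weighted mean cost.
    utility = accuracy - cost_weight * mean_cost
    return {
        "executability": executability,
        "accuracy": accuracy,
        "mean_cost": mean_cost,
        "utility": utility,
    }


def toy_guide(task: int) -> List[str]:
    # Hypothetical guide: emits an empty (unexecutable) strategy
    # for every fourth task, and a three-step strategy otherwise.
    return [] if task % 4 == 0 else ["parse", "solve", "verify"]


def toy_core(task: int, strategy: List[str]) -> Outcome:
    # Hypothetical black-box core: succeeds only if it received a
    # non-empty strategy; cost grows with strategy length.
    executed = len(strategy) > 0
    return Outcome(executed=executed, correct=executed,
                   cost=1.0 + 0.5 * len(strategy))


stats = evaluate(range(8), toy_guide, toy_core, cost_weight=0.1)
print(stats)
```

In this toy setting accuracy tracks executability exactly, because the stand-in core answers correctly whenever it can execute the strategy. A real core would decouple the two, which is why the sketch records them separately.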

