Fugu-MT paper translation (abstract): Harnesses for Inference-Time Alignment over Execution Trajectories

Paper overview: Harnesses for Inference-Time Alignment over Execution Trajectories

arxiv url: http://arxiv.org/abs/2605.21516v1
Date: Fri, 15 May 2026 12:47:13 GMT
Status: translation complete
System update date: 2026-05-22 16:35:41.924639
Title: Harnesses for Inference-Time Alignment over Execution Trajectories
Authors: Boyuan Wang, Bochao Li, Minghan Wang, Yuxin Tao, Fang Kong,
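Abstract summary: We study harness design through the lens of inference-time trajectory alignment. Separating a harness into task decomposition and guided execution allows us to quantify how workflow granularity, retry budgets, and guidance-induced action reweighting shape the performance limits of harness design. Inspired by the theory, we show that effective harnesses can be partial: specifying only the initial steps and leaving the remaining execution to the agent can achieve a higher pass rate.
Reference score (independently calculated attention score): 13.182534464050695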
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Harness engineering has emerged as an important inference-time technique for large language model (LLM) agents, aiming to improve long-term performance through task decomposition and guided execution. However, more elaborate harnesses are not uniformly better: increasing decomposition or guidance can sometimes improve execution, but can also reduce final task success. We study harness design through the lens of inference-time trajectory alignment. This perspective separates a harness into two mechanisms: task decomposition, which structures a task into sub-goals, and guided execution, which reshapes local action distributions during execution. This decomposition allows us to quantify how workflow granularity, retry budgets, and guidance-induced action reweighting shape the performance limits of harness design. It further reveals concrete failure modes, including over-decomposition, over-pruning, and hallucinated execution. We validate these predictions through controlled synthetic experiments and real terminal agent benchmarks. Inspired by the theory, we further show that effective harnesses can be partial: specifying only the initial steps and leaving the remaining execution to the agent can achieve a higher pass rate than fully structured workflows.
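The interaction between workflow granularity and retry budgets mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes, purely for illustration, that a task is split into `n_steps` sub-goals, that each attempt at a sub-goal succeeds independently with probability `p_step`, and that each sub-goal may be attempted up to `retries` times. The function name and all parameters are hypothetical.

```python
def chain_success(p_step: float, n_steps: int, retries: int) -> float:
    """Toy estimate of end-to-end success for a fully decomposed workflow.

    Assumptions (illustrative only, not taken from the paper):
    - every sub-goal must succeed for the task to succeed;
    - each attempt at a sub-goal succeeds independently with p_step;
    - each sub-goal gets at most `retries` attempts.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability in [0, 1]")
    if n_steps < 1 or retries < 1:
        raise ValueError("n_steps and retries must be positive integers")
    # Probability that at least one of the attempts at a sub-goal succeeds.
    per_step = 1.0 - (1.0 - p_step) ** retries
    # All sub-goals must succeed, so the per-step probabilities multiply.
    return per_step ** n_steps


if __name__ == "__main__":
    for n_steps in (1, 2, 5, 10):
        for retries in (1, 3):
            rate = chain_success(0.9, n_steps, retries)
            print(f"steps={n_steps:2d} retries={retries} success={rate:.3f}")
```

Under these assumptions, adding sub-goals at a fixed per-step success rate lowers end-to-end success, while a larger retry budget raises it. A real harness would also change `p_step` as the granularity changes, which this sketch deliberately leaves out.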
