Fugu-MT Paper Translation (Abstract): In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs

Paper overview: In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs

arxiv url: http://arxiv.org/abs/2512.02543v1
Date: Tue, 02 Dec 2025 09:11:05 GMT
Status: Translation complete
Last updated in system: 2025-12-03 21:04:45.796503
Title: In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs
Title (reference translation): In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Method for Reducing LLM Agent Costs
Authors: Vishnu Sarukkai, Asanshay Gupta, James Hong, Michaël Gharbi, Kayvon Fatahalian
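Abstract summary: We propose a simple method for reducing LLM agent inference costs without incurring the development costs associated with fine-tuning. Most importantly, we introduce $\textit{in-context distillation}$, which adapts the idea of knowledge distillation to an in-context learning setting. The approach retrieves relevant teacher demonstrations at each agent step and provides them to the student as in-context examples, enabling the student to imitate teacher behavior on-the-fly.
Reference score (independently computed attention score): 15.204355975284658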
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The world currently has an abundance of ideas for how to use new LLM agents, and developers seek to rapidly prototype and test new agentic designs. However, executing agents at scale using high-capacity LLMs incurs high inference costs. We propose a simple method for reducing LLM agent inference costs without incurring the development friction costs associated with LLM fine-tuning (long training cycles, optimization hyperparameter tweaking loops) or manual prompt engineering (laborious trial and error). Most importantly, we introduce $\textit{in-context distillation}$, which adapts the idea of knowledge distillation (training a low-cost student model to mimic a high-cost teacher) to an in-context learning setting. Our approach retrieves relevant teacher demonstrations at each agent step and provides them to the student as in-context examples, enabling the student to imitate teacher behavior on-the-fly. We combine in-context distillation with the established idea of $\textit{self-consistency cascades}$ to know when to trust the student. This adaptive strategy realizes the cost benefits of model specialization while preserving the productivity of working with frozen models. On the multi-step embodied reasoning benchmark ALFWorld, our method matches teacher-level accuracy at $\textbf{2.5$\times$ lower cost}$, reducing per-episode costs from \$0.059 to \$0.024. The upfront demonstration cost amortizes after just 843 episodes, yielding cumulative savings exceeding \$34,900 at deployment scale (1M episodes). On AppWorld, a complex agent benchmark requiring multi-step API workflows, we shift the Pareto frontier by achieving a $\textbf{2$\times$ cost reduction}$ at iso-accuracy. By reducing operational costs while maintaining rapid experimentation cycles with frozen models, our approach makes advanced agentic systems economically viable for a broader range of applications.
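The mechanism described in the abstract (retrieve teacher demonstrations, prompt the student with them, and fall back to the teacher when the student's samples disagree) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the word-overlap retrieval, the prompt layout, and the agreement threshold are hypothetical and are not taken from the paper, whose exact retrieval method and cascade criterion are not stated in the abstract.

```python
from collections import Counter
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (observation, teacher_action); hypothetical record layout


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for whatever retriever is really used."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve(demos: List[Demo], observation: str, k: int = 2) -> List[Demo]:
    """Return the k teacher demonstrations most similar to the current observation."""
    return sorted(demos, key=lambda d: jaccard(d[0], observation), reverse=True)[:k]


def build_prompt(examples: List[Demo], observation: str) -> str:
    """Format retrieved demonstrations as in-context examples for the student."""
    shots = "\n".join(f"Observation: {o}\nAction: {a}" for o, a in examples)
    return f"{shots}\nObservation: {observation}\nAction:"


def cascade_step(
    observation: str,
    demos: List[Demo],
    student: Callable[[str], str],
    teacher: Callable[[str], str],
    n_samples: int = 3,
    min_agreement: float = 1.0,
) -> Tuple[str, str]:
    """Sample the student several times; keep its answer only if the samples agree."""
    prompt = build_prompt(retrieve(demos, observation), observation)
    samples = [student(prompt) for _ in range(n_samples)]
    action, votes = Counter(samples).most_common(1)[0]
    if votes / n_samples >= min_agreement:
        return action, "student"
    # Samples disagree: defer this step to the more expensive teacher.
    return teacher(observation), "teacher"
```

In this sketch `student` and `teacher` are plain callables, so any model client can be wrapped and passed in. Lowering `min_agreement` trades accuracy for cost by trusting the student more often.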
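Abstract (reference translation): The world currently has many ideas for how to use new LLM agents, and developers are trying to prototype and test new agent designs quickly. However, running agents at scale with high-capacity LLMs incurs high inference costs. We propose a simple method for reducing LLM agent inference costs without incurring the development friction costs associated with LLM fine-tuning (long training cycles, optimization hyperparameter tuning loops) or manual prompt engineering (laborious trial and error). Most importantly, we introduce $\textit{in-context distillation}$, which applies the idea of knowledge distillation (training a low-cost student model to imitate a high-cost teacher) to an in-context learning setting. The proposed method retrieves relevant teacher demonstrations at each agent step and provides them to the student as in-context examples, enabling the student to imitate teacher behavior on-the-fly. We combine in-context distillation with the established idea of $\textit{self-consistency cascades}$ to know when to trust the student. This adaptive strategy realizes the cost benefits of model specialization while preserving the productivity of working with frozen models. On the multi-step embodied reasoning benchmark ALFWorld, the method matches teacher-level accuracy at $\textbf{2.5$\times$ lower cost}$, reducing the cost per episode from \$0.059 to \$0.024. The upfront demonstration cost is amortized after just 843 episodes, and cumulative savings exceed \$34,900 at deployment scale (1M episodes). On AppWorld, a complex agent benchmark requiring multi-step API workflows, we shift the Pareto frontier by achieving a $\textbf{2$\times$ cost reduction}$ at iso-accuracy. By reducing operational costs while maintaining rapid experimentation cycles with frozen models, the approach makes advanced agentic systems economically viable for a broader range of applications.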
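The ALFWorld cost figures quoted in the abstract can be cross-checked with a few lines of arithmetic. This is only a consistency check under stated assumptions: the per-episode costs and the 843-episode break-even point come from the abstract, while the upfront demonstration cost is not reported there and is inferred here from those rounded numbers, so it is approximate.

```python
# Figures quoted in the abstract (ALFWorld).
teacher_cost_per_episode = 0.059   # USD
method_cost_per_episode = 0.024    # USD
break_even_episodes = 843

# Saving per episode once the cheaper method replaces the teacher.
saving_per_episode = teacher_cost_per_episode - method_cost_per_episode

# ASSUMPTION: the upfront demonstration cost is inferred from the break-even
# point; the abstract does not state it directly.
implied_upfront_cost = break_even_episodes * saving_per_episode


def cumulative_savings(episodes: int) -> float:
    """Net savings after `episodes` runs, net of the implied upfront cost."""
    return episodes * saving_per_episode - implied_upfront_cost


cost_ratio = teacher_cost_per_episode / method_cost_per_episode

print(f"cost ratio: {cost_ratio:.2f}x")
print(f"implied upfront cost: ${implied_upfront_cost:.2f}")
print(f"savings at 1M episodes: ${cumulative_savings(1_000_000):,.2f}")
```

Under these assumptions the ratio rounds to the reported 2.5x, and the net savings at one million episodes come out just under \$35,000, consistent with the "exceeding \$34,900" figure.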

Related papers