Fugu-MT Paper Translation (Abstract): ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding

Paper abstract: ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding

arxiv url: http://arxiv.org/abs/2604.13519v1
Date: Wed, 15 Apr 2026 06:05:11 GMT
Status: Translation complete
Last updated in system: 2026-04-16 20:38:32.406757
Title: ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding
Authors: Heming Xia, Yongqi Li, Cunxiao Du, Mingbo Song, Wenjie Li
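Abstract summary: Tool-calling traces are highly structured, conform to constrained schemas, and often exhibit recurring invocation patterns. This paper proposes ToolSpec, a schema-aware, retrieval-augmented speculative decoding method for accelerating tool calling. In experiments, ToolSpec achieves up to a 4.2x speedup.
Reference score (attention score computed independently by the site): 18.39543649458034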
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tool calling has greatly expanded the practical utility of large language models (LLMs) by enabling them to interact with external applications. As LLM capabilities advance, effective tool use increasingly involves multi-step, multi-turn interactions to solve complex tasks. However, the resulting growth in tool interactions incurs substantial latency, posing a key challenge for real-time LLM serving. Through empirical analysis, we find that tool-calling traces are highly structured, conform to constrained schemas, and often exhibit recurring invocation patterns. Motivated by this, we propose ToolSpec, a schema-aware, retrieval-augmented speculative decoding method for accelerating tool calling. ToolSpec exploits predefined tool schemas to generate accurate drafts, using a finite-state machine to alternate between deterministic schema token filling and speculative generation for variable fields. In addition, ToolSpec retrieves similar historical tool invocations and reuses them as drafts to further improve efficiency. ToolSpec presents a plug-and-play solution that can be seamlessly integrated into existing LLM workflows. Experiments across multiple benchmarks demonstrate that ToolSpec achieves up to a 4.2x speedup, substantially outperforming existing training-free speculative decoding methods.
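
The drafting idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the tokens are single characters, the "target model" is a stand-in function that replays a fixed string, and every name here (`build_template`, `retrieve_guesses`, `verify`, `decode`, the `get_weather` tool) is hypothetical. It only shows a state machine that alternates between deterministic schema literals and speculative drafts for variable fields, with drafts for the variable fields taken from a past call to the same tool. Verification is simplified to "accept the longest draft prefix the target agrees with", and argument values are assumed to contain no quote characters.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, str]  # ("lit", literal text) or ("var", field name)

def build_template(tool: str, params: List[str]) -> List[Segment]:
    """Split a (hypothetical) JSON tool-call schema into literal and variable segments."""
    segs: List[Segment] = [("lit", '{"name":"' + tool + '","arguments":{')]
    for i, p in enumerate(params):
        segs.append(("lit", ("," if i else "") + '"' + p + '":"'))
        segs.append(("var", p))
        segs.append(("lit", '"'))
    segs.append(("lit", "}}"))
    return segs

def retrieve_guesses(history: List[dict], tool: str) -> Dict[str, str]:
    """Reuse the argument values of the most recent past call to the same tool."""
    for past in reversed(history):
        if past["name"] == tool:
            return past["arguments"]
    return {}

def verify(prefix: str, draft: str, target_next: Callable[[str], str]) -> int:
    """Number of leading draft tokens the target would also emit greedily."""
    n = 0
    while n < len(draft) and target_next(prefix + draft[:n]) == draft[n]:
        n += 1
    return n

def decode(template: List[Segment], guesses: Dict[str, str],
           target_next: Callable[[str], str]) -> Tuple[str, int]:
    """Return (decoded text, rough count of target passes)."""
    text, passes = "", 0
    for kind, value in template:  # one FSM state per segment
        draft = value if kind == "lit" else guesses.get(value, "")
        n = 0
        if draft:
            n = verify(text, draft, target_next)
            passes += 1  # a whole draft is checked in one (simulated) pass
        text += draft[:n]
        if kind == "lit" and n < len(draft):
            break  # target left the schema: stop drafting
        if kind == "var":
            # Finish the field one token at a time until the target closes the string.
            while (tok := target_next(text)) not in ('"', ""):
                text += tok
                passes += 1
    # Plain decoding for anything left; "" from the target means end of sequence.
    while (tok := target_next(text)):
        text += tok
        passes += 1
    return text, passes

# Toy target: deterministically replays a fixed "true" tool call.
truth = '{"name":"get_weather","arguments":{"city":"Berlin","unit":"celsius"}}'

def target_next(prefix: str) -> str:
    ok = truth.startswith(prefix) and len(prefix) < len(truth)
    return truth[len(prefix)] if ok else ""

history = [{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}]
template = build_template("get_weather", ["city", "unit"])
text, passes = decode(template, retrieve_guesses(history, "get_weather"), target_next)
```

In this toy run the decoded text is identical to what the target alone would produce, because only tokens the target agrees with are ever kept. The schema literals and the reused `unit` value are each accepted in a single pass, while the mismatched `city` guess falls back to token-by-token decoding, so the pass count is far below one per token.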

Related papers