Fugu-MT Paper Translation (Abstract): Building Interactive Real-Time Agents with Asynchronous I/O and Speculative Tool Calling

Paper overview: Building Interactive Real-Time Agents with Asynchronous I/O and Speculative Tool Calling

arxiv url: http://arxiv.org/abs/2605.13360v1
Date: Wed, 13 May 2026 11:20:52 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:28.00517
Title: Building Interactive Real-Time Agents with Asynchronous I/O and Speculative Tool Calling
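Title (reference translation): Building Interactive Real-Time Agents with Asynchronous I/O and Speculative Tool Calling
Authors: Coleman Hooper, Minwoo Kang, Suhong Moon, Nicholas Lee, Eric Wen, John Wawrzynek, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami, Kurt Keutzer
Abstract summary: The work aims to enable real-time interaction even for agents that perform complex multi-turn tool calling. It proposes Asynchronous I/O, which decouples the core agent reason-and-act thread from waiting for additional information. It also proposes Speculative Tool Calling as a method for managing task execution when the agent is still unsure whether it has received the full information.
Reference score (independently computed attention score): 64.40340291543971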
License: http://creativecommons.org/licenses/by/4.0/
Abstract: There is a growing demand for agentic AI technologies for a range of downstream applications like customer service and personal assistants. For applications where the agent needs to interact with a person, real-time low-latency responsiveness is required; for example, with voice-controlled applications, under 1 second of latency is typically required for the interaction to feel seamless. However, if we want the LLM to reason and execute an agentic workflow with tool calling, this can add several seconds or more of latency, which is prohibitive for real-time latency-sensitive applications. In our work, we aim to enable real-time interaction even for agents with complex multi-turn tool calling. We propose Asynchronous I/O, which decouples the core agent reason-and-act thread from waiting for additional information from either the user or environment, thereby allowing for overlapping agentic processing while waiting on external delays. We also propose Speculative Tool Calling as a method to manage task execution when the agent is still unsure if it has received the full information or if additional user information may later be provided. For strong cloud models, our method can be applied out-of-the-box to existing real-time cloud APIs, providing 1.3-1.7$\times$ speedups with minor accuracy loss. To enable real-time interaction with small edge-scale models, we also present a clock-based training methodology that adapts the model to handle streaming inputs and asynchronous responses, and demonstrate a synthetic data generation strategy for SFT. Altogether, this approach provides 1.6-2.2$\times$ speedups with the Qwen2.5-3B-Instruct and Llama-3.2-3B-Instruct models across multiple tool calling benchmarks.
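Abstract (reference translation): Demand is growing for agentic AI technologies across a range of downstream applications such as customer service and personal assistants. Applications in which the agent must interact with a person require real-time, low-latency responsiveness; in voice-controlled applications, for example, latency under 1 second is typically needed for the interaction to feel seamless. However, having the LLM reason and execute an agentic workflow with tool calling can add several seconds or more of latency, which is prohibitive for real-time, latency-sensitive applications. This work aims to enable real-time interaction even for agents with complex multi-turn tool calling. It proposes Asynchronous I/O, which decouples the core agent reason-and-act thread from waiting for additional information from the user or the environment, so that agentic processing can overlap with external delays. It also proposes Speculative Tool Calling as a method for managing task execution when the agent is still unsure whether it has received the full information or whether additional user information may arrive later. For strong cloud models, the method can be applied out-of-the-box to existing real-time cloud APIs, providing 1.3-1.7$\times$ speedups with minor accuracy loss. To enable real-time interaction with small edge-scale models, the authors also present a clock-based training methodology that adapts the model to handle streaming inputs and asynchronous responses, and demonstrate a synthetic data generation strategy for SFT. Altogether, this approach provides 1.6-2.2$\times$ speedups with the Qwen2.5-3B-Instruct and Llama-3.2-3B-Instruct models across multiple tool calling benchmarks.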
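
As a rough illustration of the two ideas named in the abstract, the following Python sketch overlaps a slow tool call with still-arriving user input and discards the call if later input changes the request. It is not the authors' implementation: `slow_tool`, `user_stream`, `extract_query`, the `FILLER` word list, and all timings are hypothetical names and values chosen only for this example, and the "does the new input change the request?" check is a toy stand-in for whatever decision the agent actually makes.

```python
import asyncio

# Hypothetical filler words that do not change the tool arguments.
FILLER = {"please", "thanks"}


async def slow_tool(query: str, delay: float = 0.05) -> str:
    """Hypothetical tool: stands in for any slow external call."""
    await asyncio.sleep(delay)
    return f"result for {query!r}"


async def user_stream(chunks, gap: float = 0.02):
    """Hypothetical streaming input: yields partial user utterances."""
    for chunk in chunks:
        await asyncio.sleep(gap)
        yield chunk


def extract_query(heard):
    """Toy stand-in for the agent deciding what the tool arguments are."""
    return " ".join(word for word in heard if word not in FILLER)


async def run_agent(chunks):
    """Launch the tool speculatively on partial input, without blocking on
    further input; cancel the in-flight call only if new input changes the
    query. Returns (final result, number of tool calls launched)."""
    heard = []
    current_query = None
    speculative = None  # in-flight asyncio.Task, if any
    launched = 0
    async for chunk in user_stream(chunks):
        heard.append(chunk)
        query = extract_query(heard)
        if not query or query == current_query:
            continue  # nothing to call yet, or the speculation is still valid
        if speculative is not None:
            speculative.cancel()  # the earlier guess is stale
        current_query = query
        speculative = asyncio.create_task(slow_tool(query))
        launched += 1
    if speculative is None:
        return None, launched
    return await speculative, launched


if __name__ == "__main__":
    print(asyncio.run(run_agent(["weather", "in", "Tokyo", "please"])))
```

In this toy run the final tool call starts as soon as "Tokyo" arrives and keeps running while the trailing "please" is still being received, which is the kind of overlap with external delay that the abstract describes. The cost is the extra calls that get launched and cancelled along the way.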

Related papers