Fugu-MT Paper Translation (Overview): Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?

Paper overview: Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?

arxiv url: http://arxiv.org/abs/2605.30152v1
Date: Thu, 28 May 2026 16:10:32 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.47255
Title: Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?
Title (reference translation): Does waking require an LLM? What does waking require?
Authors: Xiaoze Liu, Ruowang Zhang, Amir H. Abdi, Michel Galley, Zhikai Chen, Siheng Xiong, Xiaoqian Wang, Jing Gao,
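Abstract summary: Proactive agents read user activity as text and call an LLM on every event to decide whether to act. The authors instead treat the always-on signal as graph updates rather than text. A single forward pass yields a per-event trigger probability and a per-entity routing score.
Reference score (independently computed attention score): 36.54616681758304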
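The summary's contract (structured events in, one pass per event, two outputs) can be made concrete with a toy encoder. This is a minimal sketch, not the authors' TGL model: it is loosely in the style of memory-based temporal graph networks, and the deterministic `embed` function, the exponential memory decay, the untrained trigger weights, and the dot-product routing head are all hypothetical stand-ins chosen only to show the shape of the inputs and outputs.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One activity tuple: (actor, verb, object, timestamp in seconds)."""
    actor: str
    verb: str
    obj: str
    timestamp: float


def embed(name: str, dim: int) -> list[float]:
    # Hypothetical deterministic stand-in for a learned embedding.
    seed = sum(ord(c) for c in name)
    return [math.sin(seed * (i + 1)) for i in range(dim)]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class ToyTemporalEncoder:
    """Toy temporal-graph encoder with a trigger head and a routing head (not the paper's TGL)."""
    dim: int = 8
    decay: float = 0.01      # hypothetical memory decay rate per second
    trigger_b: float = -1.0  # hypothetical trigger bias
    trigger_w: list[float] = field(init=False)
    memory: dict[str, list[float]] = field(default_factory=dict)
    last_seen: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.trigger_w = [0.5] * self.dim  # hypothetical, untrained weights

    def _update_node(self, node: str, message: list[float], t: float) -> None:
        # Graph update: decay the node's memory by elapsed time, then add the message.
        old = self.memory.get(node, [0.0] * self.dim)
        factor = math.exp(-self.decay * (t - self.last_seen.get(node, t)))
        self.memory[node] = [factor * m + x for m, x in zip(old, message)]
        self.last_seen[node] = t

    def forward(self, ev: Event) -> tuple[float, dict[str, float]]:
        """One pass per event: update the graph, then score trigger and routing."""
        verb = embed(ev.verb, self.dim)
        actor_vec, obj_vec = embed(ev.actor, self.dim), embed(ev.obj, self.dim)
        # Each endpoint receives the verb embedding plus the other endpoint's embedding.
        self._update_node(ev.actor, [v + o for v, o in zip(verb, obj_vec)], ev.timestamp)
        self._update_node(ev.obj, [v + a for v, a in zip(verb, actor_vec)], ev.timestamp)

        # Trigger head: logistic score over the two endpoint memories.
        pair = [a + o for a, o in zip(self.memory[ev.actor], self.memory[ev.obj])]
        p_trigger = 1.0 / (1.0 + math.exp(-(dot(self.trigger_w, pair) + self.trigger_b)))

        # Routing head: softmax over every known entity, queried by the actor's memory.
        query = self.memory[ev.actor]
        logits = {e: dot(m, query) for e, m in self.memory.items()}
        peak = max(logits.values())
        exps = {e: math.exp(z - peak) for e, z in logits.items()}
        total = sum(exps.values())
        return p_trigger, {e: v / total for e, v in exps.items()}


enc = ToyTemporalEncoder()
p1, routing1 = enc.forward(Event("user", "opened", "report.docx", 0.0))
p2, routing2 = enc.forward(Event("user", "emailed", "alice", 30.0))
```

Each call to `forward` is the single pass the summary describes: a probability in (0, 1) and a distribution over every entity seen so far. Nothing here calls an LLM; a deployment would compare the trigger probability against a tuned threshold.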
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Proactive agents read user activity as text and call an LLM on every event to decide whether to act. But user activity is not natively text: it is a structured event stream of (actor, verb, object, timestamp) tuples that the operating system already maintains in graph form. Rendering the structure as text and asking an LLM to recover it is a round-trip the system never had to take. We treat the always-on signal as graph updates rather than text and use a small temporal-graph-learning (TGL) model as the encoder: one forward pass yields a per-event trigger probability and a per-entity routing score, and only the downstream agent (turning a small structured handoff into a fluent user-facing sentence) is an LLM call, invoked only when the trigger fires. TGL improves F1 on each of 14 backbones (mean +16.7, up to +46.0); in trigger-architecture comparisons, one TGL checkpoint gives the strongest trigger AUCs and the most stable deployed threshold. It runs at 11.13 ms per event on a GPU server and 13.99 ms on a consumer laptop, approximately 4–7x and 12–83x faster than every single-forward LLM-as-trigger configuration tested in each regime, with an approximately 220 MiB BF16 resident footprint deployable on-device alongside the privacy-sensitive activity stream it consumes.
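The control flow the abstract describes, where the encoder scores every event but the LLM runs only when the trigger fires, reduces to a threshold gate in front of the downstream agent. The sketch below uses hypothetical names throughout: the `Handoff` fields, the top-k anchoring rule, the 0.5 threshold, and `fake_llm` (a stub standing in for whatever LLM call a deployment would make) are illustrative assumptions, not details given in the abstract.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Handoff:
    """Small structured payload for the downstream LLM agent (hypothetical schema)."""
    event_id: int
    trigger_prob: float
    anchors: tuple[str, ...]  # top-k entities by routing score


def run_gate(
    scored_events: list[tuple[float, dict[str, float]]],
    threshold: float,
    call_llm: Callable[[Handoff], str],
    top_k: int = 2,
) -> list[str]:
    """Invoke the LLM only for events whose trigger probability clears the threshold."""
    messages = []
    for i, (p, routing) in enumerate(scored_events):
        if p < threshold:
            continue  # no wake: the encoder pass is the only cost for this event
        anchors = tuple(sorted(routing, key=routing.get, reverse=True)[:top_k])
        messages.append(call_llm(Handoff(i, p, anchors)))
    return messages


calls: list[Handoff] = []


def fake_llm(h: Handoff) -> str:
    # Hypothetical stub: records the handoff and returns a canned sentence.
    calls.append(h)
    return f"Reminder about {h.anchors[0]}"


events = [
    (0.10, {"user": 0.6, "report.docx": 0.4}),
    (0.92, {"report.docx": 0.7, "alice": 0.2, "user": 0.1}),
    (0.30, {"calendar": 1.0}),
]
out = run_gate(events, threshold=0.5, call_llm=fake_llm)
# out == ["Reminder about report.docx"]; fake_llm ran once for three events
```

Three events produce one LLM invocation. In this arrangement the per-event cost stays at one encoder pass, and LLM cost scales with how often the trigger fires rather than with the raw event rate.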
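Abstract (reference translation): Proactive agents read user activity as text and call an LLM on every event to decide whether to act. This activity is a structured event stream of (actor, verb, object, timestamp) tuples that the operating system already maintains in graph form. Rendering the structure as text and asking an LLM to recover it is a round trip the system never needed to take. We treat the always-on signal as graph updates rather than text and use a small temporal graph learning (TGL) model as the encoder: a single forward pass outputs a per-event trigger probability and a per-entity routing score, and only the downstream agent (which turns a small structured handoff into a fluent user-facing sentence) is an LLM call, invoked only when the trigger fires. TGL improves F1 on each of 14 backbones (mean +16.7, up to +46.0). It runs at 11.13 ms per event on a GPU server and 13.99 ms on a consumer laptop, about 4–7x and 12–83x faster than the single-forward LLM-as-trigger configurations tested in each setting, with a resident footprint of about 220 MiB in BF16 that can be deployed on-device alongside the privacy-sensitive activity stream it consumes.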
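The abstract's latency and memory figures support some back-of-envelope arithmetic. The assumptions are explicit: the throughput lines assume events are processed one at a time with no batching; the value count assumes every resident byte is a BF16 number at 2 bytes each, so it is an upper bound on parameters rather than a reported parameter count; and the LLM-trigger latencies are only implied by the stated speedup ranges, not values reported in the abstract.

```latex
\begin{aligned}
\text{BF16 values in } 220\,\text{MiB} &= \frac{220 \times 2^{20}\ \text{B}}{2\ \text{B/value}} = 115{,}343{,}360 \approx 1.15 \times 10^{8} \\
\text{serial throughput (GPU server)} &\approx \frac{1000\ \text{ms/s}}{11.13\ \text{ms/event}} \approx 89.8\ \text{events/s} \\
\text{serial throughput (laptop)} &\approx \frac{1000\ \text{ms/s}}{13.99\ \text{ms/event}} \approx 71.5\ \text{events/s} \\
\text{implied LLM-trigger latency (GPU server)} &\approx (4\text{--}7) \times 11.13\ \text{ms} \approx 45\text{--}78\ \text{ms/event} \\
\text{implied LLM-trigger latency (laptop)} &\approx (12\text{--}83) \times 13.99\ \text{ms} \approx 168\text{--}1161\ \text{ms/event}
\end{aligned}
```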


Related papers