Fugu-MT Paper Translation (Abstract): AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

Paper overview: AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

arxiv url: http://arxiv.org/abs/2603.18429v1
Date: Thu, 19 Mar 2026 02:45:21 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:05.927196
Title: AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
Title (reference translation): AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
Authors: Yibo Shi, Jungang Li, Linghao Zhang, Zihao Dongfang, Biao Wu, Sicheng Tao, Yibo Yan, Chenxi Qin, Weiting Liu, Zhixin Lin, Hanqian Li, Yu Huang, Song Dai, Yonghua Hei, Yue Ding, Xiang Li, Shikang Wang, Chengdong Xu, Jingqi Liu, Xueying Ma, Zhiwen Zheng, Xiaofei Zhang, Bincheng Wang, Nichen Yang, Jie Wu, Lihua Tian, Chen Li, Xuming Hu
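Abstract summary: We present AndroTMem, a diagnostic framework for anchored memory in long-horizon Android GUI agents. Our benchmark, AndroTMem-Bench, comprises 1,069 tasks with 34,473 interaction steps (avg. 32.1 per task, max. 65). We propose Anchored State Memory (ASM), which represents interaction sequences as a compact set of causally linked intermediate-state anchors.
Reference score (independently computed attention measure): 35.5648433882265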
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Long-horizon GUI agents are a key step toward real-world deployment, yet effective interaction memory under prevailing paradigms remains under-explored. Replaying full interaction sequences is redundant and amplifies noise, while summaries often erase dependency-critical information and traceability. We present AndroTMem, a diagnostic framework for anchored memory in long-horizon Android GUI agents. Its core benchmark, AndroTMem-Bench, comprises 1,069 tasks with 34,473 interaction steps (avg. 32.1 per task, max. 65). We evaluate agents with TCR (Task Complete Rate), focusing on tasks whose completion requires carrying forward critical intermediate state; AndroTMem-Bench is designed to enforce strong step-to-step causal dependencies, making sparse yet essential intermediate states decisive for downstream actions and centering interaction memory in evaluation. Across open- and closed-source GUI agents, we observe a consistent pattern: as interaction sequences grow longer, performance drops are driven mainly by within-task memory failures, not isolated perception errors or local action mistakes. Guided by this diagnosis, we propose Anchored State Memory (ASM), which represents interaction sequences as a compact set of causally linked intermediate-state anchors to enable subgoal-targeted retrieval and attribution-aware decision making. Across multiple settings and 12 evaluated GUI agents, ASM consistently outperforms full-sequence replay and summary-based baselines, improving TCR by 5%-30.16% and AMS by 4.93%-24.66%, indicating that anchored, structured memory effectively mitigates the interaction-memory bottleneck in long-horizon GUI tasks. The code, benchmark, and related resources are publicly available at [https://github.com/CVC2233/AndroTMem](https://github.com/CVC2233/AndroTMem).
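Abstract (reference translation): Long-horizon GUI agents are an important step toward real-world deployment, but effective interaction memory under prevailing paradigms remains under-explored. Replaying full interaction sequences is redundant and amplifies noise. We present AndroTMem, a diagnostic framework for anchored memory in long-horizon Android GUI agents. Its core benchmark, AndroTMem-Bench, comprises 1,069 tasks with 34,473 interaction steps (avg. 32.1 per task, max. 65). We evaluate agents with TCR (Task Complete Rate), focusing on tasks whose completion requires carrying forward critical intermediate state. AndroTMem-Bench enforces strong step-to-step causal dependencies, which makes intermediate states decisive for downstream actions and centers interaction memory in evaluation. As interaction sequences grow longer, performance drops are driven mainly by within-task memory failures, not by isolated perception errors or local action mistakes. Guided by this diagnosis, Anchored State Memory (ASM) represents interaction sequences as a compact set of causally linked intermediate-state anchors, enabling subgoal-targeted retrieval and attribution-aware decision making. Across multiple settings and 12 evaluated GUI agents, ASM consistently outperforms full-sequence replay and summary-based baselines, improving TCR by 5%-30.16% and AMS by 4.93%-24.66%. The code, benchmark, and related resources are publicly available at [https://github.com/CVC2233/AndroTMem](https://github.com/CVC2233/AndroTMem).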

Related papers list