Fugu-MT Paper Translation (Abstract): Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks

Paper overview: Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks

arxiv url: http://arxiv.org/abs/2605.04227v1
Date: Tue, 05 May 2026 19:12:11 GMT
Status: Translation complete
System update date: 2026-05-07 18:41:07.502013
Title: Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
Title (reference translation): Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
Authors: Lilin Xu, Bufang Yang, Siyang Jiang, Kaiwei Liu, Kaiyuan Hou, Yuang Fan, Hongkai Chen, Zhenyu Yan, Xiaofan Jiang
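Abstract summary: Pro$^2$Assist is a step-aware proactive assistant for procedural tasks. It tracks fine-grained task progress and reasons over the user's evolving state to provide timely assistance. Pro$^2$Assist is evaluated on a dataset curated from public sources and a real-world dataset collected with AR glasses on a testbed.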
Reference score (independently computed attention measure): 3.0877037234777944
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Procedural tasks with multiple ordered steps are ubiquitous in daily life. Recent advances in multimodal large language models (MLLMs) have enabled personal assistants that support daily activities. However, existing systems primarily provide reactive guidance triggered by user queries, or limited proactive assistance for isolated short-term events rather than long-horizon procedural tasks. In this work, we introduce Pro$^2$Assist, a step-aware proactive assistant that continuously tracks fine-grained task progress and reasons over the user's evolving state to provide timely assistance throughout tasks. Pro$^2$Assist leverages multimodal data from augmented reality (AR) glasses to achieve motion-based perception. It then extracts step-oriented procedural context from multi-scale temporal dynamics and task-specific expert knowledge. Based on both sensory input and procedural context, Pro$^2$Assist performs continuous reasoning to infer user needs and display timely assistance on AR glasses. We evaluate Pro$^2$Assist using a dataset curated from public sources and a real-world dataset collected on our testbed with AR glasses. Extensive evaluations show that Pro$^2$Assist outperforms the best-performing baselines by over 21% in procedural action understanding accuracy, and it achieves up to 2.29x the proactive timing accuracy of baselines. A user study with 20 participants further shows that 90% find Pro$^2$Assist useful, indicating its effectiveness for real-world procedural assistance.
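Abstract (reference translation): Procedural tasks consisting of multiple ordered steps are ubiquitous in daily life. Recent advances in multimodal large language models (MLLMs) have enabled personal assistants that support daily activities. However, existing systems mainly provide reactive guidance triggered by user queries, or limited proactive assistance for isolated short-term events rather than long-horizon procedural tasks. This paper introduces Pro$^2$Assist, a step-aware proactive assistant. Pro$^2$Assist uses multimodal data from augmented reality (AR) glasses to achieve motion-based perception. It then extracts step-oriented procedural context from multi-scale temporal dynamics and task-specific expert knowledge. Based on both sensory input and procedural context, Pro$^2$Assist performs continuous reasoning to infer user needs and display timely assistance on the AR glasses. Pro$^2$Assist is evaluated on a dataset curated from public sources and a real-world dataset collected with AR glasses on the authors' testbed. Extensive evaluations show that Pro$^2$Assist outperforms the best-performing baselines by over 21% in procedural action understanding accuracy and achieves up to 2.29x the proactive timing accuracy of the baselines. A user study with 20 participants further shows that 90% find Pro$^2$Assist useful, indicating its effectiveness for real-world procedural assistance.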

