Fugu-MT Paper Translation (Abstract): Egocentric Co-Pilot: Web-Native Smart-Glasses Agents for Assistive Egocentric AI

Paper overview: Egocentric Co-Pilot: Web-Native Smart-Glasses Agents for Assistive Egocentric AI

arxiv url: http://arxiv.org/abs/2603.01104v1
Date: Sun, 01 Mar 2026 13:43:04 GMT
Status: Translation complete
Last updated in system: 2026-03-03 19:50:56.514079
Title: Egocentric Co-Pilot: Web-Native Smart-Glasses Agents for Assistive Egocentric AI
Authors: Sicheng Yang, Yukai Huang, Weitong Cai, Shitong Sun, Fengyi Fang, You He, Yiqiao Xie, Jiankang Deng, Hang Zhang, Jifei Song, Zhensong Zhang
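Abstract summary: We introduce Egocentric Co-Pilot, a web-native neuro-symbolic framework that runs on smart glasses. It uses a Large Language Model (LLM) to orchestrate a toolbox of perception, reasoning, and web tools. Experiments on Egolife and HD-EPIC demonstrate competitive or state-of-the-art egocentric QA performance.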
Reference score (independently computed attention score): 56.98603185789977
License: http://creativecommons.org/licenses/by/4.0/
Abstract: What if accessing the web did not require a screen, a stable desk, or even free hands? For people navigating crowded cities, living with low vision, or experiencing cognitive overload, smart glasses coupled with AI agents could turn the web into an always-on assistive layer over daily life. We present Egocentric Co-Pilot, a web-native neuro-symbolic framework that runs on smart glasses and uses a Large Language Model (LLM) to orchestrate a toolbox of perception, reasoning, and web tools. An egocentric reasoning core combines Temporal Chain-of-Thought with Hierarchical Context Compression to support long-horizon question answering and decision support over continuous first-person video, far beyond a single model's context window. Additionally, a lightweight multimodal intent layer maps noisy speech and gaze into structured commands. We further implement and evaluate a cloud-native WebRTC pipeline integrating streaming speech, video, and control messages into a unified channel for smart glasses and browsers. In parallel, we deploy an on-premise WebSocket baseline, exposing concrete trade-offs between local inference and cloud offloading in terms of latency, mobility, and resource use. Experiments on Egolife and HD-EPIC demonstrate competitive or state-of-the-art egocentric QA performance, and a human-in-the-loop study on smart glasses shows higher task completion and user satisfaction than leading commercial baselines. Taken together, these results indicate that web-connected egocentric co-pilots can be a practical path toward more accessible, context-aware assistance in everyday life. By grounding operation in web-native communication primitives and modular, auditable tool use, Egocentric Co-Pilot offers a concrete blueprint for assistive, always-on web agents that support education, accessibility, and social inclusion for people who may benefit most from contextual, egocentric AI.
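The abstract names Hierarchical Context Compression as the way the reasoning core handles video longer than one model's context window, but it does not describe the mechanism. The sketch below is only one plausible reading, not the authors' implementation: recent segments stay at full detail, and older ones are merged into progressively coarser summaries. The names `Segment`, `HierarchicalContext`, `fanout`, and `summarize` are all hypothetical, and `summarize` stands in for what would presumably be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # caption or summary covering [start, end)


class HierarchicalContext:
    """Illustrative multi-level memory (an assumption, not the paper's design).

    levels[0] holds the most recent, finest-grained segments. Whenever a
    level holds more than `fanout` entries, its oldest `fanout` entries are
    merged into a single summary and pushed one level up.
    """

    def __init__(self, fanout: int, summarize: Callable[[List[str]], str]):
        if fanout < 2:
            raise ValueError("fanout must be at least 2")
        self.fanout = fanout
        self.summarize = summarize  # hypothetical stand-in for an LLM summarizer
        self.levels: List[List[Segment]] = [[]]

    def add(self, seg: Segment) -> None:
        self.levels[0].append(seg)
        level = 0
        # Cascade merges upward while the current level is over capacity.
        while len(self.levels[level]) > self.fanout:
            group = self.levels[level][: self.fanout]
            del self.levels[level][: self.fanout]
            merged = Segment(
                start=group[0].start,
                end=group[-1].end,
                text=self.summarize([g.text for g in group]),
            )
            if level + 1 == len(self.levels):
                self.levels.append([])
            self.levels[level + 1].append(merged)
            level += 1

    def context(self) -> List[Segment]:
        # Coarsest (oldest) summaries first, finest (newest) segments last,
        # so the result is in chronological order.
        out: List[Segment] = []
        for lvl in reversed(self.levels):
            out.extend(lvl)
        return out


# Usage: a trivial summarizer that just joins the captions.
ctx = HierarchicalContext(fanout=2, summarize=lambda texts: "+".join(texts))
for i in range(7):
    ctx.add(Segment(float(i), float(i + 1), str(i)))
print([(s.start, s.end, s.text) for s in ctx.context()])
```

Under this scheme the number of retained entries grows roughly logarithmically with the number of segments added, which is the property that would let a bounded prompt cover a long recording. How the paper actually chooses what to compress, and how this interacts with Temporal Chain-of-Thought, is not stated in the abstract.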
