Fugu-MT Paper Translation (Abstract): FAST-EQA: Efficient Embodied Question Answering with Global and Local Region Relevancy

Paper overview: FAST-EQA: Efficient Embodied Question Answering with Global and Local Region Relevancy

arxiv url: http://arxiv.org/abs/2602.15813v1
Date: Tue, 17 Feb 2026 18:49:43 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.516978
Title: FAST-EQA: Efficient Embodied Question Answering with Global and Local Region Relevancy
Title (reference translation): FAST-EQA: Efficient Embodied Question Answering Considering Global and Regional Relevancy
Authors: Haochen Zhang, Nirav Savaliya, Faizan Siddiqui, Enna Sachdeva,
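Abstract summary: Embodied Question Answering (EQA) combines visual scene understanding, goal-directed exploration, and spatial and temporal reasoning under partial observability. FAST-EQA is a framework that (i) identifies visual targets, (ii) scores global regions of interest to guide navigation, and (iii) reasons over visual memory to answer confidently.
Reference score (independently computed attention score): 5.072152236331295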
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Embodied Question Answering (EQA) combines visual scene understanding, goal-directed exploration, spatial and temporal reasoning under partial observability. A central challenge is to confine physical search to question-relevant subspaces while maintaining a compact, actionable memory of observations. Furthermore, for real-world deployment, fast inference time during exploration is crucial. We introduce FAST-EQA, a question-conditioned framework that (i) identifies likely visual targets, (ii) scores global regions of interest to guide navigation, and (iii) employs Chain-of-Thought (CoT) reasoning over visual memory to answer confidently. FAST-EQA maintains a bounded scene memory that stores a fixed-capacity set of region-target hypotheses and updates them online, enabling robust handling of both single and multi-target questions without unbounded growth. To expand coverage efficiently, a global exploration policy treats narrow openings and doors as high-value frontiers, complementing local target seeking with minimal computation. Together, these components focus the agent's attention, improve scene coverage, and improve answer reliability while running substantially faster than prior approaches. On HMEQA and EXPRESS-Bench, FAST-EQA achieves state-of-the-art performance, while performing competitively on OpenEQA and MT-HM3D.
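Abstract (reference translation): Embodied Question Answering (EQA) combines visual scene understanding, goal-directed exploration, and spatial and temporal reasoning under partial observability. The central challenge is to restrict physical search to question-relevant subspaces while maintaining a compact, actionable memory of observations. In addition, fast inference during exploration is critical for real-world deployment. We introduce FAST-EQA, a question-conditioned framework that (i) identifies likely visual targets, (ii) scores global regions of interest to guide navigation, and (iii) uses Chain-of-Thought (CoT) reasoning over visual memory to answer with confidence. FAST-EQA maintains a bounded scene memory that stores a fixed-capacity set of region-target hypotheses and updates them online, which allows robust handling of both single-target and multi-target questions without unbounded growth. To expand coverage efficiently, the global exploration policy treats narrow openings and doors as high-value frontiers and complements local target seeking with minimal computation. Together, these components focus the agent's attention, improve scene coverage, and improve answer reliability while running considerably faster than prior approaches. On HMEQA and EXPRESS-Bench, FAST-EQA achieves state-of-the-art performance, and it is competitive on OpenEQA and MT-HM3D.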
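
The abstract describes a bounded scene memory holding a fixed-capacity set of region-target hypotheses that is updated online. The following Python sketch illustrates one way such a structure could behave. It is not the authors' implementation: the names `Hypothesis`, `BoundedSceneMemory`, the `relevance` field, and the "evict the lowest-scoring entry" rule are all assumptions made for illustration, since the abstract does not specify the update or eviction policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One region-target hypothesis (all fields are hypothetical)."""
    region_id: str    # assumed identifier for a scene region
    target: str       # assumed name of a question-relevant target
    relevance: float  # assumed relevance score; higher means more relevant


class BoundedSceneMemory:
    """Fixed-capacity store of hypotheses, updated one observation at a time."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: dict[tuple[str, str], Hypothesis] = {}

    def update(self, hyp: Hypothesis) -> None:
        """Insert or refresh a hypothesis, then evict if over capacity."""
        key = (hyp.region_id, hyp.target)
        current = self._items.get(key)
        # Assumption: keep the higher-scoring version of a repeated hypothesis.
        if current is None or hyp.relevance > current.relevance:
            self._items[key] = hyp
        # Assumption: when full, drop the lowest-relevance entry.
        if len(self._items) > self.capacity:
            worst = min(self._items, key=lambda k: self._items[k].relevance)
            del self._items[worst]

    def ranked(self) -> list[Hypothesis]:
        """Return stored hypotheses, most relevant first."""
        return sorted(self._items.values(), key=lambda h: h.relevance, reverse=True)
```

Because the store never exceeds `capacity` entries, memory use stays constant however long the agent explores, which is the property the abstract refers to as avoiding unbounded growth.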
