Fugu-MT paper translation (abstract): GeneralVLA-2: Geometry-Aware Reconstruction and Governed Memory for Robot Planning

Paper overview: GeneralVLA-2: Geometry-Aware Reconstruction and Governed Memory for Robot Planning

arxiv url: http://arxiv.org/abs/2606.17480v1
Date: Tue, 16 Jun 2026 03:45:24 GMT
Status: translation complete
Last updated in system: 2026-06-17 17:15:32.255063
Title: GeneralVLA-2: Geometry-Aware Reconstruction and Governed Memory for Robot Planning
Authors: Haoyu Wang, Guoqing Ma, Zeyu Zhang, Yandong Guo, Boxin Shi, Hao Tang
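Abstract summary: GeneralVLA provides a hierarchical interface for converting language and RGB-D observations into 3D end-effector paths. The paper introduces GeoFuse-MV3D, a geometry-prior-guided MV-SAM3D reconstruction branch, and upgrades KnowledgeBank into a governed long-term memory system.
Reference score (attention score computed independently by this site): 61.35267583855844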
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Generalist vision-language-action systems need object-centric 3D evidence and reusable manipulation experience to plan reliable robot trajectories. GeneralVLA provides a hierarchical interface for converting language and RGB-D observations into 3D end-effector paths, but two bottlenecks remain. First, monocular SAM3D-style object reconstruction can hallucinate pose and unseen geometry, while manipulation benefits from stable object shape when calibrated multi-view observations are available. Second, the original KnowledgeBank mainly retrieves semantically similar snippets and appends new knowledge, which makes it difficult to control memory quality, conflicts, confidence, and geometric relevance. To address the first challenge, we introduce GeoFuse-MV3D, a geometry-prior-guided MV-SAM3D reconstruction branch that verifies external geometry cues with input-view masks, applies soft visual-hull support, performs axis-wise refinement, and fuses only geometry while preserving appearance. To address the second challenge, we upgrade KnowledgeBank into a governed long-term memory system with explicit quality, confidence, lifecycle, verifier, and conflict metadata, together with precision-oriented retrieval. Finally, we evaluate the reconstruction branch on GSO-30 and the memory module on Terminal-Bench 2.0 and SWE-Bench Verified; GeoFuse-MV3D improves over the MV-SAM3D baseline by reducing CD and LPIPS by 2.20% and 2.02% while increasing PSNR and SSIM by 2.36% and 1.03%, and KnowledgeBank improves over ReasoningBank by 4.53% on Terminal-Bench SR and 3.73% on SWE-Bench resolve rate, while reducing AS by 4.95% and 5.65%, respectively. Code: https://github.com/AIGeeksGroup/GeneralVLA-2. Website: https://aigeeksgroup.github.io/GeneralVLA-2.
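The abstract names the metadata attached to each KnowledgeBank entry (quality, confidence, lifecycle, verifier, conflict) and describes retrieval as precision-oriented, but it gives no implementation details. The Python sketch below is one possible reading, not the authors' code: the `MemoryEntry` fields, the lifecycle states, the thresholds, and the rule "filter on metadata first, then rank by cosine similarity" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class MemoryEntry:
    """Hypothetical memory record; field names are illustrative assumptions."""
    text: str
    embedding: list[float]
    quality: float                 # assumed score in [0, 1]
    confidence: float              # assumed score in [0, 1]
    lifecycle: str = "active"      # assumed states: "active", "deprecated"
    verified: bool = False         # assumed flag set by some verifier
    conflicts_with: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, bank, k=3, min_quality=0.5, min_confidence=0.5,
             min_similarity=0.7):
    """Return at most k entries that pass every metadata gate.

    Favoring precision here means an entry is dropped if it fails any
    check, even when it is semantically close to the query.
    """
    candidates = []
    for entry in bank:
        if entry.lifecycle != "active" or not entry.verified:
            continue
        if entry.conflicts_with:  # unresolved conflict: skip (assumed policy)
            continue
        if entry.quality < min_quality or entry.confidence < min_confidence:
            continue
        score = cosine(query, entry.embedding)
        if score >= min_similarity:
            candidates.append((score, entry))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in candidates[:k]]
```

Under these assumptions, a deprecated, unverified, low-confidence, or conflicting entry is never returned, however similar it is to the query. That is the contrast with the append-and-retrieve-by-similarity behavior the abstract attributes to the original KnowledgeBank.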


Related papers