Fugu-MT Paper Translation (Abstract): Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment

Paper overview: Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment

arxiv url: http://arxiv.org/abs/2603.17647v1
Date: Wed, 18 Mar 2026 12:07:42 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.686012
Title: Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment
Title (reference translation): Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment
Authors: Dongqiang Gou, Xuming He
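Abstract summary: Grounding natural language questions to functionally relevant regions in 3D objects is essential for embodied intelligence and human-AI interaction. This paper proposes a two-stage cross-modal framework that enhances both semantic and geometric representations for open-vocabulary 3D affordance grounding. The method is validated on a newly introduced benchmark and two existing benchmarks, showing superior performance compared with existing methods.
Reference score (independently computed attention metric): 15.545435413394882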
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Grounding natural language questions to functionally relevant regions in 3D objects -- termed language-driven 3D affordance grounding -- is essential for embodied intelligence and human-AI interaction. Existing methods, while progressing from label-based to language-driven approaches, still face challenges in open-vocabulary generalization, fine-grained geometric alignment, and part-level semantic consistency. To address these issues, we propose a novel two-stage cross-modal framework that enhances both semantic and geometric representations for open-vocabulary 3D affordance grounding. In the first stage, large language models generate part-aware instructions to recover missing semantics, enabling the model to link semantically similar affordances. In the second stage, we introduce two key components: Affordance Prototype Aggregation (APA), which captures cross-object geometric consistency for each affordance, and Intra-Object Relational Modeling (IORM), which refines geometric differentiation within objects to support precise semantic alignment. We validate the effectiveness of our method through extensive experiments on a newly introduced benchmark, as well as two existing benchmarks, demonstrating superior performance in comparison with existing methods.
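Abstract (reference translation): Grounding natural language questions to functionally relevant regions in 3D objects (termed language-driven 3D affordance grounding) is essential for embodied intelligence and human-AI interaction. While existing methods have progressed from label-based to language-driven approaches, they still face challenges in open-vocabulary generalization, fine-grained geometric alignment, and part-level semantic consistency. To address these challenges, we propose a two-stage cross-modal framework that enhances both semantic and geometric representations for open-vocabulary 3D affordance grounding. In the first stage, large language models generate part-aware instructions to recover missing semantics, making it possible to link semantically similar affordances. In the second stage, we introduce two key components: Affordance Prototype Aggregation (APA), which captures cross-object geometric consistency for each affordance, and Intra-Object Relational Modeling (IORM), which refines geometric differentiation within objects to support precise semantic alignment. We validate the effectiveness of the proposed method on a newly introduced benchmark and two existing benchmarks, showing superior performance compared with existing methods.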

Related papers