Fugu-MT Paper Translation (Abstract): CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects

Paper overview: CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects

arxiv url: http://arxiv.org/abs/2604.02060v1
Date: Thu, 02 Apr 2026 13:57:01 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.835496
Title: CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects
Title (reference translation): CompassAD: Intent-Driven 3D Affordance Grounding for Functionally Competing Objects
Authors: Jingliang Li, Jindou Jia, Tuo An, Chuhao Zhou, Xiangyu Chen, Shilin Shan, Boyu Ma, Bofan Lyu, Gen Li, Jianfei Yang
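Abstract summary: In real-world scenes, multiple objects may share the same affordance, yet only one is appropriate under the given task context. The paper formalizes Multi-Object Affordance Grounding under Intent-Driven Instructions and constructs CompassAD, the first benchmark centered on implicit intent in confusable multi-object scenes.
Reference score (independently computed attention score): 23.801781337960914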
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: When told to "cut the apple," a robot must choose the knife over nearby scissors, despite both objects affording the same cutting function. In real-world scenes, multiple objects may share identical affordances, yet only one is appropriate under the given task context. We call such cases confusing pairs. However, existing 3D affordance methods largely sidestep this challenge by evaluating isolated single objects, often with explicit category names provided in the query. We formalize Multi-Object Affordance Grounding under Intent-Driven Instructions, a new 3D affordance setting that requires predicting a per-point affordance mask on the correct object within a cluttered multi-object point cloud, conditioned on implicit natural language intent. To study this problem, we construct CompassAD, the first benchmark centered on implicit intent in confusable multi-object scenes. It comprises 30 confusing object pairs spanning 16 affordance types, 6,422 scenes, and 88K+ query-answer pairs. Furthermore, we propose CompassNet, a framework that incorporates two dedicated modules tailored to this task. Instance-bounded Cross Injection (ICI) constrains language-geometry alignment within object boundaries to prevent cross-object semantic leakage. Bi-level Contrastive Refinement (BCR) enforces discrimination at both geometric-group and point levels, sharpening distinctions between target and confusable surfaces. Extensive experiments demonstrate state-of-the-art results on both seen and unseen queries, and deployment on a robotic manipulator confirms effective transfer to real-world grasping in confusing multi-object scenes.
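Abstract (reference translation): When told to "cut the apple," a robot must choose the knife over the nearby scissors. In real-world scenes, multiple objects may share the same affordance, yet only one is appropriate under the given task context. Such cases are called confusing pairs. Existing 3D affordance methods, however, largely sidestep this challenge by evaluating isolated single objects, often with explicit category names given in the query. The paper formalizes Multi-Object Affordance Grounding under Intent-Driven Instructions, a new 3D affordance setting that requires predicting a per-point affordance mask on the correct object within a cluttered multi-object point cloud, conditioned on implicit natural language intent. To study it, the authors construct CompassAD, the first benchmark centered on implicit intent in confusable multi-object scenes. It comprises 30 confusing object pairs spanning 16 affordance types, 6,422 scenes, and 88K+ query-answer pairs. They further propose CompassNet, which incorporates two dedicated modules tailored to this task. Instance-bounded Cross Injection (ICI) constrains language-geometry alignment within object boundaries to prevent cross-object semantic leakage. Bi-level Contrastive Refinement (BCR) enforces discrimination at both the geometric-group and point levels, sharpening the distinction between target and confusable surfaces. Deployment on a robotic manipulator confirms effective transfer to real-world grasping in confusing multi-object scenes.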
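The task output described in the abstract, a per-point affordance mask placed only on the correct object, can be illustrated with a toy sketch. This is not CompassNet, ICI, or BCR: the function name `ground_affordance`, the per-point scores, the threshold, and the rule of selecting the instance with the highest mean score are all assumptions made for illustration only.

```python
from collections import defaultdict


def ground_affordance(point_scores, instance_ids, threshold=0.5):
    """Return (target_instance, per-point binary mask).

    point_scores: hypothetical per-point relevance to the instruction, in [0, 1].
    instance_ids: object instance label for each point in the scene.
    Both names and the selection rule are illustrative assumptions.
    """
    if len(point_scores) != len(instance_ids):
        raise ValueError("point_scores and instance_ids must have equal length")

    # Aggregate scores within each instance only, so evidence from one
    # object never contributes to another object's score.
    per_instance = defaultdict(list)
    for score, inst in zip(point_scores, instance_ids):
        per_instance[inst].append(score)
    instance_score = {inst: sum(s) / len(s) for inst, s in per_instance.items()}

    # Pick the single instance that best matches the intent.
    target = max(instance_score, key=instance_score.get)

    # Points outside the target instance are always 0, even when their own
    # score is high (for example, the blade of the competing scissors).
    mask = [
        1 if inst == target and score >= threshold else 0
        for score, inst in zip(point_scores, instance_ids)
    ]
    return target, mask


# Toy scene: three points on a knife, three on scissors (made-up scores).
scores = [0.9, 0.8, 0.2, 0.7, 0.3, 0.1]
instances = ["knife", "knife", "knife", "scissors", "scissors", "scissors"]
target, mask = ground_affordance(scores, instances)
print(target, mask)
```

In this toy scene the scissors point scoring 0.7 is excluded from the mask because it lies on the non-target object, which is the kind of cross-object confusion the benchmark is built to probe.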

Related papers