Fugu-MT Paper Translation (Abstract): GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation

Paper overview: GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation

arxiv url: http://arxiv.org/abs/2508.14036v2
Date: Wed, 27 Aug 2025 17:10:30 GMT
Status: Translation complete
Last updated in system: 2025-08-28 12:43:57.457845
Title: GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation
Title (reference translation): GeoSAM2: Unlocking the Power of SAM2 for 3D Part Segmentation
Authors: Ken Deng, Yunhan Yang, Jingxiang Sun, Xihui Liu, Yebin Liu, Ding Liang, Yan-Pei Cao
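Abstract summary: GeoSAM2 is a prompt-controllable framework for 3D part segmentation. Given a textureless object, it renders normal maps and point maps from predefined viewpoints. It accepts simple 2D prompts (clicks or boxes) that guide part selection. The predicted masks are back-projected onto the object and aggregated across views.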
Reference score (independently computed attention metric): 81.0871900167463
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce GeoSAM2, a prompt-controllable framework for 3D part segmentation that casts the task as multi-view 2D mask prediction. Given a textureless object, we render normal and point maps from predefined viewpoints and accept simple 2D prompts - clicks or boxes - to guide part selection. These prompts are processed by a shared SAM2 backbone augmented with LoRA and residual geometry fusion, enabling view-specific reasoning while preserving pretrained priors. The predicted masks are back-projected to the object and aggregated across views. Our method enables fine-grained, part-specific control without requiring text prompts, per-shape optimization, or full 3D labels. In contrast to global clustering or scale-based methods, prompts are explicit, spatially grounded, and interpretable. We achieve state-of-the-art class-agnostic performance on PartObjaverse-Tiny and PartNetE, outperforming both slow optimization-based pipelines and fast but coarse feedforward approaches. Our results highlight a new paradigm: aligning the paradigm of 3D segmentation with SAM2, leveraging interactive 2D inputs to unlock controllability and precision in object-level part understanding.
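Abstract (reference translation): GeoSAM2 is a prompt-controllable framework for 3D part segmentation that casts the task as multi-view 2D mask prediction. Given a textureless object, it renders normal maps and point maps from predefined viewpoints and accepts simple 2D prompts (clicks or boxes) to guide part selection. These prompts are processed by a shared SAM2 backbone augmented with LoRA and residual geometry fusion, which enables view-specific reasoning while preserving pretrained priors. The predicted masks are back-projected onto the object and aggregated across views. The method allows fine-grained, part-specific control without text prompts, per-shape optimization, or full 3D labels. In contrast to global clustering or scale-based methods, the prompts are explicit, spatially grounded, and interpretable. It achieves state-of-the-art class-agnostic performance on PartObjaverse-Tiny and PartNetE, outperforming both slow optimization-based pipelines and fast but coarse feedforward approaches. The work aligns the 3D segmentation paradigm with SAM2, using interactive 2D inputs to unlock controllability and precision in object-level part understanding.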
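
The abstract states only that predicted masks are back-projected to the object and aggregated across views; it does not say how. The following is a minimal sketch of one plausible aggregation, not the authors' implementation. It assumes that each view's point map stores a 3D coordinate per pixel, that each pixel is matched to its nearest object point, and that a point joins the part when at least half of the views that see it mark it as foreground. The function name `aggregate_masks` and the parameters `tol` and `min_ratio` are hypothetical.

```python
import numpy as np

def aggregate_masks(points, point_maps, valid, masks, tol=1e-3, min_ratio=0.5):
    """Back-project per-view 2D masks onto 3D points and vote across views.

    points:     (N, 3) object points (e.g. sampled surface points)
    point_maps: (V, H, W, 3) rendered 3D coordinate for every pixel of every view
    valid:      (V, H, W) bool, True where a pixel hits the object
    masks:      (V, H, W) bool, predicted part mask for each view
    Returns an (N,) bool array marking the points assigned to the part.
    """
    n = len(points)
    seen = np.zeros(n, dtype=np.int64)  # number of pixels that observed each point
    hits = np.zeros(n, dtype=np.int64)  # number of those pixels inside the mask
    for pmap, ok, mask in zip(point_maps, valid, masks):
        pix = pmap[ok]   # (M, 3) 3D coordinates of foreground pixels
        lab = mask[ok]   # (M,) mask value at those pixels
        if len(pix) == 0:
            continue
        # Brute-force nearest neighbour (assumption: small N; use a KD-tree otherwise).
        dist = np.linalg.norm(pix[:, None, :] - points[None, :, :], axis=-1)
        nearest = dist.argmin(axis=1)
        close = dist[np.arange(len(pix)), nearest] <= tol
        # np.add.at accumulates correctly even when indices repeat.
        np.add.at(seen, nearest[close], 1)
        np.add.at(hits, nearest[close], lab[close].astype(np.int64))
    ratio = np.divide(hits, seen, out=np.zeros(n), where=seen > 0)
    return (seen > 0) & (ratio >= min_ratio)
```

Points that no view observes stay unlabeled in this sketch. Majority voting is only one possible choice; a real pipeline may weight views or resolve conflicts differently.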

Related papers