Fugu-MT Paper Translation (Abstract): Towards Foundation Models for 3D Scene Understanding: Instance-Aware Self-Supervised Learning for Point Clouds

Paper overview: Towards Foundation Models for 3D Scene Understanding: Instance-Aware Self-Supervised Learning for Point Clouds

arxiv url: http://arxiv.org/abs/2603.25165v1
Date: Thu, 26 Mar 2026 08:31:06 GMT
Status: Translation complete
Last updated in system: 2026-03-27 20:52:48.184682
Title: Towards Foundation Models for 3D Scene Understanding: Instance-Aware Self-Supervised Learning for Point Clouds
Authors: Bin Yang, Mohamed Abdelsamad, Miao Zhang, Alexandru Paul Condurache
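Abstract summary: PointINS is an instance-oriented self-supervised framework that enriches point cloud representations through geometry-aware learning. PointINS achieves an average mAP improvement of +3.5% for indoor instance segmentation and a PQ gain of +4.1% for outdoor panoptic segmentation.
Reference score (independently computed attention score): 53.82500407523346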
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in self-supervised learning (SSL) for point clouds have substantially improved 3D scene understanding without human annotations. Existing approaches emphasize semantic awareness by enforcing feature consistency across augmented views or by masked scene modeling. However, the resulting representations transfer poorly to instance localization and often require full finetuning for strong performance. Instance awareness is a fundamental component of 3D perception, so bridging this gap is crucial for progressing toward true 3D foundation models that support all downstream tasks on 3D data. In this work, we introduce PointINS, an instance-oriented self-supervised framework that enriches point cloud representations through geometry-aware learning. PointINS employs an orthogonal offset branch to jointly learn high-level semantic understanding and geometric reasoning, yielding instance awareness. We identify two consistent properties essential for robust instance localization and formulate them as complementary regularization strategies: Offset Distribution Regularization (ODR), which aligns predicted offsets with empirically observed geometric priors, and Spatial Clustering Regularization (SCR), which enforces local coherence by regularizing offsets with pseudo-instance masks. Through extensive experiments across five datasets, PointINS achieves an average mAP improvement of +3.5% for indoor instance segmentation and a PQ gain of +4.1% for outdoor panoptic segmentation, paving the way for scalable 3D foundation models.
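
The abstract describes SCR only as enforcing local coherence by regularizing offsets with pseudo-instance masks; it gives no formula. As a rough illustration of that idea, and not the authors' formulation, the sketch below penalizes the spread of offset-shifted points within each pseudo-instance. The function name `spatial_coherence_penalty`, the mean-squared-distance form of the penalty, and the convention that label -1 marks unassigned points are all assumptions made for this example.

```python
import numpy as np


def spatial_coherence_penalty(points, offsets, pseudo_ids):
    """Hypothetical coherence term (an assumption, not the paper's SCR loss).

    points:     (N, 3) point coordinates
    offsets:    (N, 3) predicted per-point offsets
    pseudo_ids: (N,) integer pseudo-instance labels; -1 marks unassigned points

    Returns the mean squared distance between each shifted point
    (point + offset) and the mean shifted position of its pseudo-instance.
    """
    points = np.asarray(points, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    pseudo_ids = np.asarray(pseudo_ids)

    shifted = points + offsets
    total, count = 0.0, 0
    for inst in np.unique(pseudo_ids):
        if inst < 0:
            continue  # skip points that belong to no pseudo-instance
        members = shifted[pseudo_ids == inst]
        center = members.mean(axis=0)
        total += float(np.sum((members - center) ** 2))
        count += len(members)
    return total / count if count else 0.0


# Toy scene: two pseudo-instances of two points each.
points = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0],
                   [10.0, 0.0, 0.0], [12.0, 0.0, 0.0]])
pseudo_ids = np.array([0, 0, 1, 1])

# Offsets that move every point onto its pseudo-instance centroid.
centroids = np.array([[1.0, 0.0, 0.0], [11.0, 0.0, 0.0]])
ideal_offsets = centroids[pseudo_ids] - points

no_offset_penalty = spatial_coherence_penalty(points, np.zeros_like(points), pseudo_ids)
ideal_penalty = spatial_coherence_penalty(points, ideal_offsets, pseudo_ids)
```

In this toy scene the penalty drops to zero when every offset points at the centroid of its pseudo-instance, which is the kind of local agreement the abstract attributes to SCR. How PointINS builds its pseudo-instance masks and weights this term is not stated in the abstract.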


Related papers