Fugu-MT Paper Translation (Abstract): Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors

Paper overview: Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors

arxiv url: http://arxiv.org/abs/2605.20737v1
Date: Wed, 20 May 2026 05:42:29 GMT
Status: Translation complete
System update date: 2026-05-21 19:19:56.500873
Title: Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors
Authors: Siqi Wei, Hongbin Xu, Feng Xiao, Tian Lan, Chun Li, Ming Li, Qiuxia Wu
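Abstract summary: LangTail is a language-guided hierarchical learning framework that mitigates long-tail ambiguity in unsupervised 3D segmentation. It consistently outperforms existing methods by significant margins.
Reference score (independently computed attention score): 21.92314835722451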
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing approaches for unsupervised 3D point cloud segmentation predominantly rely on a purely visual similarity-based learning-by-clustering paradigm, which suffers from a fundamental limitation: long-tail ambiguity. In such a paradigm, features of minor classes are consistently absorbed by dominant clusters, leading to severely imbalanced predictions. To address this issue, we propose LangTail, a language-guided hierarchical learning framework that leverages the balanced world knowledge encoded in language models to mitigate long-tail ambiguity in unsupervised 3D segmentation. The key idea is to establish multi-level associations between language-derived semantic priors and visually underrepresented minor classes, thereby compensating for the biased attention of purely visual clustering toward dominant classes. Specifically, LangTail first constructs an entity-level semantic prior from language models, capturing balanced and fine-grained world knowledge across categories. These priors are injected into a hierarchical clustering framework via contrastive alignment. This guides multi-granularity semantic structure formation and prevents minor classes from being absorbed by dominant clusters, yielding more discriminative representations for underrepresented categories. Extensive experiments on ScanNet-v2, S3DIS, and nuScenes demonstrate that LangTail consistently outperforms existing methods by significant margins, i.e., +13.5, +12.9, and +8.9 mIoU, respectively. These results demonstrate the effectiveness of language priors in improving the representation of minority classes in 3D point clouds. The code will be released at: https://github.com/Whisky0129/langtail_official.
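Note on the metric: the gains above are reported in mIoU, which averages intersection-over-union per class, so a minor class counts as much as a dominant one. The following is a minimal sketch of that computation on a toy example, not the authors' evaluation code. The function name `mean_iou`, the two-class toy scene, and the assumption that predicted cluster IDs have already been matched to ground-truth label IDs are all illustrative choices not taken from the paper.

```python
from collections import Counter


def mean_iou(pred, target, num_classes):
    """Return (mIoU, per-class IoU list) for two equal-length label sequences.

    Assumes predicted cluster IDs were already mapped to ground-truth class IDs.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    pred_count = Counter(pred)
    target_count = Counter(target)
    intersection = Counter()
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        # A class absent from both prediction and ground truth is skipped.
        ious.append(intersection[c] / union if union > 0 else None)

    valid = [v for v in ious if v is not None]
    miou = sum(valid) / len(valid) if valid else 0.0
    return miou, ious


# Toy long-tail scene: 90 points of dominant class 0, 10 points of minor class 1.
target = [0] * 90 + [1] * 10
# Hypothetical prediction in which the minor class is fully absorbed by class 0.
absorbed = [0] * 100

miou_absorbed, per_class = mean_iou(absorbed, target, num_classes=2)
accuracy = sum(p == t for p, t in zip(absorbed, target)) / len(target)
print(f"accuracy={accuracy:.2f}, mIoU={miou_absorbed:.2f}, per-class={per_class}")
```

In this toy case the point-wise accuracy stays at 0.90 while mIoU drops to 0.45, because the absorbed minor class contributes an IoU of 0 to the average. This is one way to see why absorption of minor classes by dominant clusters is penalized heavily under mIoU.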
