Fugu-MT Paper Translation (Abstract): Distill, Diffuse, and Semanticize (DDS): Annotation-Free 3D Scene Understanding Based on Multi-Granularity Distillation and Graph-Diffusion-Based Segmentation

Paper abstract: Distill, Diffuse, and Semanticize (DDS): Annotation-Free 3D Scene Understanding Based on Multi-Granularity Distillation and Graph-Diffusion-Based Segmentation

arxiv url: http://arxiv.org/abs/2605.08293v2
Date: Wed, 13 May 2026 09:48:13 GMT
Status: Translation complete
Last updated in system: 2026-05-14 17:13:58.807235
Title: Distill, Diffuse, and Semanticize (DDS): Annotation-Free 3D Scene Understanding Based on Multi-Granularity Distillation and Graph-Diffusion-Based Segmentation
Authors: Yijing Wang, Ruonan Li, Qilin Wang, Rongqiang Zhao, Jie Liu
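Abstract summary: 3D semantic scene understanding is essential for digital twins, autonomous driving, smart agriculture, and embodied perception. Existing annotation-free methods often face a trade-off between semantic recognition and structural efficiency. This paper proposes DDS, a resource-efficient structure-oriented framework for region-consistent and semanticized annotation-free 3D scene understanding.
Reference score (independently computed attention score): 6.093743600103449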
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D semantic scene understanding is essential for digital twins, autonomous driving, smart agriculture, and embodied perception, yet dense point-wise annotation for point clouds remains expensive and difficult to scale. Existing annotation-free methods often face a trade-off between semantic recognition and structural efficiency: open-vocabulary and foundation-model-driven methods provide strong semantic priors, but often come with substantial computational costs, while structure-oriented methods based on superpoints, clustering, and graph reasoning are lightweight but often produce category-agnostic regions. We propose DDS, a resource-efficient structure-oriented framework for region-consistent and semanticized annotation-free 3D scene understanding. DDS preserves the lightweight superpoint-based organization paradigm while incorporating visual semantic cues from projected features and segmentation-derived masks. It first performs multi-granularity distillation to guide the 3D backbone at the point, mask-prototype, and inter-prototype levels, then applies graph diffusion over superpoints to propagate semantic information directly in 3D, producing coherent region representations without costly spectral decomposition or dense open-vocabulary 3D feature fields. Finally, DDS uses segmentation-cluster association to assign interpretable semantic names to category-agnostic 3D clusters. Experiments on real-world datasets show that DDS achieves the best performance among representative structure-oriented annotation-free baselines, improving oAcc, mAcc, and mIoU by up to 5.9%, 8.1%, and 2.4%, respectively. These results demonstrate that DDS improves region consistency and lightweight semantic recognition, providing a scalable and interpretable solution for annotation-free 3D scene understanding.
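The abstract describes propagating semantic information over a superpoint graph without spectral decomposition. The following is a minimal sketch of that general idea, not the authors' implementation: it uses a generic iterative diffusion with a restart term, and the function name `diffuse_superpoint_features`, the parameters `alpha` and `num_steps`, the self-loops, and the toy graph are all assumptions made for illustration.

```python
import numpy as np


def diffuse_superpoint_features(features, edges, alpha=0.5, num_steps=10):
    """Iteratively smooth per-superpoint features over an undirected graph.

    Hypothetical helper: a generic diffusion-with-restart scheme, chosen
    because it needs only matrix products and no eigendecomposition.
    """
    features = np.asarray(features, dtype=np.float64)
    n = features.shape[0]

    # Symmetric adjacency from the edge list, plus self-loops so every
    # superpoint has nonzero degree (an assumption of this sketch).
    adj = np.eye(n, dtype=np.float64)
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0

    # Row-normalize: each step becomes a weighted average over neighbours.
    transition = adj / adj.sum(axis=1, keepdims=True)

    diffused = features.copy()
    for _ in range(num_steps):
        # Mix propagated features with the initial ones (restart term).
        diffused = alpha * (transition @ diffused) + (1.0 - alpha) * features
    return diffused


if __name__ == "__main__":
    # Toy chain of four superpoints with soft two-class scores.
    initial = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
    chain_edges = [(0, 1), (1, 2), (2, 3)]
    print(diffuse_superpoint_features(initial, chain_edges))
```

Because the transition matrix is row-stochastic, rows that start as probability distributions remain distributions after diffusion. Setting `alpha=0` disables propagation and returns the initial features unchanged.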
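The abstract reports results in terms of oAcc, mAcc, and mIoU. As a rough illustration of how these three metrics are commonly derived from a confusion matrix, here is a short sketch. The function name `segmentation_metrics` and the averaging conventions (mAcc over classes present in the ground truth, mIoU over classes with a nonzero union) are assumptions. The paper's exact evaluation protocol, including how unsupervised clusters are matched to classes, is not given in the abstract and is not reproduced here.

```python
import numpy as np


def segmentation_metrics(pred, target, num_classes):
    """Return (oAcc, mAcc, mIoU) for integer label arrays.

    Hypothetical helper; assumes predictions are already mapped to
    ground-truth class indices in the range [0, num_classes).
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    flat = target * num_classes + pred
    conf = np.bincount(flat, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    true_pos = np.diag(conf).astype(np.float64)
    gt_count = conf.sum(axis=1)
    pred_count = conf.sum(axis=0)
    union = gt_count + pred_count - true_pos

    # Overall accuracy: fraction of correctly labelled points.
    o_acc = true_pos.sum() / conf.sum()

    # Mean per-class accuracy over classes that occur in the ground truth.
    present = gt_count > 0
    m_acc = np.mean(true_pos[present] / gt_count[present])

    # Mean IoU over classes that occur in prediction or ground truth.
    valid = union > 0
    m_iou = np.mean(true_pos[valid] / union[valid])
    return o_acc, m_acc, m_iou


if __name__ == "__main__":
    print(segmentation_metrics([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```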

Related papers

Segment Any 3D-Part in a Scene from a Sentence [50.46950922754459]
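This paper aims to segment any 3D part in a scene based on a natural-language description. It introduces the 3D-PU dataset, the first large-scale 3D dataset with dense part annotations. On the method side, it proposes OpenPart3D, a 3D-input-only framework that addresses the challenges of part-level segmentation.
Paper reference translation (metadata) (2025-06-24T05:51:22Z)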
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds [5.636411923613415]
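We introduce LogoSP to learn 3D semantics from both local and global features. Our approach discovers 3D semantic information by grouping superpoints according to their global patterns in the frequency domain.
Paper reference translation (metadata) (2025-06-09T15:21:37Z)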
BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis [33.53327976669034]
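We revisit 3D semantic segmentation through a more fine-grained lens, shedding light on subtle complexities that broader performance metrics obscure. We introduce a 3D semantic segmentation network called BFANet, which performs a detailed analysis of semantic boundary features.
Paper reference translation (metadata) (2025-03-16T15:13:11Z)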
Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
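3D segmentation is a core problem in computer vision. Densely labeling 3D point clouds for fully supervised training is labor-intensive and expensive. Semi-supervised training offers a more practical alternative, in which only a small set of labeled data is given alongside a larger unlabeled set.
Paper reference translation (metadata) (2024-09-12T14:54:31Z)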
Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment [55.11291053011696]
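This work proposes a framework for 3D scene understanding when labeled scenes are extremely limited. To extract knowledge of novel categories from pre-trained vision-language models, it proposes a hierarchical feature-aligned pre-training and knowledge distillation strategy. In the limited reconstruction case, the proposed approach, termed WS3D++, ranks first on the large-scale ScanNet benchmark.
Paper reference translation (metadata) (2023-12-01T15:47:04Z)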
Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding [57.47315482494805]
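Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset. The task is challenging because the model must both localize novel 3D objects and infer their semantic categories. This paper proposes using pre-trained vision-language foundation models, which encode extensive knowledge from image-text pairs, to generate captions for 3D scenes.
Paper reference translation (metadata) (2023-08-01T07:50:14Z)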
Box2Seg: Learning Semantics of 3D Point Clouds with Box-Level Supervision [65.19589997822155]
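We introduce a neural architecture, termed Box2Seg, to learn point-level semantics of 3D point clouds with bounding-box-level supervision. We show that the proposed network can be trained with cheap, or even off-the-shelf, bounding-box-level annotations and subcloud-level tags.
Paper reference translation (metadata) (2022-01-09T09:07:48Z)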
3D Segmentation Learning from Sparse Annotations and Hierarchical Descriptors [7.161067294394475]
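GIDSeg is a novel approach that can simultaneously learn segmentation from sparse annotations. GIDSeg captures global and individual relations via a dynamic edge convolution network. An adversarial learning module is also designed to further strengthen the conditional constraint of the ID descriptor.
Paper reference translation (metadata) (2021-05-27T00:31:37Z)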
S3Net: 3D LiDAR Sparse Semantic Segmentation Network [1.330528227599978]
S3Net is a novel convolutional neural network for LiDAR point cloud semantic segmentation. It adopts an encoder-decoder backbone consisting of a Sparse Intra-channel Attention Module (SIntraAM) and a Sparse Inter-channel Attention Module (SInterAM).
Paper reference translation (metadata) (2021-03-15T22:15:24Z)

The related papers list is generated automatically from the titles and abstracts of papers on this site.

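This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).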