Fugu-MT Paper Translation (Abstract): Octree Latent Diffusion for Semantic 3D Scene Generation and Completion

Paper summary: Octree Latent Diffusion for Semantic 3D Scene Generation and Completion

arxiv url: http://arxiv.org/abs/2509.16483v1
Date: Sat, 20 Sep 2025 00:53:13 GMT
Status: Translation complete
Last updated in system: 2025-09-23 18:58:15.815203
Title: Octree Latent Diffusion for Semantic 3D Scene Generation and Completion
Authors: Xujia Zhang, Brendan Crowe, Christoffer Heckman
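Abstract summary: This work develops a single framework that can perform scene completion, extension, and generation in both indoor and outdoor scenes. The proposed method operates directly on an efficient dual octree graph latent representation. It demonstrates high-quality structure, coherent semantics, and robust completion from single LiDAR scans.
Reference score (attention score computed independently by the site): 2.8992197334880268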
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The completion, extension, and generation of 3D semantic scenes are an interrelated set of capabilities that are useful for robotic navigation and exploration. Existing approaches seek to decouple these problems and solve each one in isolation. Additionally, these approaches are often domain-specific, requiring separate models for different data distributions, e.g. indoor vs. outdoor scenes. To unify these techniques and provide cross-domain compatibility, we develop a single framework that can perform scene completion, extension, and generation in both indoor and outdoor scenes, which we term Octree Latent Semantic Diffusion. Our approach operates directly on an efficient dual octree graph latent representation: a hierarchical, sparse, and memory-efficient occupancy structure. This technique disentangles synthesis into two stages: (i) structure diffusion, which predicts binary split signals to construct a coarse occupancy octree, and (ii) latent semantic diffusion, which generates semantic embeddings decoded by a graph VAE into voxel-level semantic labels. To perform semantic scene completion or extension, our model leverages inference-time latent inpainting or outpainting, respectively. These inference-time methods use partial LiDAR scans or maps to condition generation, without the need for retraining or fine-tuning. We demonstrate high-quality structure, coherent semantics, and robust completion from single LiDAR scans, as well as zero-shot generalization to out-of-distribution LiDAR data. These results indicate that completion-through-generation in a dual octree graph latent space is a practical and scalable alternative to regression-based pipelines for real-world robotic perception tasks.
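To make stage (i) concrete, the following is a minimal sketch of how per-cell binary split signals can be expanded into a coarse octree. It is not the authors' implementation: in the paper the split signals come from a structure diffusion model, whereas here a hand-written predicate stands in for it. The names `Cell`, `build_octree`, and `should_split` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical cell encoding: (depth, x, y, z), with integer coordinates
# measured in units of the cell size at that depth.
Cell = Tuple[int, int, int, int]


def build_octree(should_split: Callable[[Cell], bool], max_depth: int) -> List[Cell]:
    """Expand the root cell level by level and return the leaf cells.

    `should_split` plays the role of the predicted binary split signal:
    True means "subdivide this cell into its 8 children".
    """
    leaves: List[Cell] = []
    frontier: List[Cell] = [(0, 0, 0, 0)]  # start from the root cell
    while frontier:
        next_frontier: List[Cell] = []
        for cell in frontier:
            depth, x, y, z = cell
            if depth < max_depth and should_split(cell):
                # A split signal of 1 replaces the cell with its 8 octants.
                for dx in (0, 1):
                    for dy in (0, 1):
                        for dz in (0, 1):
                            next_frontier.append(
                                (depth + 1, 2 * x + dx, 2 * y + dy, 2 * z + dz)
                            )
            else:
                # A split signal of 0 (or reaching max_depth) makes it a leaf.
                leaves.append(cell)
        frontier = next_frontier
    return leaves


if __name__ == "__main__":
    # Stand-in signal: split the root and one of its children.
    signal = lambda c: c in {(0, 0, 0, 0), (1, 0, 0, 0)}
    print(len(build_octree(signal, max_depth=3)))
```

Because only cells with a positive split signal are refined, the number of stored cells grows with the occupied structure rather than with the full voxel grid, which is the sparsity property the abstract attributes to the octree representation.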
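The abstract states that completion and extension are done by inference-time latent inpainting or outpainting conditioned on partial observations, but it does not give the algorithm. One common way to condition a pretrained diffusion model without retraining is masked blending during sampling: at each reverse step, entries covered by the observation are overwritten with a noised copy of the observed latent, while the remaining entries keep the model's sample. The sketch below shows that single blending step under this assumption; `blend_step` and all of its parameters are hypothetical names, and the paper's actual procedure may differ.

```python
import math
import random
from typing import List, Sequence


def blend_step(
    x_model: Sequence[float],
    x_obs: Sequence[float],
    mask: Sequence[int],
    alpha_bar: float,
    rng: random.Random,
) -> List[float]:
    """Blend one reverse-diffusion step with a partial observation.

    x_model   : latent proposed by the (assumed) denoiser at this step
    x_obs     : clean latent of the observed region (ignored where mask == 0)
    mask      : 1 where the latent is observed, 0 where it must be generated
    alpha_bar : cumulative signal level of the forward process at this step
    """
    out: List[float] = []
    for xm, xo, m in zip(x_model, x_obs, mask):
        # Forward-noise the observation to the current noise level.
        noised = math.sqrt(alpha_bar) * xo + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
        # Known entries follow the observation; unknown entries follow the model.
        out.append(noised if m else xm)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # With alpha_bar = 1.0 no noise is added, so observed entries are copied exactly.
    print(blend_step([9.0, 9.0, 9.0], [1.0, 2.0, 3.0], [1, 0, 1], 1.0, rng))
```

Outpainting fits the same pattern with a different mask: the observed region is the existing map, and the unobserved region lies outside its current extent.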


Related papers