Octree Latent Diffusion for Semantic 3D Scene Generation and Completion
- URL: http://arxiv.org/abs/2509.16483v1
- Date: Sat, 20 Sep 2025 00:53:13 GMT
- Title: Octree Latent Diffusion for Semantic 3D Scene Generation and Completion
- Authors: Xujia Zhang, Brendan Crowe, Christoffer Heckman,
- Abstract summary: We develop a single framework that can perform scene completion, extension, and generation in both indoor and outdoor scenes.<n>Our approach operates directly on an efficient dual octree graph latent representation.<n>We demonstrate highquality structure, coherent semantics, and robust completion from single LiDAR scans.
- Score: 2.8992197334880268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The completion, extension, and generation of 3D semantic scenes are an interrelated set of capabilities that are useful for robotic navigation and exploration. Existing approaches seek to decouple these problems and solve them oneoff. Additionally, these approaches are often domain-specific, requiring separate models for different data distributions, e.g. indoor vs. outdoor scenes. To unify these techniques and provide cross-domain compatibility, we develop a single framework that can perform scene completion, extension, and generation in both indoor and outdoor scenes, which we term Octree Latent Semantic Diffusion. Our approach operates directly on an efficient dual octree graph latent representation: a hierarchical, sparse, and memory-efficient occupancy structure. This technique disentangles synthesis into two stages: (i) structure diffusion, which predicts binary split signals to construct a coarse occupancy octree, and (ii) latent semantic diffusion, which generates semantic embeddings decoded by a graph VAE into voxellevel semantic labels. To perform semantic scene completion or extension, our model leverages inference-time latent inpainting, or outpainting respectively. These inference-time methods use partial LiDAR scans or maps to condition generation, without the need for retraining or finetuning. We demonstrate highquality structure, coherent semantics, and robust completion from single LiDAR scans, as well as zero-shot generalization to out-of-distribution LiDAR data. These results indicate that completion-through-generation in a dual octree graph latent space is a practical and scalable alternative to regression-based pipelines for real-world robotic perception tasks.
Related papers
- SemGS: Feed-Forward Semantic 3D Gaussian Splatting from Sparse Views for Generalizable Scene Understanding [18.889530477440793]
SemGS is a feed-forward framework for reconstructing generalizable semantic fields from image inputs.<n>We introduce a camera-aware attention mechanism into the feature extractor to explicitly model geometric relationships between camera viewpoints.<n>Experiments show that SemGS achieves state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2026-03-03T03:06:37Z) - StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation [57.06461272772509]
StdGEN++ is a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs.<n>It achieves state-of-the-art performance, significantly outperforming existing methods in geometric accuracy and semantic disentanglement.<n>The resulting structural independence unlocks advanced downstream capabilities, including non-destructive editing, physics-compliant animation, and gaze tracking.
arXiv Detail & Related papers (2026-01-12T15:41:27Z) - OccLE: Label-Efficient 3D Semantic Occupancy Prediction [48.50138308129873]
3D semantic occupancy prediction offers an intuitive and efficient scene understanding.<n>Existing approaches either rely on full supervision, or on self-supervision, which provides limited guidance and yields suboptimal performance.<n>We propose OccLE, a Label-Efficient 3D Semantic Occupancy Prediction that takes images and LiDAR as inputs and maintains high performance with limited voxel annotations.
arXiv Detail & Related papers (2025-05-27T01:41:28Z) - Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving [27.088907562842902]
In autonomous driving, 3D semantic segmentation plays an important role for enabling safe navigation.<n>The complexity of collecting and annotating 3D data is a bottleneck in this developments.<n>We propose a novel approach able to generate 3D semantic scene-scale data without relying on any projection or decoupled trained multi-resolution models.
arXiv Detail & Related papers (2025-03-27T12:41:42Z) - Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding [59.51535163599723]
FreeGS is an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels.<n>FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.
arXiv Detail & Related papers (2024-11-29T08:52:32Z) - Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z) - Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model designed to synthesize plausible 3D indoor scenes.<n>We show it outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z) - ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic
Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z) - LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes [2.822816116516042]
Large-scale semantic mapping is crucial for outdoor autonomous agents to fulfill high-level tasks such as planning and navigation.
This paper proposes a novel method for large-scale 3D semantic reconstruction through implicit representations from posed LiDAR measurements alone.
arXiv Detail & Related papers (2023-11-04T03:55:38Z) - SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.