One Step Closer: Creating the Future to Boost Monocular Semantic Scene Completion
- URL: http://arxiv.org/abs/2507.13801v1
- Date: Fri, 18 Jul 2025 10:24:58 GMT
- Title: One Step Closer: Creating the Future to Boost Monocular Semantic Scene Completion
- Authors: Haoang Lu, Yuanqi Su, Xiaoning Zhang, Hao Hu
- Abstract summary: In real-world traffic scenarios, a significant portion of a visual 3D scene remains occluded or outside the camera's field of view. We propose Creating the Future SSC, a novel temporal SSC framework that leverages pseudo-future frame prediction to expand the model's effective perceptual range. Our approach combines poses and depths to establish accurate 3D correspondences, enabling geometrically-consistent fusion of past, present, and predicted future frames in 3D space.
- Score: 3.664655957801223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, visual 3D Semantic Scene Completion (SSC) has emerged as a critical perception task for autonomous driving due to its ability to infer complete 3D scene layouts and semantics from single 2D images. However, in real-world traffic scenarios, a significant portion of the scene remains occluded or outside the camera's field of view -- a fundamental challenge that existing monocular SSC methods fail to address adequately. To overcome these limitations, we propose Creating the Future SSC (CF-SSC), a novel temporal SSC framework that leverages pseudo-future frame prediction to expand the model's effective perceptual range. Our approach combines poses and depths to establish accurate 3D correspondences, enabling geometrically-consistent fusion of past, present, and predicted future frames in 3D space. Unlike conventional methods that rely on simple feature stacking, our 3D-aware architecture achieves more robust scene completion by explicitly modeling spatial-temporal relationships. Comprehensive experiments on the SemanticKITTI and SSCBench-KITTI-360 benchmarks demonstrate state-of-the-art performance, validating the effectiveness of our approach and highlighting its ability to improve occlusion reasoning and 3D scene completion accuracy.
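The pose-and-depth correspondence step described in the abstract can be sketched as a standard geometric warp: lift each pixel to a 3D point using its depth and the camera intrinsics, then map the points into a reference frame with a relative pose. The function names, the intrinsics `K`, and the relative pose `T` below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) into camera-frame 3D points (H*W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # unit-depth rays in the camera frame
    return rays * depth.reshape(-1, 1)       # scale each ray by its pixel's depth

def warp_to_reference(points_cam, T_ref_from_src):
    """Map 3D points from a source frame into the reference frame (4x4 pose)."""
    pts_h = np.concatenate([points_cam, np.ones((len(points_cam), 1))], axis=1)
    return (pts_h @ T_ref_from_src.T)[:, :3]

# Toy example: a constant-depth plane seen by a camera translated 1 m along +z.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0,   0.0,  1.0]])
depth = np.full((48, 64), 5.0)
T = np.eye(4)
T[2, 3] = 1.0                                # assumed source-to-reference pose
pts = warp_to_reference(backproject(depth, K), T)
```

Once past, present, and predicted-future features are carried by such 3D points, fusion can operate on geometrically aligned voxels rather than naively stacked 2D feature maps.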
Related papers
- StarPose: 3D Human Pose Estimation via Spatial-Temporal Autoregressive Diffusion [29.682018018059043]
StarPose is an autoregressive diffusion framework for 3D human pose estimation. It incorporates historical 3D pose predictions and spatial-temporal physical guidance. It achieves superior accuracy and temporal consistency in 3D human pose estimation.
arXiv Detail & Related papers (2025-08-04T04:50:05Z) - Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion [86.34232220368855]
Given a single input image, SceneDINO infers the 3D geometry and expressive 3D DINO features in a feed-forward manner. In both 3D and 2D unsupervised scene understanding, SceneDINO reaches state-of-the-art segmentation accuracy.
arXiv Detail & Related papers (2025-07-08T17:59:50Z) - ACT-R: Adaptive Camera Trajectories for Single View 3D Reconstruction [12.942796503696194]
We introduce the simple idea of adaptive view planning to multi-view synthesis. We generate a sequence of views, leveraging temporal consistency to enhance 3D coherence. Our method improves 3D reconstruction over SOTA alternatives on the unseen GSO dataset.
arXiv Detail & Related papers (2025-05-13T05:31:59Z) - Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance [37.61183525419993]
3D Semantic Scene Completion (SSC) provides comprehensive scene geometry and semantics for autonomous driving perception. Existing SSC methods are limited to capturing sparse information from the current frame or naively stacking multi-frame temporal features. We propose a novel temporal SSC method, FlowScene: Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance.
arXiv Detail & Related papers (2025-02-20T12:52:36Z) - OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation [84.32038395034868]
OccScene integrates fine-grained 3D perception and high-quality generation in a unified framework. OccScene generates new and consistent 3D realistic scenes depending only on text prompts. Experiments show that OccScene achieves realistic 3D scene generation in broad indoor and outdoor scenarios.
arXiv Detail & Related papers (2024-12-15T13:26:51Z) - Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - A Spatiotemporal Approach to Tri-Perspective Representation for 3D Semantic Occupancy Prediction [6.527178779672975]
Vision-based 3D semantic occupancy prediction is increasingly overlooked in favor of LiDAR-based approaches. This study introduces S2TPVFormer, a transformer architecture designed to predict temporally coherent 3D semantic occupancy.
arXiv Detail & Related papers (2024-01-24T20:06:59Z) - Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [18.415854443539786]
We propose a camera-based semantic scene completion framework called SGN.
SGN propagates semantics from semantic-aware seed voxels to the whole scene based on spatial geometry cues.
Our experimental results demonstrate the superiority of our SGN over existing state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T04:17:27Z) - DepthSSC: Monocular 3D Semantic Scene Completion via Depth-Spatial Alignment and Voxel Adaptation [2.949710700293865]
We propose DepthSSC, an advanced method for semantic scene completion using only monocular cameras. DepthSSC integrates the Spatial Transformation Graph Fusion (ST-GF) module with Geometric-Aware Voxelization (GAV). We show that DepthSSC captures intricate 3D structural details effectively and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-11-28T01:47:51Z) - EvAC3D: From Event-based Apparent Contours to 3D Models via Continuous
Visual Hulls [46.94040300725127]
3D reconstruction from multiple views is a well-established computer vision field with many deployed applications.
We study the problem of 3D reconstruction from event-cameras, motivated by the advantages of event-based cameras in terms of low power and latency.
We propose Apparent Contour Events (ACE), a novel event-based representation that defines the geometry of the apparent contour of an object.
arXiv Detail & Related papers (2023-04-11T15:46:16Z) - SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z) - 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure
Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose a new geometry-based strategy to embed depth information with low-resolution voxel representation. Our proposed geometric embedding outperforms the depth feature learning used in conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.