One Step Closer: Creating the Future to Boost Monocular Semantic Scene Completion
- URL: http://arxiv.org/abs/2507.13801v1
- Date: Fri, 18 Jul 2025 10:24:58 GMT
- Title: One Step Closer: Creating the Future to Boost Monocular Semantic Scene Completion
- Authors: Haoang Lu, Yuanqi Su, Xiaoning Zhang, Hao Hu
- Abstract summary: In real-world traffic scenarios, a significant portion of a visual 3D scene remains occluded or outside the camera's field of view. We propose Creating the Future SSC, a novel temporal SSC framework that leverages pseudo-future frame prediction to expand the model's effective perceptual range. Our approach combines poses and depths to establish accurate 3D correspondences, enabling geometrically-consistent fusion of past, present, and predicted future frames in 3D space.
- Score: 3.664655957801223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, visual 3D Semantic Scene Completion (SSC) has emerged as a critical perception task for autonomous driving due to its ability to infer complete 3D scene layouts and semantics from single 2D images. However, in real-world traffic scenarios, a significant portion of the scene remains occluded or outside the camera's field of view -- a fundamental challenge that existing monocular SSC methods fail to address adequately. To overcome these limitations, we propose Creating the Future SSC (CF-SSC), a novel temporal SSC framework that leverages pseudo-future frame prediction to expand the model's effective perceptual range. Our approach combines poses and depths to establish accurate 3D correspondences, enabling geometrically-consistent fusion of past, present, and predicted future frames in 3D space. Unlike conventional methods that rely on simple feature stacking, our 3D-aware architecture achieves more robust scene completion by explicitly modeling spatial-temporal relationships. Comprehensive experiments on the SemanticKITTI and SSCBench-KITTI-360 benchmarks demonstrate state-of-the-art performance, validating the effectiveness of our approach and highlighting its ability to improve occlusion reasoning and 3D scene completion accuracy.
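The pose-and-depth correspondence step described in the abstract can be sketched as a standard geometric warp: lift each pixel to a 3D point using its depth and the camera intrinsics, then map the points into a reference frame with a relative pose. The function names, the intrinsics `K`, and the relative pose `T` below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) into camera-frame 3D points (H*W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # unit-depth rays in the camera frame
    return rays * depth.reshape(-1, 1)       # scale each ray by its pixel's depth

def warp_to_reference(points_cam, T_ref_from_src):
    """Map 3D points from a source frame into the reference frame (4x4 pose)."""
    pts_h = np.concatenate([points_cam, np.ones((len(points_cam), 1))], axis=1)
    return (pts_h @ T_ref_from_src.T)[:, :3]

# Toy example: a constant-depth plane seen by a camera translated 1 m along +z.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0,   0.0,  1.0]])
depth = np.full((48, 64), 5.0)
T = np.eye(4)
T[2, 3] = 1.0                                # assumed source-to-reference pose
pts = warp_to_reference(backproject(depth, K), T)
```

Once past, present, and predicted-future features are carried by such 3D points, fusion can operate on geometrically aligned voxels rather than naively stacked 2D feature maps.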
Related papers
- StarPose: 3D Human Pose Estimation via Spatial-Temporal Autoregressive Diffusion [29.682018018059043]
StarPose is an autoregressive diffusion framework for 3D human pose estimation. It incorporates historical 3D pose predictions and spatial-temporal physical guidance. It achieves superior accuracy and temporal consistency in 3D human pose estimation.
arXiv Detail & Related papers (2025-08-04T04:50:05Z) - Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion [86.34232220368855]
Given a single input image, SceneDINO infers the 3D geometry and expressive 3D DINO features in a feed-forward manner. In both 3D and 2D unsupervised scene understanding, SceneDINO reaches state-of-the-art segmentation accuracy.
arXiv Detail & Related papers (2025-07-08T17:59:50Z) - ACT-R: Adaptive Camera Trajectories for Single View 3D Reconstruction [12.942796503696194]
We introduce the simple idea of adaptive view planning to multi-view synthesis. We generate a sequence of views, leveraging temporal consistency to enhance 3D coherence. Our method improves 3D reconstruction over SOTA alternatives on the unseen GSO dataset.
arXiv Detail & Related papers (2025-05-13T05:31:59Z) - Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance [37.61183525419993]
3D Semantic Scene Completion (SSC) provides comprehensive scene geometry and semantics for autonomous driving perception. Existing SSC methods are limited to capturing sparse information from the current frame or naively stacking multi-frame temporal features. We propose a novel temporal SSC method, FlowScene: Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance.
arXiv Detail & Related papers (2025-02-20T12:52:36Z) - OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation [84.32038395034868]
OccScene integrates fine-grained 3D perception and high-quality generation in a unified framework. OccScene generates new and consistent 3D realistic scenes depending only on text prompts. Experiments show that OccScene achieves realistic 3D scene generation in broad indoor and outdoor scenarios.
arXiv Detail & Related papers (2024-12-15T13:26:51Z) - Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - A Spatiotemporal Approach to Tri-Perspective Representation for 3D Semantic Occupancy Prediction [6.527178779672975]
Vision-based 3D semantic occupancy prediction is increasingly overlooked in favor of LiDAR-based approaches. This study introduces S2TPVFormer, a transformer architecture designed to predict temporally coherent 3D semantic occupancy.
arXiv Detail & Related papers (2024-01-24T20:06:59Z) - Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [18.415854443539786]
We propose a camera-based semantic scene completion framework called SGN.
SGN propagates semantics from semantic-aware seed voxels to the whole scene based on spatial geometry cues.
Our experimental results demonstrate the superiority of our SGN over existing state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T04:17:27Z) - DepthSSC: Monocular 3D Semantic Scene Completion via Depth-Spatial Alignment and Voxel Adaptation [2.949710700293865]
We propose DepthSSC, an advanced method for semantic scene completion using only monocular cameras. DepthSSC integrates the Spatial Transformation Graph Fusion (ST-GF) module with Geometric-Aware Voxelization (GAV). We show that DepthSSC captures intricate 3D structural details effectively and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-11-28T01:47:51Z) - EvAC3D: From Event-based Apparent Contours to 3D Models via Continuous
Visual Hulls [46.94040300725127]
3D reconstruction from multiple views is a well-established computer vision field with many deployed applications.
We study the problem of 3D reconstruction from event-cameras, motivated by the advantages of event-based cameras in terms of low power and latency.
We propose Apparent Contour Events (ACE), a novel event-based representation that defines the geometry of the apparent contour of an object.
arXiv Detail & Related papers (2023-04-11T15:46:16Z) - SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z) - 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure
Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose a new geometry-based strategy to embed depth information with low-resolution voxel representation. Our proposed geometric embedding outperforms the depth feature learning used in conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.