Real-time 3D Semantic Scene Completion Via Feature Aggregation and
Conditioned Prediction
- URL: http://arxiv.org/abs/2303.10967v2
- Date: Sat, 25 Mar 2023 07:36:34 GMT
- Title: Real-time 3D Semantic Scene Completion Via Feature Aggregation and
Conditioned Prediction
- Authors: Xiaokang Chen, Yajie Xing and Gang Zeng
- Abstract summary: We propose a real-time semantic scene completion method with a feature aggregation strategy and conditioned prediction module.
Our method achieves competitive performance at a speed of 110 FPS on one GTX 1080 Ti GPU.
- Score: 17.54862035445157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic Scene Completion (SSC) aims to simultaneously predict the volumetric
occupancy and semantic category of a 3D scene. In this paper, we propose a
real-time semantic scene completion method with a feature aggregation strategy
and conditioned prediction module. Feature aggregation fuses features with
different receptive fields and gathers context to improve scene completion
performance. The conditioned prediction module adopts a two-step prediction
scheme that takes volumetric occupancy as a condition to enhance semantic
completion prediction. We conduct experiments on three recognized benchmarks:
NYU, NYUCAD, and SUNCG. Our method achieves competitive performance at a speed
of 110 FPS on one GTX 1080 Ti GPU.
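The two ideas named in the abstract can be illustrated with a toy sketch. It is not the authors' network. It works on a 1D row of voxels in plain Python and assumes that mean filters of different radii stand in for layers with different receptive fields, and that occupancy conditions the semantic step through a simple threshold gate. The names `box_filter`, `aggregate`, `conditioned_predict`, and `EMPTY`, and the 0.5 threshold, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

EMPTY = 0  # label reserved for free space (an assumption of this sketch)


def box_filter(signal: Sequence[float], radius: int) -> List[float]:
    """Mean over a window of the given radius, clipped at the borders.

    Stands in for a layer whose receptive field is 2 * radius + 1 voxels.
    """
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def aggregate(signal: Sequence[float], radii: Sequence[int] = (0, 1, 2)) -> List[float]:
    """Fuse responses with different receptive fields by averaging per voxel."""
    branches = [box_filter(signal, r) for r in radii]
    return [sum(vals) / len(vals) for vals in zip(*branches)]


def conditioned_predict(
    occupancy_prob: Sequence[float],
    class_scores: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[int]:
    """Two-step decoding: decide occupancy first, then classify occupied voxels only."""
    labels = []
    for p, scores in zip(occupancy_prob, class_scores):
        if p < threshold:
            labels.append(EMPTY)
        else:
            # Semantic classes are numbered from 1 because 0 means EMPTY.
            best = max(range(len(scores)), key=lambda k: scores[k])
            labels.append(best + 1)
    return labels


if __name__ == "__main__":
    fused = aggregate([0.0, 1.0, 0.0, 0.0], radii=(0, 1))
    print(fused)
    print(conditioned_predict([0.9, 0.2, 0.7], [[0.1, 0.8], [0.9, 0.1], [0.6, 0.4]]))
    # prints [2, 0, 1]
```

In this sketch the second voxel is labeled empty regardless of its class scores, which is the sense in which occupancy acts as a condition on the semantic prediction. A learned model would typically feed the occupancy estimate into the semantic branch rather than apply a hard gate.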
Related papers
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely convolutional architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera [53.20087549782785]
We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera.
Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for semantic predictions.
arXiv Detail & Related papers (2024-10-14T19:14:49Z)
- OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
- U3DS$^3$: Unsupervised 3D Semantic Scene Segmentation [19.706172244951116]
This paper presents U3DS$^3$ as a step towards completely unsupervised point cloud segmentation for any holistic 3D scene.
The initial step of our proposed approach involves generating superpoints based on the geometric characteristics of each scene.
We then undergo a learning process through a spatial clustering-based methodology, followed by iterative training using pseudo-labels generated in accordance with the cluster centroids.
arXiv Detail & Related papers (2023-11-10T12:05:35Z)
- Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding [106.0876425365599]
Masked Shape Prediction (MSP) is a new framework to conduct masked signal modeling in 3D scenes.
MSP uses the essential 3D semantic cue, i.e., geometric shape, as the prediction target for masked points.
arXiv Detail & Related papers (2023-05-08T20:09:19Z)
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences [86.77318031029404]
We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
arXiv Detail & Related papers (2023-05-04T11:32:16Z)
- Joint Forecasting of Features and Feature Motion for Dense Semantic Future Prediction [0.0]
The approach consists of two modules: Feature-to-motion (F2M) and Feature-to-feature (F2F).
The compound F2MF approach decouples effects of motion from the effects of novelty in a task-agnostic manner.
We perform experiments on three dense prediction tasks: semantic segmentation, instance-level segmentation, and panoptic segmentation.
arXiv Detail & Related papers (2021-01-26T13:30:44Z)
- Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data [4.355440821669468]
We propose a scene segmentation network based on local Deep Implicit Functions as a novel learning-based method for scene completion.
We show that this continuous representation is suitable to encode geometric and semantic properties of extensive outdoor scenes without the need for spatial discretization.
Our experiments verify that our method generates a powerful representation that can be decoded into a dense 3D description of a given scene.
arXiv Detail & Related papers (2020-11-18T07:39:13Z)
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.