EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video
- URL: http://arxiv.org/abs/2409.01807v1
- Date: Tue, 3 Sep 2024 11:40:31 GMT
- Title: EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video
- Authors: Zhen Zhou, Yunkai Ma, Junfeng Fan, Shaolin Zhang, Fengshui Jing, Min Tan
- Abstract summary: We present EPRecon, an efficient real-time panoptic 3D reconstruction framework.
We propose a lightweight module to directly estimate scene depth priors in a 3D volume.
In addition, to infer richer panoptic features from occupied voxels, EPRecon extracts panoptic features from both voxel features and corresponding image features.
- Score: 6.236130301507863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoptic 3D reconstruction from a monocular video is a fundamental perceptual task in robotic scene understanding. However, existing efforts suffer from inefficiency in terms of inference speed and accuracy, limiting their practical applicability. We present EPRecon, an efficient real-time panoptic 3D reconstruction framework. Current volumetric reconstruction methods usually rely on multi-view depth map fusion to obtain scene depth priors, which is time-consuming and poses challenges to real-time scene reconstruction. To this end, we propose a lightweight module that directly estimates scene depth priors in a 3D volume by generating occupancy probabilities for all voxels, improving reconstruction quality. In addition, to infer richer panoptic features from occupied voxels, EPRecon extracts panoptic features from both voxel features and the corresponding image features, obtaining more detailed and comprehensive instance-level semantic information and achieving more accurate segmentation results. Experimental results on the ScanNetV2 dataset demonstrate the superiority of EPRecon over current state-of-the-art methods in terms of both panoptic 3D reconstruction quality and real-time inference. Code is available at https://github.com/zhen6618/EPRecon.
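To make the occupancy-prior idea concrete, here is a minimal PyTorch sketch of the two steps the abstract describes: a lightweight head that predicts per-voxel occupancy probabilities from a back-projected feature volume, and a fusion of the surviving voxel features with image features. All module and variable names (`OccupancyPrior`, `feat_dim`, the 0.5 threshold) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class OccupancyPrior(nn.Module):
    """Illustrative lightweight head: feature volume -> per-voxel occupancy."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv3d(feat_dim, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(feat_dim, 1, kernel_size=1),
        )

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (B, C, X, Y, Z) features back-projected from image frames
        return torch.sigmoid(self.head(volume))  # occupancy in [0, 1]

# Toy usage: keep only voxels deemed occupied, then concatenate voxel
# features with per-voxel image features for panoptic prediction.
volume = torch.randn(1, 32, 48, 48, 48)
img_feats = torch.randn(1, 32, 48, 48, 48)  # image features gathered per voxel
occ = OccupancyPrior()(volume)              # (1, 1, 48, 48, 48)
mask = (occ > 0.5).float()                  # hypothetical occupancy threshold
panoptic_in = torch.cat([volume, img_feats], dim=1) * mask
```

Skipping the multi-view depth-fusion stage in favor of such a direct occupancy head is what the abstract credits for the real-time inference speed.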
Related papers
- sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views [41.73382885439258]
Reconstructing outdoor scenes from outward-facing views poses significant challenges due to minimal view overlap.
We propose a fast, single-shot pipeline for unbounded-view 3D scene reconstruction via hierarchical extrapolation.
We find that sshELF faithfully reconstructs occluded regions, supports real-time rendering, and provides rich features for downstream applications.
arXiv Detail & Related papers (2025-02-06T18:58:45Z)
- Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving [116.10577967146762]
We propose Driv3R, a framework that directly regresses per-frame point maps from multi-view image sequences.
We employ a 4D flow predictor to identify moving objects within the scene, directing the network to focus more on reconstructing these dynamic regions.
Driv3R outperforms previous frameworks in 4D dynamic scene reconstruction, achieving 15x faster inference speed.
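As a rough illustration of the summary above (not Driv3R's actual code), the sketch below regresses per-pixel 3D point maps and upweights the loss on pixels a flow predictor has flagged as dynamic; the mask, the weight value, and all names are assumptions.

```python
import torch

def pointmap_loss(pred_pts: torch.Tensor, gt_pts: torch.Tensor,
                  motion_mask: torch.Tensor, dynamic_weight: float = 2.0):
    """pred_pts, gt_pts: (B, H, W, 3) point maps; motion_mask: (B, H, W) in {0, 1}."""
    per_pixel = (pred_pts - gt_pts).norm(dim=-1)          # L2 error per pixel
    weights = 1.0 + (dynamic_weight - 1.0) * motion_mask  # upweight moving pixels
    return (weights * per_pixel).mean()
```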
arXiv Detail & Related papers (2024-12-09T18:58:03Z)
- FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction [13.157400338544177]
Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry is feasible using deep neural networks.
We propose three effective solutions for improving the fidelity of inference-based 3D reconstructions.
Our method, FineRecon, produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics.
arXiv Detail & Related papers (2023-04-04T02:50:29Z)
- VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction [64.09702079593372]
VolRecon is a novel generalizable implicit reconstruction method built on the Signed Ray Distance Function (SRDF).
On the DTU dataset, VolRecon outperforms SparseNeuS by about 30% in sparse-view reconstruction and achieves accuracy comparable to MVSNet in full-view reconstruction.
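For readers unfamiliar with the term, the SRDF differs from an SDF in that the distance is measured along a specific viewing ray rather than to the nearest surface in any direction. In a minimal formulation (our paraphrase, not a quote from the paper): for a ray r(t) = o + t·d whose surface intersection lies at depth t*, the SRDF of the sample at depth t is

SRDF(o + t·d) = t* − t

so it is positive in front of the surface and negative behind it along that ray.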
arXiv Detail & Related papers (2022-12-15T18:59:54Z)
- PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images [105.29493158036105]
PETRv2 is a unified framework for 3D perception from multi-view images.
We extend the 3D position embedding in PETR for temporal modeling.
PETRv2 achieves state-of-the-art performance on 3D object detection and BEV segmentation.
arXiv Detail & Related papers (2022-06-02T19:13:03Z)
- Neural 3D Reconstruction in the Wild [86.6264706256377]
We introduce a new method that enables efficient and accurate surface reconstruction from Internet photo collections.
We present a new benchmark and protocol for evaluating reconstruction performance on such in-the-wild scenes.
arXiv Detail & Related papers (2022-05-25T17:59:53Z)
- NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video [41.554961144321474]
We propose to reconstruct local surfaces represented as sparse TSDF volumes for each video fragment sequentially by a neural network.
A learning-based TSDF fusion module is used to guide the network to fuse features from previous fragments.
Experiments on ScanNet and 7-Scenes datasets show that our system outperforms state-of-the-art methods in terms of both accuracy and speed.
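The fragment-wise fusion idea can be sketched as a gated recurrent update over feature volumes. This is a minimal illustration under assumed names and dense shapes (`FragmentTSDFFusion`, `feat_dim`), not NeuralRecon's actual architecture, which operates on sparse volumes.

```python
import torch
import torch.nn as nn

class FragmentTSDFFusion(nn.Module):
    """Gated, GRU-like fusion of the current fragment's feature volume
    with a hidden state carrying previously reconstructed geometry."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.gate = nn.Conv3d(2 * feat_dim, feat_dim, kernel_size=3, padding=1)
        self.tsdf_head = nn.Conv3d(feat_dim, 1, kernel_size=1)

    def forward(self, frag_feat: torch.Tensor, hidden: torch.Tensor):
        # frag_feat, hidden: (B, C, X, Y, Z) volumes for the current fragment
        z = torch.sigmoid(self.gate(torch.cat([frag_feat, hidden], dim=1)))
        hidden = z * frag_feat + (1 - z) * hidden  # blend new and past features
        tsdf = torch.tanh(self.tsdf_head(hidden))  # TSDF values in [-1, 1]
        return tsdf, hidden
```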
arXiv Detail & Related papers (2021-04-01T17:59:46Z)
- MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo [52.329580781898116]
We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis.
Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference.
arXiv Detail & Related papers (2021-03-29T13:15:23Z)
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
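A toy sketch of incremental semantic fusion with voxel states follows; the vote-counting scheme, state encoding, and every name are our assumptions for illustration, not SCFusion's implementation.

```python
import numpy as np

OBSERVED, COMPLETED = 1, 2  # hypothetical voxel states

def fuse_semantics(global_votes, global_state, local_labels, local_observed):
    """global_votes: (X, Y, Z, C) per-class vote counts for the global model;
    local_labels: (X, Y, Z) predicted classes for the current frame;
    local_observed: (X, Y, Z) bool, True where the voxel was actually measured
    rather than filled in by scene completion."""
    num_classes = global_votes.shape[-1]
    # accumulate one vote per voxel for the locally predicted class
    global_votes += np.eye(num_classes, dtype=global_votes.dtype)[local_labels]
    global_state[local_observed] = OBSERVED                 # measurement wins
    global_state[(global_state == 0) & ~local_observed] = COMPLETED
    return global_votes.argmax(axis=-1)                     # fused label map
```

Tracking a per-voxel state lets later real observations cleanly override geometry that was only hallucinated by the completion branch.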
arXiv Detail & Related papers (2020-10-26T15:31:52Z)