PE3R: Perception-Efficient 3D Reconstruction
- URL: http://arxiv.org/abs/2503.07507v1
- Date: Mon, 10 Mar 2025 16:29:10 GMT
- Title: PE3R: Perception-Efficient 3D Reconstruction
- Authors: Jie Hu, Shizun Wang, Xinchao Wang
- Abstract summary: Perception-Efficient 3D Reconstruction (PE3R) is a novel framework designed to enhance both accuracy and efficiency. The framework achieves a minimum 9-fold speedup in 3D semantic field reconstruction, along with substantial gains in perception accuracy and reconstruction precision.
- Score: 54.730257992806116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in 2D-to-3D perception have significantly improved the understanding of 3D scenes from 2D images. However, existing methods face critical challenges, including limited generalization across scenes, suboptimal perception accuracy, and slow reconstruction speeds. To address these limitations, we propose Perception-Efficient 3D Reconstruction (PE3R), a novel framework designed to enhance both accuracy and efficiency. PE3R employs a feed-forward architecture to enable rapid 3D semantic field reconstruction. The framework demonstrates robust zero-shot generalization across diverse scenes and objects while significantly improving reconstruction speed. Extensive experiments on 2D-to-3D open-vocabulary segmentation and 3D reconstruction validate the effectiveness and versatility of PE3R. The framework achieves a minimum 9-fold speedup in 3D semantic field reconstruction, along with substantial gains in perception accuracy and reconstruction precision, setting new benchmarks in the field. The code is publicly available at: https://github.com/hujiecpp/PE3R.
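The abstract leaves the open-vocabulary querying step implicit. A common way to query a reconstructed 3D semantic field is to compare per-point semantic embeddings against encoded text prompts. The sketch below illustrates only that generic comparison, not PE3R's actual API: the function name, tensor shapes, and the assumption of CLIP-style aligned point/text embeddings are all hypothetical.

```python
import torch
import torch.nn.functional as F

def query_semantic_field(point_feats: torch.Tensor,
                         text_feats: torch.Tensor) -> torch.Tensor:
    """Assign each reconstructed 3D point its best-matching text prompt.

    point_feats: (N, D) per-point semantic features from the reconstructed
        field (hypothetical output of a PE3R-style feed-forward model).
    text_feats:  (K, D) embeddings of open-vocabulary prompts, e.g. from a
        CLIP-style text encoder aligned with the point features.
    Returns: (N,) index of the highest-similarity prompt for each point.
    """
    # After L2 normalization, cosine similarity is a plain matrix product.
    points = F.normalize(point_feats, dim=-1)
    texts = F.normalize(text_feats, dim=-1)
    similarity = points @ texts.T            # (N, K) similarity matrix
    return similarity.argmax(dim=-1)         # per-point prompt index

# Stand-in tensors in place of real model outputs:
labels = query_semantic_field(torch.randn(10_000, 512), torch.randn(4, 512))
print(labels.shape)  # torch.Size([10000])
```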
Related papers
- Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness [73.72335146374543]
We introduce reconstructive visual instruction tuning with 3D-awareness (Ross3D), which integrates 3D-aware visual supervision into the training procedure.
Ross3D achieves state-of-the-art performance across various 3D scene understanding benchmarks.
arXiv Detail & Related papers (2025-04-02T16:59:55Z)
- Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models [65.90387371072413]
We introduce Difix3D+, a novel pipeline designed to enhance 3D reconstruction and novel-view synthesis.
At the core of our approach is Difix, a single-step image diffusion model trained to enhance rendered novel views and remove their artifacts.
arXiv Detail & Related papers (2025-03-03T17:58:33Z)
- Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass [68.78222900840132]
We propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization of DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel.
Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error accumulation.
arXiv Detail & Related papers (2025-01-23T18:59:55Z)
- DuoLift-GAN: Reconstructing CT from Single-view and Biplanar X-Rays with Generative Adversarial Networks [1.3812010983144802]
We introduce DuoLift Generative Adversarial Networks (DuoLift-GAN), a novel architecture with dual branches that independently elevate 2D images and their features into 3D representations.
These 3D outputs are merged into a unified 3D feature map and decoded into a complete 3D chest volume, enabling richer 3D information capture.
arXiv Detail & Related papers (2024-11-12T17:11:18Z)
- G3R: Gradient Guided Generalizable Reconstruction [39.198327570559684]
We introduce G3R, a generalizable reconstruction approach that can efficiently predict high-quality 3D scene representations for large scenes.
Experiments on urban-driving and drone datasets show that G3R generalizes across diverse large scenes and accelerates the reconstruction process by at least 10x.
arXiv Detail & Related papers (2024-09-28T16:54:16Z)
- EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video [6.236130301507863]
We present EPRecon, an efficient real-time panoptic 3D reconstruction framework.
We propose a lightweight module to directly estimate scene depth priors in a 3D volume.
In addition, to infer richer panoptic features from occupied voxels, EPRecon extracts panoptic features from both voxel features and corresponding image features.
arXiv Detail & Related papers (2024-09-03T11:40:31Z)
- Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion [54.197343533492486]
Event3DGS can reconstruct high-fidelity 3D structure and appearance under high-speed egomotion.
Experiments on multiple synthetic and real-world datasets demonstrate the superiority of Event3DGS compared with existing event-based dense 3D scene reconstruction frameworks.
Our framework also allows one to incorporate a few motion-blurred frame-based measurements into the reconstruction process to further improve appearance fidelity without loss of structural accuracy.
arXiv Detail & Related papers (2024-06-05T06:06:03Z)
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model; a generic sketch of this style of incremental fusion follows the list.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
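As flagged in the SCFusion entry above, here is a minimal, self-contained sketch of incrementally fusing per-frame semantic predictions into a global voxel grid. It is a generic illustration under assumed interfaces (the SemanticVoxelGrid class, its integrate signature, and weighted score accumulation are inventions for exposition), not SCFusion's actual architecture.

```python
import numpy as np

class SemanticVoxelGrid:
    """Running fusion of per-frame semantic predictions into a global grid.

    Each frame contributes observation weight and class scores for the
    voxels it sees; accumulating both keeps the fusion order-independent.
    """

    def __init__(self, shape: tuple, num_classes: int):
        self.weight = np.zeros(shape, dtype=np.float32)
        self.class_scores = np.zeros(shape + (num_classes,), dtype=np.float32)

    def integrate(self, voxel_idx: np.ndarray, scores: np.ndarray,
                  frame_weight: float = 1.0) -> None:
        """Fuse one frame's observations.

        voxel_idx: (M, 3) integer indices of voxels observed this frame.
        scores:    (M, num_classes) per-voxel class scores from a network.
        """
        i, j, k = voxel_idx.T
        # np.add.at accumulates correctly even with duplicate indices.
        np.add.at(self.weight, (i, j, k), frame_weight)
        np.add.at(self.class_scores, (i, j, k), frame_weight * scores)

    def labels(self) -> np.ndarray:
        """Current per-voxel labels (unobserved voxels default to class 0)."""
        return self.class_scores.argmax(axis=-1)

# One fused frame with stand-in data:
grid = SemanticVoxelGrid(shape=(64, 64, 64), num_classes=12)
idx = np.random.randint(0, 64, size=(500, 3))
grid.integrate(idx, np.random.rand(500, 12).astype(np.float32))
print(grid.labels().shape)  # (64, 64, 64)
```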