PROFusion: Robust and Accurate Dense Reconstruction via Camera Pose Regression and Optimization
- URL: http://arxiv.org/abs/2509.24236v1
- Date: Mon, 29 Sep 2025 03:20:49 GMT
- Title: PROFusion: Robust and Accurate Dense Reconstruction via Camera Pose Regression and Optimization
- Authors: Siyan Dong, Zijun Wang, Lulu Cai, Yi Ma, Yanchao Yang
- Abstract summary: Real-time dense scene reconstruction is crucial for robotics. Current RGB-D SLAM systems fail when cameras experience large viewpoint changes, fast motions, or sudden shaking.
- Score: 21.23419310544054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time dense scene reconstruction during unstable camera motions is crucial for robotics, yet current RGB-D SLAM systems fail when cameras experience large viewpoint changes, fast motions, or sudden shaking. Classical optimization-based methods deliver high accuracy but fail with poor initialization during large motions, while learning-based approaches provide robustness but lack sufficient accuracy for dense reconstruction. We address this challenge through a combination of learning-based initialization with optimization-based refinement. Our method employs a camera pose regression network to predict metric-aware relative poses from consecutive RGB-D frames, which serve as reliable starting points for a randomized optimization algorithm that further aligns depth images with the scene geometry. Extensive experiments demonstrate promising results: our approach outperforms the best competitor on challenging benchmarks, while maintaining comparable accuracy on stable motion sequences. The system operates in real-time, showcasing that combining simple and principled techniques can achieve both robustness for unstable motions and accuracy for dense reconstruction. Project page: https://github.com/siyandong/PROFusion.
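The abstract describes a coarse-to-fine pipeline: a regression network predicts a metric-aware relative pose, which then seeds a randomized optimization that aligns depth geometry. The sketch below is NOT the paper's actual algorithm; it is a minimal illustration of the general "learned initialization + randomized refinement" idea, using a toy point-to-point alignment cost and random SE(3) perturbation search. All function names and parameters here are hypothetical.

```python
import numpy as np

def small_se3_perturbation(rng, rot_sigma, trans_sigma):
    # Sample a small random rigid motion as a 4x4 homogeneous matrix:
    # rotation from a Gaussian rotation vector (Rodrigues), Gaussian translation.
    w = rng.normal(0.0, rot_sigma, 3)
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta > 1e-12:
        K = K / theta
        R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    else:
        R = np.eye(3)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = rng.normal(0.0, trans_sigma, 3)
    return T

def alignment_cost(pose, src, tgt):
    # Mean nearest-neighbour distance from transformed src points to tgt points
    # (a crude stand-in for aligning a depth image with scene geometry).
    p = (pose[:3, :3] @ src.T).T + pose[:3, 3]
    d = np.linalg.norm(p[:, None, :] - tgt[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def randomized_refine(init_pose, src, tgt, iters=300, seed=0):
    # Greedy random search: start from the (learned) initial pose and keep
    # any perturbed candidate that lowers the alignment cost.
    rng = np.random.default_rng(seed)
    best = init_pose.copy()
    best_cost = alignment_cost(best, src, tgt)
    for _ in range(iters):
        cand = small_se3_perturbation(rng, 0.02, 0.02) @ best
        c = alignment_cost(cand, src, tgt)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost
```

In this toy setup the "network prediction" would simply be `init_pose`; the refinement stage is robust to a moderately wrong starting point because it only requires the initialization to land within the basin where random perturbations can make progress.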
Related papers
- GeoMotion: Rethinking Motion Segmentation via Latent 4D Geometry [61.24189040578178]
We propose a fully learning-based approach that directly infers moving objects from latent feature representations via attention mechanisms. Our key insight is to bypass explicit correspondence estimation and instead let the model learn to implicitly disentangle object and camera motion. Our approach achieves state-of-the-art motion segmentation performance with high efficiency.
arXiv Detail & Related papers (2026-02-25T11:36:33Z) - JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction [18.636227266388218]
We present JOintGS, a unified framework that jointly optimizes camera extrinsics, human poses, and 3D Gaussian representations. Experiments on the NeuMan and EMDB datasets demonstrate that JOintGS achieves superior reconstruction quality.
arXiv Detail & Related papers (2026-02-04T08:33:51Z) - JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting [10.35563602148445]
We propose a unified framework that jointly optimizes 3D Gaussian points and camera poses without requiring pre-calibrated inputs. Our approach iteratively refines 3D Gaussian parameters and updates camera poses through a novel co-optimization strategy. It significantly outperforms existing COLMAP-free techniques in reconstruction quality, and also surpasses the standard COLMAP-based baseline in general.
arXiv Detail & Related papers (2025-10-30T04:00:07Z) - UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring [31.35713139629235]
Reconstructing 3D scenes from monocular video often fails due to severe motion blur caused by camera and object motion. We introduce a unified optimization framework by incorporating camera poses as learnable parameters. Our method achieves significant gains in reconstruction quality and pose estimation accuracy over prior dynamic deblurring methods.
arXiv Detail & Related papers (2025-08-31T13:01:03Z) - GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering [54.489285024494855]
Video stabilization is pivotal for video processing, as it removes unwanted shakiness while preserving the original user motion intent. Existing approaches, depending on the domain they operate in, suffer from several issues that degrade the user experience. We introduce GaVS, a novel 3D-grounded approach that reformulates video stabilization as a temporally-consistent 'local reconstruction and rendering' paradigm.
arXiv Detail & Related papers (2025-06-30T15:24:27Z) - Towards Initialization-free Calibrated Bundle Adjustment [8.698137120086065]
We present a method that is able to use the known camera calibration, thereby producing near-metric solutions. Our method can be seen as integrating rotation averaging into the pOSE framework.
arXiv Detail & Related papers (2025-06-30T12:55:44Z) - 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS [36.48425755917156]
3D Gaussian Splatting (3DGS) has revolutionized neural rendering with its efficiency and quality. However, it heavily depends on accurate camera poses from Structure-from-Motion (SfM) systems. We present 3R-GS, a 3D Gaussian Splatting framework that bridges this gap.
arXiv Detail & Related papers (2025-04-05T22:31:08Z) - XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications [34.2082611110639]
This paper presents a novel approach to Visual Inertial Odometry (VIO) focusing on the initialization and feature matching modules. Existing methods for gyroscopes often suffer from poor stability in visual Structure from Motion (SfM) or in solving for a huge number of parameters simultaneously. By tightly coupling measurements, we enhance the robustness and accuracy of visual SfM. In terms of feature matching, we introduce a hybrid method that combines optical flow and descriptor-based matching.
arXiv Detail & Related papers (2025-02-03T12:17:51Z) - InstantSplat: Sparse-view Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed. InstantSplat employs a self-supervised framework that optimizes the 3D scene representation and camera poses. It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z) - ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z) - Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO)
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z) - Learning-Based Framework for Camera Calibration with Distortion Correction and High Precision Feature Detection [14.297068346634351]
We propose a hybrid camera calibration framework which combines learning-based approaches with traditional methods to handle these bottlenecks.
In particular, this framework leverages learning-based approaches to perform efficient distortion correction and robust chessboard corner coordinate encoding.
Experimental results on both real and synthetic datasets demonstrate that the proposed framework is more robust and more precise than two widely-used camera calibration toolboxes.
arXiv Detail & Related papers (2022-02-01T00:19:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.