PROFusion: Robust and Accurate Dense Reconstruction via Camera Pose Regression and Optimization
- URL: http://arxiv.org/abs/2509.24236v1
- Date: Mon, 29 Sep 2025 03:20:49 GMT
- Title: PROFusion: Robust and Accurate Dense Reconstruction via Camera Pose Regression and Optimization
- Authors: Siyan Dong, Zijun Wang, Lulu Cai, Yi Ma, Yanchao Yang
- Abstract summary: Real-time dense scene reconstruction is crucial for robotics. Current RGB-D SLAM systems fail when cameras experience large viewpoint changes, fast motions, or sudden shaking.
- Score: 21.23419310544054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time dense scene reconstruction during unstable camera motions is crucial for robotics, yet current RGB-D SLAM systems fail when cameras experience large viewpoint changes, fast motions, or sudden shaking. Classical optimization-based methods deliver high accuracy but fail with poor initialization during large motions, while learning-based approaches provide robustness but lack sufficient accuracy for dense reconstruction. We address this challenge through a combination of learning-based initialization with optimization-based refinement. Our method employs a camera pose regression network to predict metric-aware relative poses from consecutive RGB-D frames, which serve as reliable starting points for a randomized optimization algorithm that further aligns depth images with the scene geometry. Extensive experiments demonstrate promising results: our approach outperforms the best competitor on challenging benchmarks, while maintaining comparable accuracy on stable motion sequences. The system operates in real-time, showcasing that combining simple and principled techniques can achieve both robustness for unstable motions and accuracy for dense reconstruction. Project page: https://github.com/siyandong/PROFusion.
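The abstract describes a coarse-to-fine pipeline: a regression network predicts a metric-aware relative pose, which then seeds a randomized optimization that aligns depth geometry. The sketch below is NOT the paper's actual algorithm; it is a minimal illustration of the general "learned initialization + randomized refinement" idea, using a toy point-to-point alignment cost and random SE(3) perturbation search. All function names and parameters here are hypothetical.

```python
import numpy as np

def small_se3_perturbation(rng, rot_sigma, trans_sigma):
    # Sample a small random rigid motion as a 4x4 homogeneous matrix:
    # rotation from a Gaussian rotation vector (Rodrigues), Gaussian translation.
    w = rng.normal(0.0, rot_sigma, 3)
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta > 1e-12:
        K = K / theta
        R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    else:
        R = np.eye(3)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = rng.normal(0.0, trans_sigma, 3)
    return T

def alignment_cost(pose, src, tgt):
    # Mean nearest-neighbour distance from transformed src points to tgt points
    # (a crude stand-in for aligning a depth image with scene geometry).
    p = (pose[:3, :3] @ src.T).T + pose[:3, 3]
    d = np.linalg.norm(p[:, None, :] - tgt[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def randomized_refine(init_pose, src, tgt, iters=300, seed=0):
    # Greedy random search: start from the (learned) initial pose and keep
    # any perturbed candidate that lowers the alignment cost.
    rng = np.random.default_rng(seed)
    best = init_pose.copy()
    best_cost = alignment_cost(best, src, tgt)
    for _ in range(iters):
        cand = small_se3_perturbation(rng, 0.02, 0.02) @ best
        c = alignment_cost(cand, src, tgt)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost
```

In this toy setup the "network prediction" would simply be `init_pose`; the refinement stage is robust to a moderately wrong starting point because it only requires the initialization to land within the basin where random perturbations can make progress.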
Related papers
- GeoMotion: Rethinking Motion Segmentation via Latent 4D Geometry [61.24189040578178]
We propose a fully learning-based approach that directly infers moving objects from latent feature representations via attention mechanisms. Our key insight is to bypass explicit correspondence estimation and instead let the model learn to implicitly disentangle object and camera motion. Our approach achieves state-of-the-art motion segmentation performance with high efficiency.
arXiv Detail & Related papers (2026-02-25T11:36:33Z) - JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction [18.636227266388218]
We present JOintGS, a unified framework that jointly optimizes camera extrinsics, human poses, and 3D Gaussian representations. Experiments on the NeuMan and EMDB datasets demonstrate that JOintGS achieves superior reconstruction quality.
arXiv Detail & Related papers (2026-02-04T08:33:51Z) - JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting [10.35563602148445]
We propose a unified framework that jointly optimizes 3D Gaussian points and camera poses without requiring pre-calibrated inputs. Our approach iteratively refines 3D Gaussian parameters and updates camera poses through a novel co-optimization strategy. It significantly outperforms existing COLMAP-free techniques in reconstruction quality, and also surpasses the standard COLMAP-based baseline in general.
arXiv Detail & Related papers (2025-10-30T04:00:07Z) - UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring [31.35713139629235]
Reconstructing 3D scenes from monocular video often fails due to severe motion blur caused by camera and object motion. We introduce a unified optimization framework by incorporating camera poses as learnable parameters. Our method achieves significant gains in reconstruction quality and pose estimation accuracy over prior dynamic deblurring methods.
arXiv Detail & Related papers (2025-08-31T13:01:03Z) - GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering [54.489285024494855]
Video stabilization is pivotal for video processing, as it removes unwanted shakiness while preserving the original user motion intent. Existing approaches, depending on the domain they operate in, suffer from several issues that degrade the user experience. We introduce GaVS, a novel 3D-grounded approach that reformulates video stabilization as a temporally-consistent 'local reconstruction and rendering' paradigm.
arXiv Detail & Related papers (2025-06-30T15:24:27Z) - Towards Initialization-free Calibrated Bundle Adjustment [8.698137120086065]
We present a method that is able to use the known camera calibration, thereby producing near-metric solutions. Our method can be seen as integrating rotation averaging into the pOSE framework.
arXiv Detail & Related papers (2025-06-30T12:55:44Z) - 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS [36.48425755917156]
3D Gaussian Splatting (3DGS) has revolutionized neural rendering with its efficiency and quality. However, it heavily depends on accurate camera poses from Structure-from-Motion (SfM) systems. We present 3R-GS, a 3D Gaussian Splatting framework that bridges this gap.
arXiv Detail & Related papers (2025-04-05T22:31:08Z) - XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications [34.2082611110639]
This paper presents a novel approach to Visual Inertial Odometry (VIO) focusing on the initialization and feature matching modules. Existing methods for gyroscopes often suffer from poor stability in visual Structure from Motion (SfM) or in solving for a huge number of parameters simultaneously. By tightly coupling measurements, we enhance the robustness and accuracy of visual SfM. In terms of feature matching, we introduce a hybrid method that combines optical flow and descriptor-based matching.
arXiv Detail & Related papers (2025-02-03T12:17:51Z) - InstantSplat: Sparse-view Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed. InstantSplat employs a self-supervised framework that optimizes the 3D scene representation and camera poses. It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z) - ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z) - Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO)
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z) - Learning-Based Framework for Camera Calibration with Distortion Correction and High Precision Feature Detection [14.297068346634351]
We propose a hybrid camera calibration framework which combines learning-based approaches with traditional methods to handle these bottlenecks.
In particular, this framework leverages learning-based approaches to perform efficient distortion correction and robust chessboard corner coordinate encoding.
Experimental results on both real and synthetic datasets demonstrate that the proposed framework is more robust and more precise than two widely-used camera calibration toolboxes.
arXiv Detail & Related papers (2022-02-01T00:19:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.