Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video
- URL: http://arxiv.org/abs/2504.19819v1
- Date: Mon, 28 Apr 2025 14:22:04 GMT
- Title: Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video
- Authors: Hoang Chuong Nguyen, Wei Mao, Jose M. Alvarez, Miaomiao Liu
- Abstract summary: We propose a novel method that eliminates prior dependencies by modeling continuous camera motions as time-dependent angular velocity and velocity. Our approach achieves superior camera pose and depth estimation and comparable novel-view synthesis performance compared to state-of-the-art methods.
- Score: 22.760823792026056
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neural Radiance Fields (NeRF) have demonstrated a superior capability to represent 3D geometry but require accurately precomputed camera poses during training. To mitigate this requirement, existing methods jointly optimize camera poses and NeRF, often relying on good pose initialisation or depth priors. However, these approaches struggle in challenging scenarios, such as large rotations, because they map each camera to a world coordinate system. We propose a novel method that eliminates prior dependencies by modeling continuous camera motions as time-dependent angular velocity and velocity. Relative motions between cameras are learned first via velocity integration, while camera poses can be obtained by aggregating such relative motions up to a world coordinate system defined at a single time step within the video. Specifically, accurate continuous camera movements are learned through a time-dependent NeRF, which captures local scene geometry and motion by training from neighboring frames for each time step. The learned motions enable fine-tuning the NeRF to represent the full scene geometry. Experiments on Co3D and ScanNet show that our approach achieves superior camera pose and depth estimation and comparable novel-view synthesis performance compared to state-of-the-art methods. Our code is available at https://github.com/HoangChuongNguyen/cope-nerf.
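The idea above (velocities in, poses out) can be made concrete with a short sketch. This is a minimal illustration, not the paper's implementation: `omega_fn` and `v_fn` stand in for the learned time-dependent angular velocity and velocity, body-frame conventions are assumed, and a coarse Euler scheme replaces whatever integrator the authors actually use.

```python
import numpy as np

def so3_exp(omega):
    """Rodrigues' formula: map an axis-angle vector to a rotation matrix."""
    theta = np.linalg.norm(omega)
    if theta < 1e-8:
        return np.eye(3)
    k = omega / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def relative_motion(omega_fn, v_fn, t0, t1, n_steps=32):
    """Integrate time-dependent angular velocity and velocity from t0 to t1
    into a relative rotation and translation (coarse Euler scheme)."""
    R, t = np.eye(3), np.zeros(3)
    dt = (t1 - t0) / n_steps
    for i in range(n_steps):
        tau = t0 + i * dt
        t = t + R @ (v_fn(tau) * dt)          # translate using the current rotation
        R = R @ so3_exp(omega_fn(tau) * dt)   # then apply the incremental rotation
    return R, t

def chain_poses(omega_fn, v_fn, timestamps, ref_idx=0):
    """Aggregate relative motions into poses expressed in a world frame
    fixed at a single reference time step."""
    poses = {ref_idx: (np.eye(3), np.zeros(3))}
    for i in range(ref_idx + 1, len(timestamps)):
        R_prev, t_prev = poses[i - 1]
        dR, dt_ = relative_motion(omega_fn, v_fn, timestamps[i - 1], timestamps[i])
        poses[i] = (R_prev @ dR, t_prev + R_prev @ dt_)
    return poses
```

Here `chain_poses` fixes the world coordinate system at `ref_idx`, mirroring the paper's choice of defining it at a single time step within the video.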
Related papers
- AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos [52.726585508669686]
We propose AnyCam, a fast transformer model that directly estimates camera poses and intrinsics from a dynamic video sequence.
We test AnyCam on established datasets, where it delivers accurate camera poses and intrinsics both qualitatively and quantitatively.
By combining camera information, uncertainty, and depth, our model can produce high-quality 4D pointclouds.
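As a rough illustration of that last step, combining a depth map, camera pose and intrinsics, and a per-pixel uncertainty into a time-stamped ("4D") pointcloud could look like the sketch below; the function name, the uncertainty threshold, and the camera-to-world convention are assumptions, not AnyCam's API.

```python
import numpy as np

def unproject_frame(depth, K, R, t, timestamp, uncertainty=None, max_sigma=0.5):
    """Lift a depth map into world-space points stamped with time,
    optionally dropping pixels whose predicted uncertainty is too high."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T                 # back-project to camera rays
    pts_cam = rays * depth.reshape(-1, 1)           # scale rays by depth
    pts_world = pts_cam @ R.T + t                   # camera-to-world transform
    keep = np.ones(len(pts_world), dtype=bool)
    if uncertainty is not None:
        keep = uncertainty.reshape(-1) < max_sigma  # filter uncertain pixels
    times = np.full((int(keep.sum()), 1), float(timestamp))
    return np.hstack([pts_world[keep], times])      # (N, 4): x, y, z, t
```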
arXiv Detail & Related papers (2025-03-30T02:22:11Z) - FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video [52.33896173943054]
Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications.
Existing methods rely on synthetic pretraining and struggle to generate smooth and accurate predictions in real-world settings.
We propose FRAME, a simple yet effective architecture that combines device pose and camera feeds for state-of-the-art body pose prediction.
arXiv Detail & Related papers (2025-03-29T14:26:06Z) - Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos [12.19207713016543]
Recent works on dynamic 3D neural field reconstruction assume input from multi-view videos whose poses are known.
We show that dynamic scenes can be reconstructed from unsynchronized and uncalibrated videos capturing human motion.
arXiv Detail & Related papers (2024-12-26T07:04:20Z) - MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos [104.1338295060383]
We present a system that allows for accurate, fast, and robust estimation of camera parameters and depth maps from casual monocular videos of dynamic scenes.
Our system is significantly more accurate and robust at camera pose and depth estimation when compared with prior and concurrent work.
arXiv Detail & Related papers (2024-12-05T18:59:42Z) - CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion-Blurred Images [14.738528284246545]
CRiM-GS is a Continuous Rigid Motion-aware Gaussian Splatting method.
It reconstructs precise 3D scenes from motion-blurred images while maintaining real-time rendering speed.
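Methods in this family typically tie continuous camera motion to blur by averaging sharp renders along the estimated intra-exposure trajectory. The sketch below shows only that generic forward model; `render` and `pose_at` are placeholders, not CRiM-GS's actual components.

```python
import numpy as np

def render_blurred(render, pose_at, t_open, t_close, n_samples=8):
    """Synthesize a motion-blurred image by averaging sharp renders at poses
    sampled along the continuous camera trajectory during the exposure."""
    times = np.linspace(t_open, t_close, n_samples)
    frames = [render(pose_at(t)) for t in times]   # one sharp render per pose
    return np.mean(frames, axis=0)                 # temporal average approximates blur
```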
arXiv Detail & Related papers (2024-07-04T13:37:04Z) - COLMAP-Free 3D Gaussian Splatting [88.420322646756]
We propose a novel method to perform novel view synthesis without any SfM preprocessing.
We process the input frames sequentially, progressively growing the set of 3D Gaussians by taking one input frame at a time.
Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.
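A schematic of that sequential pipeline follows; every helper (`estimate_relative_pose`, `init_gaussians`, `optimize`) is a placeholder for the paper's actual components, and only the control flow is meant to be faithful.

```python
import numpy as np

def process_video(frames, estimate_relative_pose, init_gaussians, optimize):
    """Grow a 3D Gaussian set one frame at a time, estimating each new camera
    pose relative to the previous frame (no SfM preprocessing required)."""
    poses = [np.eye(4)]                        # the first camera defines the world frame
    gaussians = init_gaussians(frames[0], poses[0])
    for frame in frames[1:]:
        rel = estimate_relative_pose(gaussians, frame, poses[-1])   # local registration
        poses.append(poses[-1] @ rel)                               # chain relative poses
        gaussians = optimize(gaussians, frame, poses[-1])           # grow and refine the set
    return gaussians, poses
```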
arXiv Detail & Related papers (2023-12-12T18:39:52Z) - Continuous Pose for Monocular Cameras in Neural Implicit Representation [65.40527279809474]
In this paper, we showcase the effectiveness of optimizing monocular camera poses as a continuous function of time.
We exploit the proposed method in four diverse experimental settings.
Under the assumption of continuous motion, changes in pose may actually live in a manifold with fewer than 6 degrees of freedom (DOF).
We call this low-DOF motion representation the intrinsic motion and use the approach in vSLAM settings, showing impressive camera tracking performance.
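To make the low-DOF idea concrete: the toy trajectory below is continuous in time and driven by a single parameter (an orbit angle), so the pose traces a 1-DOF manifold inside the full 6-DOF space. It illustrates the concept only and is not the paper's parameterization.

```python
import numpy as np

def pose_at(t, radius=2.0, height=1.0, rate=0.5):
    """A camera orbiting a scene at fixed radius and height: the whole pose
    (rotation and translation) is determined by one time-varying angle."""
    theta = rate * t                      # the single intrinsic DOF
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])          # yaw-only rotation
    trans = np.array([radius * c, height, radius * s])
    return R, trans
```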
arXiv Detail & Related papers (2023-11-28T13:14:58Z) - Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes [8.061773364318313]
We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video.
We provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences.
This represents a strong new performance point for crowded scenes, an important setting for computer vision.
arXiv Detail & Related papers (2023-09-15T17:44:07Z) - FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
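A standard differentiable building block for turning pixel-aligned correspondences (such as those induced by scene flow) into a pose is a weighted Procrustes/Kabsch solve. The sketch below shows that step in isolation under assumed inputs; it is not FlowCam's full pipeline.

```python
import numpy as np

def pose_from_correspondences(pts_a, pts_b, weights):
    """Recover the rigid transform with pts_b ~ R @ pts_a + t from two
    pixel-aligned (N, 3) point sets via a weighted Kabsch solve."""
    w = weights / weights.sum()
    ca = (w[:, None] * pts_a).sum(axis=0)              # weighted centroids
    cb = (w[:, None] * pts_b).sum(axis=0)
    H = (w[:, None] * (pts_a - ca)).T @ (pts_b - cb)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t
```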
arXiv Detail & Related papers (2023-05-31T20:58:46Z) - Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in the Wild [49.672487902268706]
We present a framework that jointly estimates camera temporal alignment and 3D point triangulation.
We reconstruct 3D motion trajectories of human bodies in events captured by multiple unsynchronized and uncalibrated video cameras.
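A toy version of the joint objective: for a 3D point moving linearly in time, the residual below depends on both the triangulation unknowns (`X0`, `V`) and the camera's unknown clock offset `dt`, so minimizing it over all observations aligns time and reconstructs geometry simultaneously. All names and the linear-motion model are illustrative assumptions.

```python
import numpy as np

def spatiotemporal_residual(params, K, R, t, obs_times, obs_pixels):
    """Reprojection residual for one camera observing a point that moves
    linearly in time, with an unknown per-camera time offset."""
    X0, V, dt = params[:3], params[3:6], params[6]
    res = []
    for s, px in zip(obs_times, obs_pixels):
        X = X0 + V * (s + dt)              # sample the trajectory at the aligned time
        x_cam = K @ (R @ X + t)            # project into the camera
        res.append(x_cam[:2] / x_cam[2] - px)
    return np.concatenate(res)             # feed to a least-squares solver
```

In practice a residual like this would be minimized jointly over all cameras and points, e.g. with scipy.optimize.least_squares.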
arXiv Detail & Related papers (2020-07-24T23:50:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.