FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent
- URL: http://arxiv.org/abs/2404.15259v3
- Date: Tue, 23 Jul 2024 13:41:03 GMT
- Title: FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent
- Authors: Cameron Smith, David Charatan, Ayush Tewari, Vincent Sitzmann
- Abstract summary: FlowMap is an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence.
Our method performs per-video gradient-descent minimization of a simple least-squares objective.
We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360-degree trajectories.
- Score: 19.977807508281835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Our method performs per-video gradient-descent minimization of a simple least-squares objective that compares the optical flow induced by depth, intrinsics, and poses against correspondences obtained via off-the-shelf optical flow and point tracking. Alongside the use of point tracks to encourage long-term geometric consistency, we introduce differentiable re-parameterizations of depth, intrinsics, and pose that are amenable to first-order optimization. We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360-degree trajectories using Gaussian Splatting. Our method not only far outperforms prior gradient-descent based bundle adjustment methods, but surprisingly performs on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360-degree novel view synthesis (even though our method is purely gradient-descent based, fully differentiable, and presents a complete departure from conventional SfM).
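For concreteness, here is a minimal sketch of the kind of objective the abstract describes, assuming a PyTorch setup with a plain pinhole camera model; the function names and the naive depth/pose parameterization are illustrative, not FlowMap's actual re-parameterizations.

```python
import torch

def induced_flow(depth_i, K, T_ij, H, W):
    """Flow from frame i to j induced by depth, intrinsics K (3x3), relative pose T_ij (4x4)."""
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()   # (H, W, 3) homogeneous pixels
    rays = pix @ torch.linalg.inv(K).T                              # unproject to camera rays
    pts_i = rays * depth_i.unsqueeze(-1)                            # 3D points in frame i
    pts_j = pts_i @ T_ij[:3, :3].T + T_ij[:3, 3]                    # transform into frame j
    proj = pts_j @ K.T
    uv_j = proj[..., :2] / proj[..., 2:].clamp(min=1e-6)            # perspective projection
    return uv_j - pix[..., :2]                                      # induced optical flow

def flow_loss(depth_i, K, T_ij, flow_est):
    """Least-squares residual against an off-the-shelf flow estimate of shape (H, W, 2)."""
    H, W = flow_est.shape[:2]
    return ((induced_flow(depth_i, K, T_ij, H, W) - flow_est) ** 2).mean()
```

Gradient descent over the depth maps (or a network predicting them), the shared intrinsics, and the per-frame poses minimizes this residual; the paper's differentiable re-parameterizations of depth, intrinsics, and pose exist precisely to make this first-order optimization well conditioned.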
Related papers
- Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS [52.3215552448623]
Novel View Synthesis (NVS) without Structure-from-Motion (SfM) pre-processed camera poses is crucial for rapid response and for robustness under variable operating conditions.
Recent SfM-free methods have integrated pose optimization, designing end-to-end frameworks for joint camera pose estimation and NVS.
Most existing works rely on per-pixel image loss functions, such as L2 loss.
In this study, we propose a correspondence-guided SfM-free 3D Gaussian splatting for NVS.
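A minimal illustration of the distinction drawn here, in PyTorch; both functions are schematic stand-ins, not the paper's pipeline:

```python
import torch

def photometric_l2(rendered, target):
    # Per-pixel L2 image loss: compares colors at the same pixel coordinates,
    # so its gradients are uninformative when the current pose is far off.
    return ((rendered - target) ** 2).mean()

def correspondence_loss(reproj_uv, matched_uv):
    # Correspondence-guided loss: penalizes the 2D distance between where
    # matched points reproject under the current pose/geometry (reproj_uv)
    # and where a 2D matcher found them (matched_uv), both (N, 2).
    return (reproj_uv - matched_uv).norm(dim=-1).mean()
```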
arXiv Detail & Related papers (2024-08-16T13:11:22Z)
- Fine Dense Alignment of Image Bursts through Camera Pose and Depth Estimation [45.11207941777178]
This paper introduces a novel approach to the fine alignment of images in a burst captured by a handheld camera.
The proposed algorithm establishes dense correspondences by optimizing both the camera motion and surface depth and orientation at every pixel.
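A rough sketch of such a dense alignment residual, assuming reprojected coordinates uv_j computed from the current camera motion and per-pixel depth (e.g., as in the FlowMap sketch above); all names are illustrative:

```python
import torch
import torch.nn.functional as F

def photometric_warp_loss(img_j, img_i, uv_j):
    """img_*: (C, H, W) tensors; uv_j: (H, W, 2) reprojected pixel coordinates."""
    C, H, W = img_i.shape
    # Normalize coordinates to [-1, 1] and sample frame j at the reprojections;
    # the residual drives gradient updates of camera motion and per-pixel depth.
    grid = torch.stack([uv_j[..., 0] / (W - 1), uv_j[..., 1] / (H - 1)], dim=-1) * 2 - 1
    warped = F.grid_sample(img_j[None], grid[None], align_corners=True)[0]
    return ((warped - img_i) ** 2).mean()
```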
arXiv Detail & Related papers (2023-12-08T17:22:04Z)
- $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [97.06927852165464]
Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.
We propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process.
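For intuition, a schematic deterministic (DDIM-style, eta=0) reverse-diffusion step on an (N, 3) point set; eps_model and the noise schedule are placeholders, not $PC^2$'s projection-conditioned network:

```python
import torch

def reverse_step(x_t, t, cond, eps_model, alphas_cumprod):
    """One deterministic reverse step on a noisy point cloud x_t of shape (N, 3)."""
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    eps = eps_model(x_t, t, cond)                      # predicted noise, conditioned on the image
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # estimate of the clean point cloud
    return a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
```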
arXiv Detail & Related papers (2023-02-21T13:37:07Z)
- ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
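A toy NumPy sketch of the underlying idea of linking pairwise optical flow into long point trajectories; the occlusion handling and consistency checks a real system needs are omitted:

```python
import numpy as np

def chain_flows(flows, start_pts):
    """flows: list of (H, W, 2) forward flows; start_pts: (N, 2) float pixel positions."""
    pts, tracks = start_pts.copy(), [start_pts.copy()]
    for flow in flows:
        H, W = flow.shape[:2]
        ij = np.round(pts).astype(int)
        ij[:, 0] = ij[:, 0].clip(0, W - 1)       # x index
        ij[:, 1] = ij[:, 1].clip(0, H - 1)       # y index
        pts = pts + flow[ij[:, 1], ij[:, 0]]     # advect each point by the sampled flow
        tracks.append(pts.copy())
    return np.stack(tracks, axis=1)              # (N, num_frames, 2) trajectories
```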
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
- DiffPoseNet: Direct Differentiable Camera Pose Estimation [11.941057800943653]
We introduce NFlowNet, a network for normal flow estimation that is used to enforce robust and direct constraints.
We perform extensive qualitative and quantitative evaluation of the proposed DiffPoseNet's sensitivity to noise and its generalization across datasets.
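For reference, normal flow is the component of optical flow along the image gradient, the only component observable from brightness constancy; a crude finite-difference version (a stand-in for the learned NFlowNet) looks like:

```python
import numpy as np

def normal_flow(img0, img1):
    """Normal flow between two grayscale frames via brightness constancy."""
    gy, gx = np.gradient(img0.astype(float))          # spatial image gradients
    it = img1.astype(float) - img0.astype(float)      # temporal derivative
    mag2 = gx**2 + gy**2 + 1e-8
    # Brightness constancy g . u = -I_t determines only the flow component
    # along the gradient g; its vector form is (-I_t / |g|^2) * g.
    scale = -it / mag2
    return np.stack([scale * gx, scale * gy], axis=-1)   # (H, W, 2)
```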
arXiv Detail & Related papers (2022-03-21T17:54:30Z)
- Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement [47.61748619439693]
A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints.
Previous works denoise a point cloud a posteriori, after projecting the imperfect depth data onto 3D space.
We enhance depth measurements directly on the sensed images a priori, before synthesizing a 3D point cloud.
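A toy example of a priori depth smoothing in this spirit, here as plain graph-Laplacian diffusion on the pixel grid; the paper's graph construction and dequantization are considerably more involved:

```python
import numpy as np

def laplacian_smooth(depth, iters=10, lam=0.2):
    """Iteratively pull each depth value toward its 4-neighbor mean."""
    d = depth.astype(float).copy()
    for _ in range(iters):
        p = np.pad(d, 1, mode="edge")
        nbr_mean = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        d += lam * (nbr_mean - d)   # diffusion step on the pixel-grid graph
    return d
```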
arXiv Detail & Related papers (2021-11-09T04:17:35Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of (1) an optical flow estimation network that predicts dense correspondences between two frames; (2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and (3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
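A hedged sketch of the classic well-posed two-view pipeline this builds on, using standard OpenCV calls in place of the paper's learned modules:

```python
import cv2
import numpy as np

def two_view_pose(pts0, pts1, K):
    """pts0, pts1: (N, 2) matched pixels (e.g., sampled from dense flow); K: (3, 3) intrinsics."""
    E, inliers = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=inliers)
    return R, t   # relative rotation and unit-scale translation

def triangulate(pts0, pts1, K, R, t):
    """Recover 3D structure (up to scale) from the two-view pose."""
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P0, P1, pts0.T, pts1.T)   # homogeneous (4, N)
    return (X[:3] / X[3]).T
```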
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
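A compact sketch of low-frequency alignment with a coarse, smoothly upsampled deformation grid standing in for the paper's deformation splines; assumes PyTorch, and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def align_depth(depth, target, grid_hw=(4, 4), steps=200, lr=0.05):
    """Fit a coarse log-scale grid so that scale * depth matches a target depth."""
    theta = torch.zeros(1, 1, *grid_hw, requires_grad=True)   # low-frequency log-scales
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        scale = F.interpolate(theta, size=tuple(depth.shape), mode="bilinear",
                              align_corners=True).exp()[0, 0]
        loss = ((scale * depth - target) ** 2).mean()          # large-scale fit only
        opt.zero_grad()
        loss.backward()
        opt.step()
    return scale.detach() * depth
```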
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
- Combining 3D Model Contour Energy and Keypoints for Object Tracking [2.5782420501870287]
We present a new combined approach for monocular model-based 3D tracking.
A preliminary object pose is estimated by using a keypoint-based technique.
The pose is then refined by optimizing the contour energy function.
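Schematically, the two stages might look as follows, with cv2.solvePnP providing the keypoint-based initialization and contour_energy a placeholder for the paper's model-contour term:

```python
import cv2
import numpy as np

def track_pose(obj_pts, img_pts, K, contour_energy, steps=50, eps=1e-3):
    """obj_pts: (N, 3) model points; img_pts: (N, 2) keypoint matches; K: (3, 3) intrinsics."""
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)   # keypoint-based initialization
    pose = np.concatenate([rvec.ravel(), tvec.ravel()])
    for _ in range(steps):                                     # refine by numeric gradient descent
        grad = np.zeros(6)
        for i in range(6):
            d = np.zeros(6); d[i] = eps
            grad[i] = (contour_energy(pose + d) - contour_energy(pose - d)) / (2 * eps)
        pose -= 0.01 * grad
    return pose   # axis-angle rotation + translation
```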
arXiv Detail & Related papers (2020-02-04T15:53:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.