Level-S$^2$fM: Structure from Motion on Neural Level Set of Implicit
Surfaces
- URL: http://arxiv.org/abs/2211.12018v2
- Date: Mon, 27 Mar 2023 06:20:51 GMT
- Title: Level-S$^2$fM: Structure from Motion on Neural Level Set of Implicit
Surfaces
- Authors: Yuxi Xiao and Nan Xue and Tianfu Wu and Gui-Song Xia
- Abstract summary: This paper presents a neural incremental Structure-from-Motion (SfM) approach, Level-S$^2$fM, which estimates the camera poses and scene geometry from a set of uncalibrated images.
Our novel formulation poses new challenges due to the inevitable two-view and few-view configurations in the incremental SfM pipeline.
Not only does our Level-S$^2$fM lead to promising results on camera pose estimation and scene geometry reconstruction, but it also shows a promising way toward neural implicit rendering without knowing the camera poses.
- Score: 36.06713735409501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a neural incremental Structure-from-Motion (SfM)
approach, Level-S$^2$fM, which estimates the camera poses and scene geometry
from a set of uncalibrated images by learning coordinate MLPs for the implicit
surfaces and the radiance fields from the established keypoint correspondences.
Our novel formulation poses new challenges due to the inevitable two-view and
few-view configurations in the incremental SfM pipeline, which complicate the
optimization of coordinate MLPs for volumetric neural rendering with unknown
camera poses. Nevertheless, we demonstrate that the strong inductive bias
conveyed by the 2D correspondences is promising for tackling those challenges
by exploiting the relationship between the ray sampling schemes. Based on this,
we revisit the pipeline of incremental SfM and renew its key components,
including two-view geometry initialization, camera pose registration, 3D point
triangulation, and bundle adjustment, with a fresh perspective based on neural
implicit surfaces. By unifying the scene geometry in compact coordinate MLPs,
our Level-S$^2$fM treats the zero-level set of the implicit surface as an
informative top-down regularization to manage the reconstructed 3D points,
reject outliers in correspondences by querying the SDF, and refine the
estimated geometries by neural bundle adjustment (NBA). Not only does our Level-S$^2$fM
lead to promising results on camera pose estimation and scene geometry
reconstruction, but it also shows a promising way for neural implicit rendering
without knowing the camera extrinsics beforehand.
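The SDF-based outlier rejection described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a closed-form unit-sphere SDF stands in for the learned coordinate MLP, and the threshold `tau` is an assumed value.

```python
import numpy as np

def sphere_sdf(points: np.ndarray, radius: float = 1.0) -> np.ndarray:
    """Signed distance of points (N, 3) to a sphere at the origin.
    Stand-in for the learned SDF MLP in Level-S^2fM."""
    return np.linalg.norm(points, axis=1) - radius

def reject_outliers(points: np.ndarray, sdf, tau: float = 0.05) -> np.ndarray:
    """Keep triangulated points whose |SDF| lies within tau of the
    zero-level set; everything else is treated as a correspondence outlier.
    tau is a hypothetical threshold chosen for this sketch."""
    return points[np.abs(sdf(points)) < tau]

points = np.array([
    [1.0, 0.0, 0.0],   # exactly on the surface
    [0.0, 0.99, 0.0],  # near the surface
    [0.0, 0.0, 2.0],   # far from the surface -> rejected
])
kept = reject_outliers(points, sphere_sdf)
print(len(kept))  # 2
```

In the actual method the SDF is a small MLP optimized jointly with the camera poses, so this check doubles as a top-down geometric regularizer rather than a fixed analytic filter.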
Related papers
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
Our framework capitalizes on the fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
PF3plat sets a new state of the art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
- SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning [17.99904937160487]
We introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning.
SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for the camera pose estimation task on the KITTI Odometry dataset.
arXiv Detail & Related papers (2024-07-07T06:52:51Z)
- NoPose-NeuS: Jointly Optimizing Camera Poses with Neural Implicit Surfaces for Multi-view Reconstruction [0.0]
NoPose-NeuS is a neural implicit surface reconstruction method that extends NeuS to jointly optimize camera poses with the geometry and color networks.
We show that the proposed method can estimate relatively accurate camera poses while maintaining high surface reconstruction quality, with a mean Chamfer distance of 0.89.
arXiv Detail & Related papers (2023-12-23T12:18:22Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Semantic Validation in Structure from Motion [0.0]
Structure from Motion (SfM) is the process of recovering the 3D structure of a scene from a series of projective measurements.
SfM consists of three main steps: feature detection and matching, camera motion estimation, and recovery of 3D structure.
This project offers a novel method for improved validation of 3D SfM models.
arXiv Detail & Related papers (2023-04-05T12:58:59Z)
- TerrainMesh: Metric-Semantic Terrain Reconstruction from Aerial Images Using Joint 2D-3D Learning [20.81202315793742]
This paper develops a joint 2D-3D learning approach to reconstruct a local metric-semantic mesh at each camera maintained by a visual odometry algorithm.
The mesh can be assembled into a global environment model to capture the terrain topology and semantics during online operation.
arXiv Detail & Related papers (2022-04-23T05:18:39Z)
- Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective [81.56957468529602]
We propose to model deep NRSfM from a sequence-to-sequence translation perspective.
First, we apply a shape-motion predictor to estimate the initial non-rigid shape and camera motion from a single frame.
Then we propose a context modeling module to model camera motions and complex non-rigid shapes.
arXiv Detail & Related papers (2022-04-10T17:13:52Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of: 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
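Point triangulation from two registered views recurs throughout the papers above, from Level-S$^2$fM's neural variant to the classic two-view pipeline. A minimal sketch of linear (DLT) triangulation follows; the cameras and the 3D point are illustrative, and real pipelines add noise handling and nonlinear refinement.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence.
    P1, P2: 3x4 projection matrices; x1, x2: 2D image points.
    Builds the homogeneous system A X = 0 and takes the SVD null vector."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # right singular vector of smallest singular value
    return X[:3] / X[3]        # dehomogenize

# Two identity-intrinsics cameras: one at the origin, one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

# Project the point into both views (noise-free for this sketch).
h = np.append(X_true, 1.0)
x1 = (P1 @ h)[:2] / (P1 @ h)[2]
x2 = (P2 @ h)[:2] / (P2 @ h)[2]

X_hat = triangulate_dlt(P1, P2, x1, x2)
print(np.allclose(X_hat, X_true, atol=1e-6))  # True
```

With exact correspondences the null vector of `A` recovers the point up to scale; with noisy matches the same least-squares solution seeds a nonlinear refinement such as bundle adjustment (or NBA in Level-S$^2$fM).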
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.