Related papers: A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose

A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose

URL: http://arxiv.org/abs/2405.03659v2
Date: Mon, 10 Jun 2024 22:51:42 GMT
Title: A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose
Authors: Kaiwen Jiang, Yang Fu, Mukund Varma T, Yash Belhe, Xiaolong Wang, Hao Su, Ravi Ramamoorthi,
Abstract summary: We develop a novel construct-and-optimize method for sparse view synthesis without camera poses. Specifically, we construct a solution by using monocular depth and projecting pixels back into the 3D world. We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views.
Score: 44.13819148680788
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation. In this paper, we leverage the recent 3D Gaussian splatting method to develop a novel construct-and-optimize method for sparse view synthesis without camera poses. Specifically, we construct a solution progressively by using monocular depth and projecting pixels back into the 3D world. During construction, we optimize the solution by detecting 2D correspondences between training views and the corresponding rendered images. We develop a unified differentiable pipeline for camera registration and adjustment of both camera poses and depths, followed by back-projection. We also introduce a novel notion of an expected surface in Gaussian splatting, which is critical to our optimization. These steps enable a coarse solution, which can then be low-pass filtered and refined using standard optimization methods. We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views, showing significantly better quality than competing methods, including those with approximate camera pose information. Moreover, our results improve with more views and outperform previous InstantNGP and Gaussian Splatting algorithms even when using half the dataset. Project page: https://raymondjiangkw.github.io/cogs.github.io/

Related papers

AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views [57.13066710710485]
AnySplat is a feed forward network for novel view synthesis from uncalibrated image collections.<n>A single forward pass yields a set of 3D Gaussian primitives encoding both scene geometry and appearance.<n>In extensive zero shot evaluations, AnySplat matches the quality of pose aware baselines in both sparse and dense view scenarios.
arXiv Detail & Related papers (2025-05-29T17:49:56Z)
Sparfels: Fast Reconstruction from Sparse Unposed Imagery [8.362190332905524]
We present a method for Sparse view reconstruction with surface element splatting that runs within 3 minutes on a consumer grade GPU.<n>Key to our contribution is a novel formulation of splatted color variance along rays, which can be computed efficiently.<n>We demonstrate state-of-the-art performances in the sparse uncalibrated setting in reconstruction and novel view benchmarks based on established multi-view datasets.
arXiv Detail & Related papers (2025-05-04T16:40:24Z)
FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [93.6881532277553]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images. Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv Detail & Related papers (2025-02-17T18:54:05Z)
GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting. We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images. Our model achieves real-time 3D Gaussian reconstruction during inference. This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices. Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization [11.418632671254564]
3D Gaussian Splatting has emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images. We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals. We show results on real-world scenes and complex trajectories through simulated environments.
arXiv Detail & Related papers (2024-10-11T12:01:15Z)
CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians [18.42203035154126]
We introduce a structured Gaussian representation that can be controlled in 2D image space. We then constraint the Gaussians, in particular their position, and prevent them from moving independently during optimization. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.
arXiv Detail & Related papers (2024-03-28T15:27:13Z)
COLMAP-Free 3D Gaussian Splatting [88.420322646756]
We propose a novel method to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.
arXiv Detail & Related papers (2023-12-12T18:39:52Z)
FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction [37.81077373162092]
Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision. We present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses.
arXiv Detail & Related papers (2022-05-16T15:39:27Z)
Human Body Model Fitting by Learned Gradient Descent [48.79414884222403]
We propose a novel algorithm for the fitting of 3D human shape to images. We show that this algorithm is fast (avg. 120ms convergence), robust to dataset, and achieves state-of-the-art results on public evaluation datasets.
arXiv Detail & Related papers (2020-08-19T14:26:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.