Fillerbuster: Multi-View Scene Completion for Casual Captures
- URL: http://arxiv.org/abs/2502.05175v1
- Date: Fri, 07 Feb 2025 18:59:51 GMT
- Title: Fillerbuster: Multi-View Scene Completion for Casual Captures
- Authors: Ethan Weber, Norman Müller, Yash Kant, Vasu Agrawal, Michael Zollhöfer, Angjoo Kanazawa, Christian Richardt
- Abstract summary: We present Fillerbuster, a method that completes unknown regions of a 3D scene by utilizing a novel large-scale multi-view latent diffusion transformer.
Our solution is to train a generative model that can consume a large context of input frames while generating unknown target views and recovering image poses when desired.
- Score: 48.12462469832712
- License:
- Abstract: We present Fillerbuster, a method that completes unknown regions of a 3D scene by utilizing a novel large-scale multi-view latent diffusion transformer. Casual captures are often sparse and miss surrounding content behind objects or above the scene. Existing methods are not suitable for handling this challenge as they focus on making the known pixels look good with sparse-view priors, or on creating the missing sides of objects from just one or two photos. In reality, we often have hundreds of input frames and want to complete areas that are missing and unobserved from the input frames. Additionally, the images often do not have known camera parameters. Our solution is to train a generative model that can consume a large context of input frames while generating unknown target views and recovering image poses when desired. We show results where we complete partial captures on two existing datasets. We also present an uncalibrated scene completion task where our unified model predicts both poses and creates new content. Our model is the first to predict many images and poses together for scene completion.
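The joint "views + poses" formulation in the abstract can be pictured concretely. Below is a minimal, hypothetical PyTorch sketch of the idea: per-frame image latents and pose encodings become tokens in a single transformer, known frames condition the unknown ones through attention, and the model predicts denoised latents and poses together. All names (`MultiViewDenoiser`, `known_mask`, the dimensions) are illustrative assumptions, not Fillerbuster's actual architecture or API.

```python
# Hypothetical sketch of joint multi-view latent + pose denoising.
# Not Fillerbuster's real code; names and shapes are illustrative only.
import torch
import torch.nn as nn

class MultiViewDenoiser(nn.Module):
    """Transformer that denoises a set of per-frame tokens jointly.

    Each frame contributes an image-latent token and a pose encoding;
    attention across all frames lets observed frames condition the
    unobserved ones.
    """
    def __init__(self, latent_dim=64, pose_dim=12, d_model=256, n_layers=4):
        super().__init__()
        self.img_in = nn.Linear(latent_dim, d_model)
        self.pose_in = nn.Linear(pose_dim, d_model)
        self.known_flag = nn.Embedding(2, d_model)  # 0 = unknown, 1 = known
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.img_out = nn.Linear(d_model, latent_dim)
        self.pose_out = nn.Linear(d_model, pose_dim)

    def forward(self, img_latents, poses, known_mask):
        # img_latents: (B, N, latent_dim), poses: (B, N, pose_dim)
        # known_mask:  (B, N) bool -- which frames are observed inputs
        tokens = self.img_in(img_latents) + self.pose_in(poses)
        tokens = tokens + self.known_flag(known_mask.long())
        tokens = self.backbone(tokens)  # frames attend to each other
        return self.img_out(tokens), self.pose_out(tokens)

# One illustrative denoising step: replace the unknown frames' latents and
# poses with noise, then predict clean values conditioned on known frames.
model = MultiViewDenoiser()
B, N = 1, 8
latents = torch.randn(B, N, 64)
poses = torch.randn(B, N, 12)        # e.g. flattened 3x4 camera matrices
known = torch.zeros(B, N, dtype=torch.bool)
known[:, :5] = True                  # 5 observed frames, 3 to complete
noisy_latents = torch.where(known[..., None], latents, torch.randn_like(latents))
noisy_poses = torch.where(known[..., None], poses, torch.randn_like(poses))
pred_latents, pred_poses = model(noisy_latents, noisy_poses, known)
```

In this toy setup, the known-frame flag plays the role of the conditioning mask: at sampling time only the unknown frames' tokens would be iteratively denoised while known tokens stay fixed, which also covers the uncalibrated case where pose tokens themselves are among the unknowns.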
Related papers
- Geometry-Aware Diffusion Models for Multiview Scene Inpainting [24.963896970130065]
We focus on 3D scene inpainting, where parts of an input image set, captured from different viewpoints, are masked out.
Most recent work addresses this challenge by combining generative models with a 3D radiance field to fuse information across viewpoints.
We introduce a geometry-aware conditional generative model, capable of inpainting consistent images based on both geometric and appearance cues from reference images.
arXiv Detail & Related papers (2025-02-18T23:30:10Z) - Generic Objects as Pose Probes for Few-shot View Synthesis [14.768563613747633]
Radiance fields including NeRFs and 3D Gaussians demonstrate great potential in high-fidelity rendering and scene reconstruction.
COLMAP is frequently employed for preprocessing to estimate poses.
We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images.
arXiv Detail & Related papers (2024-08-29T16:37:58Z) - SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views [36.02533658048349]
We propose a novel method, SpaRP, to reconstruct a 3D textured mesh and estimate the relative camera poses for sparse-view images.
SpaRP distills knowledge from 2D diffusion models and finetunes them to implicitly deduce the 3D spatial relationships between the sparse views.
It requires only about 20 seconds to produce a textured mesh and camera poses for the input views.
arXiv Detail & Related papers (2024-08-19T17:53:10Z) - Generalizable Pose Estimation Using Implicit Scene Representations [4.124185654280966]
6-DoF pose estimation is an essential component of robotic manipulation pipelines.
We address the generalization capability of pose estimation by using implicit scene representations that contain enough information to render the object in different poses.
Our final evaluation shows a significant improvement in inference performance and speed compared to existing approaches.
arXiv Detail & Related papers (2023-05-26T20:42:52Z) - Few-View Object Reconstruction with Unknown Categories and Camera Poses [80.0820650171476]
This work explores reconstructing general real-world objects from a few images without known camera poses or object categories.
The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation.
Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence.
arXiv Detail & Related papers (2022-12-08T18:59:02Z) - im2nerf: Image to Neural Radiance Field in the Wild [47.18702901448768]
im2nerf is a learning framework that predicts a continuous neural object representation given a single input image in the wild.
We show that im2nerf achieves state-of-the-art performance for novel view synthesis from a single-view unposed image in the wild.
arXiv Detail & Related papers (2022-09-08T23:28:56Z) - Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z) - Unsupervised Object Learning via Common Fate [61.14802390241075]
Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling.
We decompose this problem into three easier subtasks, and provide candidate solutions for each of them.
We show that our approach allows learning generative models that generalize beyond the occlusions present in the input videos.
arXiv Detail & Related papers (2021-10-13T08:22:04Z) - Wide-Area Crowd Counting: Multi-View Fusion Networks for Counting in Large Scenes [50.744452135300115]
We propose a deep neural network framework for multi-view crowd counting.
Our methods achieve state-of-the-art results, outperforming other multi-view counting baselines.
arXiv Detail & Related papers (2020-12-02T03:20:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.