FSViewFusion: Few-Shots View Generation of Novel Objects
- URL: http://arxiv.org/abs/2403.06394v2
- Date: Wed, 13 Mar 2024 02:41:34 GMT
- Title: FSViewFusion: Few-Shots View Generation of Novel Objects
- Authors: Rukhshanda Hussain, Hui Xian Grace Lim, Borchun Chen, Mubarak Shah, Ser Nam Lim
- Abstract summary: We explore the capabilities of a pretrained Stable Diffusion model for view synthesis without explicit 3D priors.
Specifically, we base our method on a personalized text-to-image model, Dreambooth, given its strong ability to adapt to specific novel objects from a few shots.
We establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the identity of the original object from which the views are learnt.
- Score: 75.81872204650807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Novel view synthesis has seen tremendous progress since the arrival
of NeRFs. However, NeRF models overfit to a single scene and lack
generalization to out-of-distribution objects. Recently, diffusion models have
shown remarkable performance in bringing generalization to view
synthesis. Inspired by these advances, we explore the capabilities of a
pretrained Stable Diffusion model for view synthesis without explicit 3D
priors. Specifically, we base our method on a personalized text-to-image model,
Dreambooth, given its strong ability to adapt to specific novel objects from a
few shots. Our research reveals two interesting findings. First, we observe
that Dreambooth can learn the high-level concept of a view, in contrast to
arguably more complex strategies that fine-tune diffusion models on large
amounts of multi-view data. Second, we establish that the concept of a view can
be disentangled and transferred to a novel object irrespective of the identity
of the original object from which the views are learnt. Motivated by this, we
introduce a learning strategy, FSViewFusion, which inherits a specific view
from only one image sample of a single scene and transfers that knowledge to
a novel object, learnt from a few shots, using low-rank adapters. Through
extensive experiments we demonstrate that our method, albeit simple, is
efficient in generating reliable view samples for in-the-wild images. Code and
models will be released.
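The abstract says the learned view is transferred to a novel object "using low rank adapters" (LoRA). As a rough illustration only, here is a minimal sketch of the standard LoRA update. A frozen weight W0 (d x k) is adapted as W0 + (alpha / r) B A, where B (d x r) is zero-initialized and A (r x k) is randomly initialized. The sketch then combines a hypothetical "view" adapter with a hypothetical "object" adapter by adding their weighted deltas. That additive merge, the names `LoRAAdapter` and `merge`, and all sizes are assumptions made for illustration. The abstract does not state how FSViewFusion actually combines its adapters. Plain Python lists stand in for the attention weights of a real diffusion model.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add_scaled(a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Elementwise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class LoRAAdapter:
    """Low-rank update delta_W = (alpha / r) * B @ A for a frozen d x k weight."""

    def __init__(self, d: int, k: int, r: int, alpha: float, rng: random.Random):
        self.scale = alpha / r
        # A (r x k): small Gaussian init; B (d x r): zeros, so delta_W starts at 0.
        self.A: Matrix = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        self.B: Matrix = [[0.0] * r for _ in range(d)]

    def delta(self) -> Matrix:
        return [[self.scale * v for v in row] for row in matmul(self.B, self.A)]

    def num_params(self) -> int:
        # Only A and B are trained: r * k + d * r values.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def merge(w0: Matrix, adapters: List[LoRAAdapter], weights: List[float]) -> Matrix:
    """Hypothetical merge: add each adapter's weighted delta to the frozen base."""
    w = w0
    for adapter, lam in zip(adapters, weights):
        w = add_scaled(w, adapter.delta(), lam)
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    d = k = 8
    w0 = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]
    view = LoRAAdapter(d, k, r=2, alpha=2.0, rng=rng)  # hypothetical "view" adapter
    obj = LoRAAdapter(d, k, r=2, alpha=2.0, rng=rng)   # hypothetical "object" adapter
    # Untrained adapters (B = 0) leave the frozen weight unchanged.
    print(merge(w0, [view, obj], [1.0, 1.0]) == w0)  # prints True
    print(view.num_params(), d * k)                  # prints 32 64
```

With d = k = 8 and r = 2, each adapter trains r(d + k) = 32 values instead of the d·k = 64 of the full matrix. The gap widens for realistic layer sizes, which is one reason adapters are attractive when only one or a few images are available per concept.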
Related papers
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
(2024-06-26T17:53:51Z)
- iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views [61.707755434165335]
iFusion is a novel 3D object reconstruction framework that requires only two views with unknown camera poses.
We harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects.
Experiments demonstrate strong performance in both pose estimation and novel view synthesis.
(2023-12-28T18:59:57Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
(2023-12-11T18:59:55Z)
- ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models [33.760292331843104]
Generating novel views of an object from a single image is a challenging task.
Recent methods for view synthesis based on diffusion have shown great progress.
We demonstrate a simple method, where we utilize a pre-trained video diffusion model.
(2023-12-03T06:50:15Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation, that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
(2023-11-29T18:53:34Z)
- Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models [13.760540874218705]
We show that 3D knowledge is encoded in 2D image diffusion models like Stable Diffusion.
Our method, Viewpoint Neural Textual Inversion (ViewNeTI), controls the 3D viewpoint of objects in generated images from frozen diffusion models.
(2023-09-14T18:52:16Z)
- DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views [20.685453627120832]
Existing methods often struggle to produce high-quality results or require per-object optimization in few-view settings.
DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images.
(2023-06-06T05:26:26Z)
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction [77.69363640021503]
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
(2023-04-13T17:59:01Z)