Related papers: FSViewFusion: Few-Shots View Generation of Novel Objects

URL: http://arxiv.org/abs/2403.06394v2
Date: Wed, 13 Mar 2024 02:41:34 GMT
Title: FSViewFusion: Few-Shots View Generation of Novel Objects
Authors: Rukhshanda Hussain, Hui Xian Grace Lim, Borchun Chen, Mubarak Shah, Ser Nam Lim
Abstract summary: We introduce a pretrained stable diffusion model for view synthesis without explicit 3D priors. Specifically, we base our method on a personalized text to image model, Dreambooth, given its strong ability to adapt to specific novel objects with a few shots. We establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the original object's identity from which the views are learnt.
Score: 75.81872204650807
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, NeRF models overfit on a single scene, lacking generalization to out of distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion model for view synthesis without explicit 3D priors. Specifically, we base our method on a personalized text to image model, Dreambooth, given its strong ability to adapt to specific novel objects with a few shots. Our research reveals two interesting findings. First, we observe that Dreambooth can learn the high level concept of a view, compared to arguably more complex strategies which involve finetuning diffusions on large amounts of multi-view data. Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the original object's identity from which the views are learnt. Motivated by this, we introduce a learning strategy, FSViewFusion, which inherits a specific view through only one image sample of a single scene, and transfers the knowledge to a novel object, learnt from few shots, using low rank adapters. Through extensive experiments we demonstrate that our method, albeit simple, is efficient in generating reliable view samples for in-the-wild images. Code and models will be released.
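
Illustrative note: the abstract mentions transferring a learnt view to a novel object using low rank adapters, but does not say how the adapters are combined. The sketch below is only a minimal illustration of the general low-rank-adapter idea, assuming a simple additive merge of two adapters onto one frozen weight matrix. All names (`matmul`, `lora_delta`, `merge_adapters`, `view_adapter`, `object_adapter`) and all numbers are hypothetical and are not taken from the paper or its code.

```python
# Hypothetical sketch: adding two low-rank adapters to one frozen weight matrix.
# This is NOT the paper's implementation; the additive merge is an assumption.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(down, up, scale=1.0):
    """Low-rank update scale * (up @ down); up is d_out x r, down is r x d_in."""
    return [[scale * v for v in row] for row in matmul(up, down)]

def merge_adapters(base, adapters):
    """Return base plus the sum of all low-rank updates; base is not modified."""
    merged = [row[:] for row in base]
    for down, up, scale in adapters:
        delta = lora_delta(down, up, scale)
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                merged[i][j] += v
    return merged

# Toy 2x2 frozen weight and two rank-1 adapters (made-up values).
base = [[1.0, 0.0], [0.0, 1.0]]
view_adapter = ([[1.0, 0.0]], [[0.5], [0.0]], 1.0)    # (down, up, scale)
object_adapter = ([[0.0, 1.0]], [[0.0], [2.0]], 0.5)  # (down, up, scale)

merged = merge_adapters(base, [view_adapter, object_adapter])
print(merged)
```

Each adapter stores only the two small factors, so the frozen weight stays untouched and adapters trained separately (for example one for a view and one for an object) can be added or removed independently.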

Related papers

WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image [3.4248731707266264]
This paper proposes a novel view-consistent image generation method which utilizes diffusion models without additional modules. Our key idea is to enhance diffusion models with a training-free method that enables adaptive attention manipulation and noise reinitialization. Our method improves view consistency across various diffusion models, demonstrating its broader applicability.
arXiv Detail & Related papers (2025-06-30T05:00:47Z)
GAS: Generative Avatar Synthesis from a Single Image [54.95198111659466]
We introduce a generalizable and unified framework to synthesize view-consistent and temporally coherent avatars from a single image. Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model.
arXiv Detail & Related papers (2025-02-10T19:00:39Z)
Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models. We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space. These learned relation embeddings then serve as textual prompts to steer diffusion models to generate images that depict specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z)
iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views [61.707755434165335]
iFusion is a novel 3D object reconstruction framework that requires only two views with unknown camera poses. We harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects. Experiments demonstrate strong performance in both pose estimation and novel view synthesis.
arXiv Detail & Related papers (2023-12-28T18:59:57Z)
UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images. We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions [45.4321454586475]
Recent works are capable of generating high-quality novel views from a single in-the-wild image. Due to the lack of information from multiple views, these works encounter difficulties in generating controllable novel views. We present DreamComposer, a flexible and scalable framework that can enhance existing view-aware diffusion models by injecting multi-view conditions.
arXiv Detail & Related papers (2023-12-06T16:55:53Z)
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models [33.760292331843104]
Generating novel views of an object from a single image is a challenging task. Recent methods for view synthesis based on diffusion have shown great progress. We demonstrate a simple method, where we utilize a pre-trained video diffusion model.
arXiv Detail & Related papers (2023-12-03T06:50:15Z)
SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning. The model incorporates an image encoder, which distills a source view into a compact representation, that guides the generation of related novel views. We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views [20.685453627120832]
Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings. DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images.
arXiv Detail & Related papers (2023-06-06T05:26:26Z)
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction [77.69363640021503]
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
arXiv Detail & Related papers (2023-04-13T17:59:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
