Efficient-3DiM: Learning a Generalizable Single-image Novel-view
Synthesizer in One Day
- URL: http://arxiv.org/abs/2310.03015v1
- Date: Wed, 4 Oct 2023 17:57:07 GMT
- Title: Efficient-3DiM: Learning a Generalizable Single-image Novel-view
Synthesizer in One Day
- Authors: Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang
Wang, Liangliang Cao
- Abstract summary: We propose a framework to learn a single-image novel-view synthesizer.
Our framework reduces the total training time from 10 days to less than 1 day.
- Score: 63.96075838322437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of novel view synthesis aims to generate unseen perspectives of an
object or scene from a limited set of input images. Nevertheless, synthesizing
novel views from a single image remains a significant challenge in the
realm of computer vision. Previous approaches tackle this problem by adopting
mesh prediction, multi-plane image construction, or more advanced techniques
such as neural radiance fields. Recently, a pre-trained diffusion model
specifically designed for 2D image synthesis has demonstrated the capability
to produce photorealistic novel views when sufficiently fine-tuned on a 3D
task. Although fidelity and generalizability are greatly
improved, training such a powerful diffusion model requires a vast volume of
training data and model parameters, resulting in a notoriously long time and
high computational costs. To tackle this issue, we propose Efficient-3DiM, a
simple but effective framework to learn a single-image novel-view synthesizer.
Motivated by our in-depth analysis of the inference process of diffusion
models, we propose several pragmatic strategies to reduce the training overhead
to a manageable scale, including a crafted timestep sampling strategy, a
superior 3D feature extractor, and an enhanced training scheme. When combined,
our framework reduces the total training time from 10 days to less than one
day on the same computational platform (a single instance with 8 Nvidia A100
GPUs), a more than tenfold acceleration. Comprehensive
experiments are conducted to demonstrate the efficiency and generalizability of
our proposed method.
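As a concrete illustration of the first strategy, the PyTorch sketch below replaces the uniform timestep sampling of standard diffusion training with a non-uniform sampler that concentrates compute on a subset of noise levels. The power-law shape, the `gamma` parameter, and the direction of the bias are illustrative assumptions; the abstract does not specify the distribution Efficient-3DiM actually uses.

```python
import torch

def skewed_timesteps(batch_size: int, num_steps: int = 1000,
                     gamma: float = 2.0) -> torch.Tensor:
    """Sample diffusion timesteps non-uniformly.

    Standard training draws t ~ Uniform{0, ..., num_steps - 1}. A crafted
    strategy instead skews the draw so that more gradient updates land on
    the noise levels most useful for the 3D fine-tuning task. Here,
    gamma > 1 biases toward small t and gamma < 1 toward large t; the
    actual schedule used by Efficient-3DiM is a detail of the paper, not
    of this sketch.
    """
    u = torch.rand(batch_size)             # u ~ Uniform[0, 1)
    t = (u.pow(gamma) * num_steps).long()  # warp the distribution, discretize
    return t.clamp_(0, num_steps - 1)
```

In a training loop, `t = skewed_timesteps(images.size(0))` would simply replace the usual `torch.randint(0, num_steps, (batch_size,))` call, leaving the rest of the diffusion objective unchanged.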
Related papers
- MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion [32.38584862347954]
A lightweight Distilled Mini-Model with a Dynamic Refresh strategy (MMDRFuse) is proposed for multi-modality image fusion.
To pursue model parsimony, an extremely small convolutional network with a total of 113 trainable parameters (0.44 KB) is obtained.
Experiments on several public datasets demonstrate that our method exhibits promising advantages in terms of model efficiency and complexity.
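(As a quick sanity check on the quoted size, not a figure from the paper itself: 113 float32 parameters at 4 bytes each occupy 452 bytes, i.e. roughly 0.44 KB.)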
arXiv Detail & Related papers (2024-08-28T08:52:33Z)
- One-Shot Image Restoration [0.0]
Experimental results demonstrate the applicability, robustness and computational efficiency of the proposed approach for supervised image deblurring and super-resolution.
Our results showcase significant improvements in the learned models' sample efficiency, generalization, and time complexity.
arXiv Detail & Related papers (2024-04-26T14:03:23Z)
- GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time [112.32349668385635]
GGRt is a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses.
As the first pose-free generalizable 3D-GS framework, GGRt achieves inference at ≥ 5 FPS and real-time rendering at ≥ 100 FPS.
arXiv Detail & Related papers (2024-03-15T09:47:35Z)
- E²GAN: Efficient Training of Efficient GANs for Image-to-Image Translation [69.72194342962615]
We introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient?
First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch.
Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model.
Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time.
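The summary names Low-Rank Adaptation of crucial layers as the second ingredient; below is a generic PyTorch sketch of the LoRA mechanism itself. Which layers count as crucial, and how the rank search proceeds, are specifics of E²GAN that this summary does not cover, so `rank` and `alpha` here are placeholder hyperparameters.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # base GAN weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)         # the update starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.up(self.down(x))
```

Because only `down` and `up` receive gradients, the trainable parameter count per wrapped layer drops from in_features × out_features to rank × (in_features + out_features), which is what makes this style of fine-tuning cheap.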
arXiv Detail & Related papers (2024-01-11T18:59:14Z)
- Robust Category-Level 3D Pose Estimation from Synthetic Data [17.247607850702558]
We introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models.
We propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering.
arXiv Detail & Related papers (2023-05-25T14:56:03Z)
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction [77.69363640021503]
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
arXiv Detail & Related papers (2023-04-13T17:59:01Z)
- ProbNVS: Fast Novel View Synthesis with Learned Probability-Guided Sampling [42.37704606186928]
We propose to build a novel view synthesis framework based on learned MVS priors.
We show that our method achieves 15 to 40 times faster rendering compared to state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-07T14:45:42Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)