Light Field Diffusion for Single-View Novel View Synthesis
- URL: http://arxiv.org/abs/2309.11525v3
- Date: Tue, 12 Mar 2024 03:48:13 GMT
- Title: Light Field Diffusion for Single-View Novel View Synthesis
- Authors: Yifeng Xiong, Haoyu Ma, Shanlin Sun, Kun Han, Hao Tang, Xiaohui Xie
- Abstract summary: Single-view novel view synthesis (NVS) is important but challenging in computer vision.
Recent advancements in NVS have leveraged Denoising Diffusion Probabilistic Models (DDPMs) for their exceptional ability to produce high-fidelity images.
We present Light Field Diffusion (LFD), a novel conditional diffusion-based approach that transcends the conventional reliance on camera pose matrices.
- Score: 32.59286750410843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-view novel view synthesis (NVS), the task of generating images from
new viewpoints based on a single reference image, is important but challenging
in computer vision. Recent advancements in NVS have leveraged Denoising
Diffusion Probabilistic Models (DDPMs) for their exceptional ability to produce
high-fidelity images. However, current diffusion-based methods typically
utilize camera pose matrices to globally and implicitly enforce 3D constraints,
which can lead to inconsistencies in images generated from varying viewpoints,
particularly in regions with complex textures and structures.
To address these limitations, we present Light Field Diffusion (LFD), a novel
conditional diffusion-based approach that transcends the conventional reliance
on camera pose matrices. Instead of conditioning on the pose matrices directly,
LFD transforms them into a light field encoding with the same spatial
resolution as the reference image, describing the direction of the ray through
each pixel. By integrating this light field encoding with the reference image,
our method imposes local pixel-wise constraints within the diffusion process,
fostering enhanced view consistency. Our approach involves both training an
image-space LFD model on the ShapeNet Car dataset and fine-tuning a pre-trained
latent diffusion model on the Objaverse dataset. This
enables our latent LFD model to exhibit remarkable zero-shot generalization
capabilities across out-of-distribution datasets like RTMV as well as
in-the-wild images. Experiments demonstrate that LFD not only produces
high-fidelity images but also achieves superior 3D consistency in complex
regions, outperforming existing novel view synthesis methods.
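The abstract does not spell out the exact form of the light field encoding, but the idea of a pixel-aligned ray parameterization can be sketched. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera with intrinsics K and a world-to-camera pose (R, t), and produces a per-pixel map of camera origin and unit ray direction at the reference image's resolution, which could then be concatenated with the image as conditioning channels. The function name ray_encoding and the 6-channel origin-plus-direction layout are illustrative choices.

```python
# Minimal sketch (assumptions, not the authors' code): turn a camera pose into a
# pixel-aligned ray encoding that can be concatenated with the reference image.
import numpy as np

def ray_encoding(K, R, t, height, width):
    """Per-pixel camera origin and unit ray direction for a pinhole camera.

    K: (3, 3) intrinsics; R: (3, 3) world-to-camera rotation; t: (3,)
    world-to-camera translation. Returns an (H, W, 6) array.
    """
    # Pixel centers in homogeneous coordinates (u, v, 1).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)             # (H, W, 3)

    # Back-project pixels to camera-space directions, then rotate to world space.
    dirs_cam = pix @ np.linalg.inv(K).T                          # (H, W, 3)
    dirs_world = dirs_cam @ R                                    # applies R^T per pixel
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Camera center in world coordinates, C = -R^T t, broadcast over the grid.
    center = np.broadcast_to(-R.T @ t, (height, width, 3))
    return np.concatenate([center, dirs_world], axis=-1)         # (H, W, 6)

# Usage: the encoding shares the image's spatial grid, so it can be stacked with
# the (H, W, 3) reference image to form pixel-wise conditioning channels.
K = np.array([[128.0, 0.0, 64.0],
              [0.0, 128.0, 64.0],
              [0.0, 0.0, 1.0]])
encoding = ray_encoding(K, R=np.eye(3), t=np.zeros(3), height=128, width=128)
print(encoding.shape)  # (128, 128, 6)
```

Because the encoding lives on the image grid, the model can relate each ray to the corresponding reference pixel, which is the local, pixel-wise constraint the abstract contrasts with global pose conditioning.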
Related papers
- Diffusion-based Light Field Synthesis [50.24624071354433]
LFdiff is a diffusion-based generative framework tailored for LF synthesis.
We propose DistgUnet, a disentanglement-based noise estimation network, to harness comprehensive LF representations.
Extensive experiments demonstrate that LFdiff excels in synthesizing visually pleasing and disparity-controllable light fields.
(arXiv 2024-02-01)
- Multi-View Unsupervised Image Generation with Cross Attention Guidance [23.07929124170851]
This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets.
We identify object poses by clustering the dataset according to the visibility and locations of specific object parts.
Our model, MIRAGE, surpasses prior work in novel view synthesis on real images.
(arXiv 2023-12-07)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
(arXiv 2023-11-30)
- IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach uses image-to-image pipelines, empowered by LDMs, to generate high-quality posed images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
(arXiv 2023-08-22)
- Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation network unfolding (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
(arXiv 2023-08-10)
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
(arXiv 2023-05-24)
- Relightify: Relightable 3D Faces from a Single Image via Diffusion Models [86.3927548091627]
We present the first approach to use diffusion models as a prior for highly accurate 3D facial BRDF reconstruction from a single image.
In contrast to existing methods, we directly acquire the observed texture from the input image, resulting in a more faithful and consistent estimation.
(arXiv 2023-05-10)
- Zero-1-to-3: Zero-shot One Image to 3D Object [30.455300183998247]
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
Our conditional diffusion model uses a synthetic dataset to learn controls of the relative camera viewpoint.
Our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
(arXiv 2023-03-20; a sketch of this style of relative-viewpoint conditioning appears after this list)
- DiffRF: Rendering-Guided 3D Radiance Field Diffusion [18.20324411024166]
We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models.
In contrast to 2D-diffusion models, our model learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation.
(arXiv 2022-12-02)
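For contrast with LFD's pixel-aligned conditioning, the Zero-1-to-3 entry above conditions its diffusion model on the relative camera viewpoint as a single global signal. The sketch below is a hedged illustration of that idea, not the released Zero-1-to-3 code: it assumes object-centric cameras and summarizes the viewpoint change as a (Δazimuth, Δelevation, Δradius) vector; the helper names and parameterization are hypothetical.

```python
# Hypothetical sketch: a *global* relative-viewpoint condition, in contrast to
# LFD's per-pixel light field encoding. Not the released Zero-1-to-3 code.
import numpy as np

def spherical(p):
    """Cartesian camera position -> (azimuth, elevation, radius)."""
    r = np.linalg.norm(p)
    azimuth = np.arctan2(p[1], p[0])
    elevation = np.arcsin(p[2] / r)
    return azimuth, elevation, r

def relative_view_embedding(cam_ref, cam_tgt):
    """Relative (d_azimuth, d_elevation, d_radius) between two camera positions,
    the kind of compact pose condition a viewpoint-conditioned diffusion model
    can be trained on (the exact parameterization here is an assumption)."""
    az0, el0, r0 = spherical(cam_ref)
    az1, el1, r1 = spherical(cam_tgt)
    d_az = np.arctan2(np.sin(az1 - az0), np.cos(az1 - az0))  # wrap to (-pi, pi]
    return np.array([d_az, el1 - el0, r1 - r0])

# Usage: two object-centric camera positions on a sphere around the object.
cond = relative_view_embedding(np.array([1.5, 0.0, 0.5]),
                               np.array([0.0, 1.5, 0.8]))
print(cond)  # approximately [1.571, 0.168, 0.119]; a 3-vector, not an (H, W, C) map
```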