ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion
- URL: http://arxiv.org/abs/2310.10343v1
- Date: Mon, 16 Oct 2023 12:29:29 GMT
- Title: ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion
- Authors: Jiayu Yang, Ziang Cheng, Yunfei Duan, Pan Ji, Hongdong Li
- Abstract summary: Given a single image of a 3D object, this paper proposes a method (named ConsistNet) that is able to generate multiple images of the same object.
Our method effectively learns 3D consistency over a frozen Zero123 backbone and can generate 16 surrounding views of the object within 40 seconds on a single A100 GPU.
- Score: 61.37481051263816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a single image of a 3D object, this paper proposes a novel method
(named ConsistNet) that is able to generate multiple images of the same object,
as if they were captured from different viewpoints, while the 3D
(multi-view) consistency among those generated images is effectively
exploited. Central to our method is a multi-view consistency block
which enables information exchange across multiple single-view diffusion
processes based on the underlying multi-view geometry principles. ConsistNet is
an extension to the standard latent diffusion model, and consists of two
sub-modules: (a) a view aggregation module that unprojects multi-view features
into global 3D volumes and infers consistency, and (b) a ray aggregation module
that samples and aggregates 3D-consistent features back to each view to enforce
consistency. Our approach departs from previous methods in multi-view image
generation, in that it can be easily dropped into pre-trained LDMs without
requiring explicit pixel correspondences or depth prediction. Experiments show
that our method effectively learns 3D consistency over a frozen Zero123
backbone and can generate 16 surrounding views of the object within 40 seconds
on a single A100 GPU. Our code will be made available on
https://github.com/JiayuYANG/ConsistNet
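The abstract names two sub-modules, view aggregation and ray aggregation, without giving further detail here. Below is a minimal PyTorch sketch of how such a consistency block could be wired up. The module names, tensor shapes, mean-pooling aggregation, and the single 3D and 1x1 convolutions are illustrative assumptions, not the paper's actual architecture, and the projection matrices are assumed to already map world points to grid_sample's normalized coordinates.

```python
# Hedged sketch of a ConsistNet-style multi-view consistency block.
# All shapes, names, and aggregation choices are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViewAggregation(nn.Module):
    """Unproject per-view latent features into a shared 3D volume and
    run a small 3D CNN to reason about cross-view consistency."""

    def __init__(self, feat_dim: int, grid_size: int = 32):
        super().__init__()
        self.grid_size = grid_size
        self.refine = nn.Conv3d(feat_dim, feat_dim, kernel_size=3, padding=1)
        # Voxel centres of a cube covering the object, in world coordinates [-1, 1]^3.
        axis = torch.linspace(-1.0, 1.0, grid_size)
        zs, ys, xs = torch.meshgrid(axis, axis, axis, indexing="ij")
        grid = torch.stack([xs, ys, zs, torch.ones_like(xs)], dim=-1)  # (D, H, W, 4)
        self.register_buffer("grid", grid.reshape(-1, 4))              # (D*H*W, 4)

    def forward(self, feats: torch.Tensor, proj: torch.Tensor) -> torch.Tensor:
        # feats: (V, C, h, w) per-view latent features
        # proj:  (V, 3, 4) matrices taking homogeneous world points to image coords
        V, C, h, w = feats.shape
        pts = torch.einsum("vij,nj->vni", proj, self.grid)   # (V, N, 3)
        xy = pts[..., :2] / pts[..., 2:].clamp(min=1e-6)     # perspective divide
        # Assumes proj yields coords in [-1, 1]; out-of-view samples read as zero.
        sampled = F.grid_sample(
            feats, xy.unsqueeze(2), align_corners=True
        ).squeeze(-1)                                        # (V, C, N)
        volume = sampled.mean(dim=0)                         # average over views
        g = self.grid_size
        return self.refine(volume.reshape(1, C, g, g, g))    # (1, C, D, H, W)


class RayAggregation(nn.Module):
    """Sample the consistency volume along each view's camera rays and
    fuse the result back into that view's features."""

    def __init__(self, feat_dim: int, n_samples: int = 16):
        super().__init__()
        self.n_samples = n_samples
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, feats: torch.Tensor, volume: torch.Tensor,
                ray_pts: torch.Tensor) -> torch.Tensor:
        # feats:   (V, C, h, w)
        # volume:  (1, C, D, H, W) produced by ViewAggregation
        # ray_pts: (V, h, w, S, 3) world points along each pixel ray, in [-1, 1]^3
        V, C, h, w = feats.shape
        pts = ray_pts.reshape(1, -1, 1, 1, 3)                         # grid_sample layout
        samples = F.grid_sample(volume, pts, align_corners=True)      # (1, C, V*h*w*S, 1, 1)
        samples = samples.reshape(C, V, h, w, self.n_samples)
        ray_feat = samples.mean(dim=-1).permute(1, 0, 2, 3)           # (V, C, h, w)
        return self.fuse(torch.cat([feats, ray_feat], dim=1))
```

In the spirit of the abstract, such a block could be inserted alongside the frozen Zero123 UNet of each view, so that the per-view latents exchange information through the shared volume at each denoising step; how and where the paper actually attaches it is not specified in this summary.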
Related papers
- Duoduo CLIP: Efficient 3D Understanding with Multi-View Images [14.572094389643173]
Duoduo CLIP is a model for 3D representation learning that learns shape encodings from multi-view images instead of point-clouds.
Our approach not only shows better generalization compared to existing point cloud methods, but also reduces GPU requirements and training time.
arXiv Detail & Related papers (2024-06-17T14:16:12Z)
- MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from a single image by leveraging a scene representation transformer and a view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baseline methods in evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model [34.35449902855767]
Two fundamental questions are what data we use for training and how to ensure multi-view consistency.
We propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models.
Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-18T17:48:15Z)
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation [51.19871052619077]
We introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images.
We maintain the fast speed of generating 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
arXiv Detail & Related papers (2024-02-07T17:57:03Z)
- SyncDreamer: Generating Multiview-consistent Images from a Single-view Image [59.75474518708409]
A novel diffusion model called SyncDreamer generates multiview-consistent images from a single-view image.
Experiments show that SyncDreamer generates images with high consistency across different views.
arXiv Detail & Related papers (2023-09-07T02:28:04Z)
- Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data [76.38261311948649]
Viewset Diffusion is a diffusion-based generator that outputs 3D objects while only using multi-view 2D data for supervision.
We train a diffusion model to generate viewsets, but design the neural network generator to reconstruct internally corresponding 3D models.
The model performs reconstruction efficiently, in a feed-forward manner, and is trained using only rendering losses, with as few as three views per viewset.
arXiv Detail & Related papers (2023-06-13T16:18:51Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal viewpoints for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)