Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
- URL: http://arxiv.org/abs/2403.11503v2
- Date: Sat, 13 Jul 2024 11:28:09 GMT
- Title: Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
- Authors: Ruicheng Wang, Jianfeng Xiang, Jiaolong Yang, Xin Tong
- Abstract summary: We propose a novel image editing technique that enables 3D manipulations on single images.
Our method directly leverages powerful image diffusion models trained on a broad spectrum of text-image pairs.
Our method can generate high-quality 3D-aware image edits with large viewpoint transformations and high appearance and shape consistency with the input image.
- Score: 24.478875248825563
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel image editing technique that enables 3D manipulations on single images, such as object rotation and translation. Existing 3D-aware image editing approaches typically rely on synthetic multi-view datasets for training specialized models, thus constraining their effectiveness on open-domain images featuring significantly more varied layouts and styles. In contrast, our method directly leverages powerful image diffusion models trained on a broad spectrum of text-image pairs and thus retains their exceptional generalization abilities. This objective is realized through the development of an iterative novel view synthesis and geometry alignment algorithm. The algorithm harnesses diffusion models for dual purposes: they provide an appearance prior by predicting novel views of the selected object using estimated depth maps, and they act as a geometry critic by correcting misalignments in 3D shapes across the sampled views. Our method can generate high-quality 3D-aware image edits with large viewpoint transformations and high appearance and shape consistency with the input image, pushing the boundaries of what is possible with single-image 3D-aware editing.
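To make the algorithm concrete, below is a minimal, runnable Python sketch of the iterative loop described in the abstract. Every helper (estimate_depth, warp_to_view, diffusion_inpaint, align_depth) and the edit_3d driver are hypothetical placeholders invented for illustration, not the authors' code or any released API; only the control flow, with the diffusion model in its two roles of appearance prior and geometry critic, follows the abstract.

```python
"""Structural sketch (assumptions, not the authors' implementation) of
the iterative novel view synthesis + geometry alignment loop."""
import numpy as np

def estimate_depth(img):
    # Placeholder: a real system would run a monocular depth network.
    return np.ones(img.shape[:2], dtype=np.float32)

def warp_to_view(img, depth, pose):
    # Placeholder: depth-based warp of the selected object to the target
    # pose; returns the warped image and a mask of disoccluded holes.
    return img.copy(), np.zeros(img.shape[:2], dtype=bool)

def diffusion_inpaint(img, holes):
    # Placeholder: the pre-trained diffusion model repaints holes and
    # warping artifacts -- its "appearance prior" role.
    return img

def align_depth(depth, novel_depth, pose):
    # Placeholder: the "geometry critic" step, correcting 3D shape
    # misalignment across views; here a trivial blend for illustration.
    return 0.5 * (depth + novel_depth)

def edit_3d(image, pose, n_iters=4):
    """Iterate view synthesis and geometry alignment until they agree."""
    depth = estimate_depth(image)
    novel = image
    for _ in range(n_iters):
        warped, holes = warp_to_view(image, depth, pose)
        novel = diffusion_inpaint(warped, holes)                  # appearance prior
        depth = align_depth(depth, estimate_depth(novel), pose)  # geometry critic
    return novel

# Dummy usage: a black 64x64 image and an unspecified pose.
print(edit_3d(np.zeros((64, 64, 3), dtype=np.float32), pose=None).shape)
```

The placeholder bodies are deliberately trivial so the sketch executes; a real implementation would substitute a monocular depth estimator, a geometric warp, and diffusion-based inpainting and critic steps.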
Related papers
- G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images [45.66479596827045]
We propose a Geometry-enhanced NeRF (G-NeRF), which enhances geometry priors via a geometry-guided multi-view synthesis approach.
To tackle the absence of multi-view supervision for single-view images, we design the depth-aware training approach.
arXiv Detail & Related papers (2024-04-11T04:58:18Z)
- GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing [38.948892064761914]
GaussCtrl is a text-driven method for editing a 3D scene reconstructed with 3D Gaussian Splatting (3DGS).
Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image.
arXiv Detail & Related papers (2024-03-13T17:35:28Z)
- Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views [20.685453627120832]
Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings.
DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images.
arXiv Detail & Related papers (2023-06-06T05:26:26Z)
- Vox-E: Text-guided Voxel Editing of 3D Objects [14.88446525549421]
Large scale text-guided diffusion models have garnered significant attention due to their ability to synthesize diverse images.
We present a technique that harnesses the power of latent diffusion models for editing existing 3D objects.
arXiv Detail & Related papers (2023-03-21T17:36:36Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- 3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective [96.65476492200648]
This research studies a self-supervised 3D clothing reconstruction method.
It recovers the geometric shape and texture of human clothing from a single 2D image.
arXiv Detail & Related papers (2022-04-27T17:46:55Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and to errors in camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Towards Realistic 3D Embedding via View Alignment [53.89445873577063]
This paper presents an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically.
VA-GAN consists of a texture generator and a differential discriminator that are interconnected and end-to-end trainable.
arXiv Detail & Related papers (2020-07-14T14:45:00Z)