Vox-E: Text-guided Voxel Editing of 3D Objects
- URL: http://arxiv.org/abs/2303.12048v3
- Date: Tue, 19 Sep 2023 05:41:59 GMT
- Title: Vox-E: Text-guided Voxel Editing of 3D Objects
- Authors: Etai Sella, Gal Fiebelman, Peter Hedman, Hadar Averbuch-Elor
- Abstract summary: Large-scale text-guided diffusion models have garnered significant attention due to their ability to synthesize diverse images.
We present a technique that harnesses the power of latent diffusion models for editing existing 3D objects.
- Score: 14.88446525549421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale text-guided diffusion models have garnered significant attention
due to their ability to synthesize diverse images that convey complex visual
concepts. This generative power has more recently been leveraged to perform
text-to-3D synthesis. In this work, we present a technique that harnesses the
power of latent diffusion models for editing existing 3D objects. Our method
takes oriented 2D images of a 3D object as input and learns a grid-based
volumetric representation of it. To guide the volumetric representation to
conform to a target text prompt, we follow unconditional text-to-3D methods and
optimize a Score Distillation Sampling (SDS) loss. However, we observe that
combining this diffusion-guided loss with an image-based regularization loss
that encourages the representation not to deviate too strongly from the input
object is challenging, as it requires achieving two conflicting goals while
viewing only structure-and-appearance coupled 2D projections. Thus, we
introduce a novel volumetric regularization loss that operates directly in 3D
space, utilizing the explicit nature of our 3D representation to enforce
correlation between the global structure of the original and edited object.
Furthermore, we present a technique that optimizes cross-attention volumetric
grids to refine the spatial extent of the edits. Extensive experiments and
comparisons demonstrate the effectiveness of our approach in creating a myriad
of edits which cannot be achieved by prior works.
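As a concrete illustration of the objective described above, the following is a minimal PyTorch sketch combining an SDS-style guidance term with a volumetric regularizer computed directly between the original and edited density grids. This is an interpretation of the abstract, not the authors' implementation: the sigmoid normalization, the L1 penalty, and the `lambda_reg` weight are all assumptions.

```python
import torch
import torch.nn.functional as F

def volumetric_reg_loss(density_orig: torch.Tensor,
                        density_edit: torch.Tensor) -> torch.Tensor:
    """Correlate the global structure of the edited grid with the original.

    Both inputs are (X, Y, Z) raw voxel densities. An L1 penalty on
    sigmoid-normalized occupancies is one plausible stand-in for the
    paper's correlation-based volumetric regularizer (assumption).
    """
    p = torch.sigmoid(density_orig)  # soft occupancy of the source object
    q = torch.sigmoid(density_edit)  # soft occupancy of the edited object
    return F.l1_loss(q, p)

def total_loss(sds_loss: torch.Tensor,
               density_orig: torch.Tensor,
               density_edit: torch.Tensor,
               lambda_reg: float = 0.1) -> torch.Tensor:
    # The SDS term pulls the edit toward the target prompt; the volumetric
    # term pulls it back toward the source object's global 3D structure.
    return sds_loss + lambda_reg * volumetric_reg_loss(density_orig, density_edit)
```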
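The cross-attention refinement can be pictured the same way: 2D cross-attention maps for the edit token are lifted into a volumetric grid, which then acts as a soft 3D mask for merging the edited and original grids. The sketch below assumes the attention grid has already been optimized; the blending rule and threshold are illustrative, not the paper's exact recipe.

```python
import torch

def merge_with_attention_mask(grid_orig: torch.Tensor,
                              grid_edit: torch.Tensor,
                              attn_grid: torch.Tensor,
                              threshold: float = 0.5) -> torch.Tensor:
    """Blend two (C, X, Y, Z) feature grids with an (X, Y, Z) attention grid.

    Voxels where the (hypothetical) optimized attention grid is confident
    keep the edited content; everywhere else the original is preserved.
    """
    mask = (torch.sigmoid(attn_grid) > threshold).float()  # hard 3D edit mask
    return mask * grid_edit + (1.0 - mask) * grid_orig     # broadcast over C
```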
Related papers
- VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing [22.39760469467524]
We propose a variance-alignment-based texture synthesis to address the modality gap between the 2D and 3D diffusion models.
We present an inpainting module to refine details in conflicting regions.
arXiv Detail & Related papers (2024-07-05T12:11:33Z)
- Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors [24.478875248825563]
We propose a novel image editing technique that enables 3D manipulations on single images.
Our method directly leverages powerful image diffusion models trained on a broad spectrum of text-image pairs.
Our method can generate high-quality 3D-aware image edits with large viewpoint transformations and high appearance and shape consistency with the input image.
arXiv Detail & Related papers (2024-03-18T06:18:59Z)
- SERF: Fine-Grained Interactive 3D Segmentation and Editing with Radiance Fields [92.14328581392633]
We introduce a novel fine-grained interactive 3D segmentation and editing algorithm with radiance fields, which we refer to as SERF.
Our method entails creating a neural mesh representation by integrating multi-view algorithms with pre-trained 2D models.
Building upon this representation, we introduce a novel surface rendering technique that preserves local information and is robust to deformation.
arXiv Detail & Related papers (2023-12-26T02:50:42Z)
- VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
A diffusion model with a 3D U-Net is then trained on these feature volumes for text-to-3D generation.
arXiv Detail & Related papers (2023-12-18T18:59:05Z)
- X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation [61.48050470095969]
X-Dreamer is a novel approach for high-quality text-to-3D content creation.
It bridges the gap between text-to-2D and text-to-3D synthesis.
arXiv Detail & Related papers (2023-11-30T07:23:00Z)
- Directional Texture Editing for 3D Models [51.31499400557996]
ITEM3D is designed for automatic 3D object editing according to text instructions.
Leveraging diffusion models and differentiable rendering, ITEM3D uses rendered images as the bridge between text and the 3D representation.
arXiv Detail & Related papers (2023-09-26T12:01:13Z)
- IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach uses image-to-image pipelines, powered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.