OBJECT 3DIT: Language-guided 3D-aware Image Editing
- URL: http://arxiv.org/abs/2307.11073v1
- Date: Thu, 20 Jul 2023 17:53:46 GMT
- Title: OBJECT 3DIT: Language-guided 3D-aware Image Editing
- Authors: Oscar Michel, Anand Bhattad, Eli VanderBilt, Ranjay Krishna, Aniruddha
Kembhavi, Tanmay Gupta
- Abstract summary: Existing image editing tools disregard the underlying 3D geometry from which the image is projected.
We formulate the new task of language-guided 3D-aware editing, where objects in an image should be edited according to a language instruction in context of the underlying 3D scene.
We release OBJECT: a dataset consisting of 400K editing examples created from procedurally generated 3D scenes.
Our models show impressive abilities to understand the 3D composition of entire scenes, factoring in surrounding objects, surfaces, lighting conditions, shadows, and physically-plausible object configurations.
- Score: 27.696507467754877
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing image editing tools, while powerful, typically disregard the
underlying 3D geometry from which the image is projected. As a result, edits
made using these tools may become detached from the geometry and lighting
conditions that are at the foundation of the image formation process. In this
work, we formulate the new task of language-guided 3D-aware editing, where
objects in an image should be edited according to a language instruction in
context of the underlying 3D scene. To promote progress towards this goal, we
release OBJECT: a dataset consisting of 400K editing examples created from
procedurally generated 3D scenes. Each example consists of an input image,
editing instruction in language, and the edited image. We also introduce 3DIT:
single and multi-task models for four editing tasks. Our models show impressive
abilities to understand the 3D composition of entire scenes, factoring in
surrounding objects, surfaces, lighting conditions, shadows, and
physically-plausible object configurations. Surprisingly, despite being trained
only on synthetic scenes from OBJECT, 3DIT's editing capabilities generalize to
real-world images.
Related papers
- Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts [76.73043724587679]
We propose a dialogue-based 3D scene editing approach, termed CE3D.
Hash-Atlas represents 3D scene views, which transfers the editing of 3D scenes onto 2D atlas images.
Results demonstrate that CE3D effectively integrates multiple visual models to achieve diverse editing visual effects.
arXiv Detail & Related papers (2024-07-09T13:24:42Z)
- 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting [100.94916668527544]
Existing methods solely focus on either 2D individual object or 3D global scene editing.
We propose 3DitScene, a novel and unified scene editing framework.
It enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects.
arXiv Detail & Related papers (2024-05-28T17:59:01Z)
- Reference-Based 3D-Aware Image Editing with Triplanes [15.222454412573455]
Generative Adversarial Networks (GANs) have emerged as powerful tools for high-quality image generation and real image editing by manipulating their latent spaces.
Recent advancements in GANs include 3D-aware models such as EG3D, which feature efficient triplane-based architectures capable of reconstructing 3D geometry from single images.
This study addresses this gap by exploring and demonstrating the effectiveness of the triplane space for advanced reference-based edits.
arXiv Detail & Related papers (2024-04-04T17:53:33Z)
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that, despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
- Image Sculpting: Precise Object Editing with 3D Geometry Control [33.9777412846583]
Image Sculpting is a new framework for editing 2D images by incorporating tools from 3D geometry and graphics.
It supports precise, quantifiable, and physically-plausible editing options such as pose editing, rotation, translation, 3D composition, carving, and serial addition.
arXiv Detail & Related papers (2024-01-02T18:59:35Z)
- Editing 3D Scenes via Text Prompts without Retraining [80.57814031701744]
DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities.
Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images.
Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
arXiv Detail & Related papers (2023-09-10T02:31:50Z)
- SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field [37.8162035179377]
We present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image.
To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space.
Our method achieves photo-realistic 3D editing using only a single edited image, pushing the bound of semantic-driven editing in 3D real-world scenes.
arXiv Detail & Related papers (2023-03-23T13:58:11Z)
- CC3D: Layout-Conditioned Generation of Compositional 3D Scenes [49.281006972028194]
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts.
Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z)
- gCoRF: Generative Compositional Radiance Fields [80.45269080324677]
3D generative models of objects enable photorealistic image synthesis with 3D control.
Existing methods model the scene as a global scene representation, ignoring the compositional aspect of the scene.
We present a compositional generative model, where each semantic part of the object is represented as an independent 3D representation.
arXiv Detail & Related papers (2022-10-31T14:10:44Z)