GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
- URL: http://arxiv.org/abs/2404.14403v1
- Date: Mon, 22 Apr 2024 17:58:36 GMT
- Title: GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
- Authors: Rahul Sajnani, Jeroen Vanbaar, Jie Min, Kapil Katyal, Srinath Sridhar
- Abstract summary: We present GeoDiffuser, a zero-shot optimization-based method that unifies 2D and 3D image-based object editing capabilities into a single method.
We show that these transformations can be directly incorporated into the attention layers in diffusion models to implicitly perform editing operations.
GeoDiffuser can perform common 2D and 3D edits like object translation, 3D rotation, and removal.
- Score: 7.7669649283012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The success of image generative models has enabled us to build methods that can edit images based on text or other user input. However, these methods are bespoke, imprecise, require additional information, or are limited to only 2D image edits. We present GeoDiffuser, a zero-shot optimization-based method that unifies common 2D and 3D image-based object editing capabilities into a single method. Our key insight is to view image editing operations as geometric transformations. We show that these transformations can be directly incorporated into the attention layers in diffusion models to implicitly perform editing operations. Our training-free optimization method uses an objective function that seeks to preserve object style but generate plausible images, for instance with accurate lighting and shadows. It also inpaints disoccluded parts of the image where the object was originally located. Given a natural image and user input, we segment the foreground object using SAM and estimate a corresponding transform which is used by our optimization approach for editing. GeoDiffuser can perform common 2D and 3D edits like object translation, 3D rotation, and removal. We present quantitative results, including a perceptual study, that show how our approach is better than existing methods. Visit https://ivl.cs.brown.edu/research/geodiffuser.html for more information.
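The abstract's idea of treating an edit as a geometric transformation, and of inpainting the region the object leaves behind, can be illustrated with a toy example. The following is only a minimal sketch on a binary object mask, not the authors' method: it does not touch diffusion attention layers, and the helper names `translation`, `warp_mask`, and `disoccluded` are hypothetical, introduced here for illustration.

```python
from typing import List

Grid = List[List[int]]
Matrix = List[List[float]]


def translation(dx: float, dy: float) -> Matrix:
    """3x3 homogeneous matrix that shifts points by (dx, dy) pixels."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def warp_mask(mask: Grid, inverse: Matrix) -> Grid:
    """Warp a binary mask by backward mapping with nearest-neighbour sampling.

    `inverse` maps each target pixel (x, y) back to its source location.
    Target pixels whose source falls outside the image stay 0.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx = inverse[0][0] * x + inverse[0][1] * y + inverse[0][2]
            sy = inverse[1][0] * x + inverse[1][1] * y + inverse[1][2]
            ix, iy = round(sx), round(sy)
            if 0 <= ix < width and 0 <= iy < height:
                out[y][x] = mask[iy][ix]
    return out


def disoccluded(mask: Grid, moved: Grid) -> Grid:
    """Pixels the object used to cover but no longer does (to be inpainted)."""
    return [
        [int(m == 1 and n == 0) for m, n in zip(row_m, row_n)]
        for row_m, row_n in zip(mask, moved)
    ]


# A 1x2 object in a 3x5 image, moved two pixels to the right.
mask = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
moved = warp_mask(mask, translation(-2, 0))  # inverse of a +2 shift in x
hole = disoccluded(mask, moved)
```

Here `moved` marks where the object lands after the edit and `hole` marks the vacated pixels. A 3D rotation would follow the same pattern with a different transform, assuming depth is available to lift pixels into 3D.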
Related papers
- 3D Gaussian Editing with A Single Image [19.662680524312027]
We introduce a novel single-image-driven 3D scene editing approach based on 3D Gaussian Splatting.
Our method learns to optimize the 3D Gaussians to align with an edited version of the image rendered from a user-specified viewpoint.
Experiments show the effectiveness of our method in handling geometric details, long-range, and non-rigid deformation.
arXiv Detail & Related papers (2024-08-14T13:17:42Z)
- ICE-G: Image Conditional Editing of 3D Gaussian Splats [45.112689255145625]
We introduce a novel approach to quickly edit a 3D model from a single reference view.
Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views.
A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner.
arXiv Detail & Related papers (2024-06-12T17:59:52Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions.
A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process.
This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z)
- Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors [24.478875248825563]
We propose a novel image editing technique that enables 3D manipulations on single images.
Our method directly leverages powerful image diffusion models trained on a broad spectrum of text-image pairs.
Our method can generate high-quality 3D-aware image edits with large viewpoint transformations and high appearance and shape consistency with the input image.
arXiv Detail & Related papers (2024-03-18T06:18:59Z)
- GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing [38.948892064761914]
GaussCtrl is a text-driven method to edit a 3D scene reconstructed by 3D Gaussian Splatting (3DGS).
Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image.
arXiv Detail & Related papers (2024-03-13T17:35:28Z)
- Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian [58.704089101826774]
We present a 3D-aware image deformation method with minimal restrictions on shape category and deformation type.
We take a supervised learning-based approach to predict the shape Laplacian of the underlying volume of a 3D reconstruction represented as a point cloud.
In the experiments, we present our results of deforming 2D character and clothed human images.
arXiv Detail & Related papers (2022-03-29T04:57:18Z)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z)
- ShaRF: Shape-conditioned Radiance Fields from a Single View [54.39347002226309]
We present a method for estimating neural scene representations of objects given only a single image.
The core of our method is the estimation of a geometric scaffold for the object.
We demonstrate in several experiments the effectiveness of our approach in both synthetic and real images.
arXiv Detail & Related papers (2021-02-17T16:40:28Z)
- AutoSweep: Recovering 3D Editable Objects from a Single Photograph [54.701098964773756]
We aim to recover 3D objects that have semantic parts and can be directly edited.
Our work makes an attempt towards recovering two types of primitive-shaped objects, namely, generalized cuboids and generalized cylinders.
Our algorithm can recover high quality 3D models and outperforms existing methods in both instance segmentation and 3D reconstruction.
arXiv Detail & Related papers (2020-05-27T12:16:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.