Diffusion-Based Attention Warping for Consistent 3D Scene Editing
- URL: http://arxiv.org/abs/2412.07984v1
- Date: Tue, 10 Dec 2024 23:57:18 GMT
- Title: Diffusion-Based Attention Warping for Consistent 3D Scene Editing
- Authors: Eyal Gomel, Lior Wolf,
- Abstract summary: We present a novel method for 3D scene editing using diffusion models.
Our approach leverages attention features extracted from a single reference image to define the intended edits.
Injecting these warped features into other viewpoints enables coherent propagation of edits.
- Score: 55.2480439325792
- License:
- Abstract: We present a novel method for 3D scene editing using diffusion models, designed to ensure view consistency and realism across perspectives. Our approach leverages attention features extracted from a single reference image to define the intended edits. These features are warped across multiple views by aligning them with scene geometry derived from Gaussian splatting depth estimates. Injecting these warped features into other viewpoints enables coherent propagation of edits, achieving high fidelity and spatial alignment in 3D space. Extensive evaluations demonstrate the effectiveness of our method in generating versatile edits of 3D scenes, significantly advancing the capabilities of scene manipulation compared to the existing methods. Project page: \url{https://attention-warp.github.io}
Related papers
- Layout2Scene: 3D Semantic Layout Guided Scene Generation via Geometry and Appearance Diffusion Priors [52.63385546943866]
We present a text-to-scene generation method (namely, Layout2Scene) using additional semantic layout as the prompt to inject precise control of 3D object positions.
To fully leverage 2D diffusion priors in geometry and appearance generation, we introduce a semantic-guided geometry diffusion model and a semantic-geometry guided diffusion model.
Our method can generate more plausible and realistic scenes as compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-05T12:20:13Z) - SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation [44.354071773885735]
SceneFactor is a diffusion-based approach for large-scale 3D scene generation.
It enables controllable generation and effortless editing.
Our approach enables high-fidelity 3D scene synthesis with effective controllable editing.
arXiv Detail & Related papers (2024-12-02T18:47:41Z) - Revealing Directions for Text-guided 3D Face Editing [52.85632020601518]
3D face editing is a significant task in multimedia, aimed at the manipulation of 3D face models across various control signals.
We present Face Clan, a text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions.
Our method offers a precisely controllable manipulation method, allowing users to intuitively customize regions of interest with the text description.
arXiv Detail & Related papers (2024-10-07T12:04:39Z) - 3D Gaussian Editing with A Single Image [19.662680524312027]
We introduce a novel single-image-driven 3D scene editing approach based on 3D Gaussian Splatting.
Our method learns to optimize the 3D Gaussians to align with an edited version of the image rendered from a user-specified viewpoint.
Experiments show the effectiveness of our method in handling geometric details, long-range, and non-rigid deformation.
arXiv Detail & Related papers (2024-08-14T13:17:42Z) - Localized Gaussian Splatting Editing with Contextual Awareness [10.46087834880747]
We introduce an illumination-aware 3D scene editing pipeline for 3D Gaussian Splatting (3DGS) representation.
Inpainting by the state-of-the-art conditional 2D diffusion model is consistent with background in lighting.
Our approach efficiently achieves local editing with global illumination consistency without explicitly modeling light transport.
arXiv Detail & Related papers (2024-07-31T18:00:45Z) - SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing [58.22339174221563]
We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing.
SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing multi-view noise predictions to be geometrically consistent.
Our method achieves high-quality 3D editing results respecting the textual instructions, especially in scenes with complex textures.
arXiv Detail & Related papers (2024-06-25T09:17:35Z) - 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting [100.94916668527544]
Existing methods solely focus on either 2D individual object or 3D global scene editing.
We propose 3DitScene, a novel and unified scene editing framework.
It enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects.
arXiv Detail & Related papers (2024-05-28T17:59:01Z) - Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors [24.478875248825563]
We propose a novel image editing technique that enables 3D manipulations on single images.
Our method directly leverages powerful image diffusion models trained on a broad spectrum of text-image pairs.
Our method can generate high-quality 3D-aware image edits with large viewpoint transformations and high appearance and shape consistency with the input image.
arXiv Detail & Related papers (2024-03-18T06:18:59Z) - Compositional 3D Scene Generation using Locally Conditioned Diffusion [49.5784841881488]
We introduce textbflocally conditioned diffusion as an approach to compositional scene diffusion.
We demonstrate a score distillation sampling--based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.
arXiv Detail & Related papers (2023-03-21T22:37:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.