SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing
- URL: http://arxiv.org/abs/2406.17396v1
- Date: Tue, 25 Jun 2024 09:17:35 GMT
- Title: SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing
- Authors: Ruihuang Li, Liyi Chen, Zhengqiang Zhang, Varun Jampani, Vishal M. Patel, Lei Zhang
- Abstract summary: We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing.
 SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing multi-view noise predictions to be geometrically consistent.
Our method achieves high-quality 3D editing results respecting the textual instructions, especially in scenes with complex textures.
- Score: 58.22339174221563
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-based 2D diffusion models have demonstrated impressive capabilities in image generation and editing. These models also show substantial potential for 3D editing tasks. However, achieving consistent edits across multiple viewpoints remains a challenge. While the iterative dataset update method can achieve global consistency, it suffers from slow convergence and over-smoothed textures. We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing. SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing multi-view noise predictions to be geometrically consistent, which ensures global consistency in both semantic structure and low-frequency appearance. To further enhance local consistency in high-frequency details, we set a group of anchor views and propagate them to their neighboring frames through cross-view reprojection. To improve the reliability of multi-view correspondences, we introduce depth supervision during training to enhance the reconstruction of precise geometries. By enhancing geometric consistency at the noise and pixel levels, our method achieves high-quality 3D editing results that respect the textual instructions, especially in scenes with complex textures.
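The abstract mentions cross-view reprojection and geometrically consistent noise predictions without giving details. The sketch below is only an illustration of the general idea under stated assumptions, not the authors' implementation: it assumes a shared pinhole intrinsic matrix `K`, a known 4x4 source-to-destination camera transform, and per-pixel depth. The function names `reproject` and `sync_noise` are hypothetical, and the plain averaging in `sync_noise` is a stand-in chosen for simplicity, not the rule used in the paper.

```python
import numpy as np

def reproject(uv, depth, K, T_src_to_dst):
    """Map a source pixel (u, v) with known depth into a destination view.

    K is an assumed shared 3x3 pinhole intrinsic matrix; T_src_to_dst is an
    assumed 4x4 rigid transform from source to destination camera frame.
    Returns the destination pixel (u, v) and the depth in that view.
    """
    u, v = uv
    # Back-project the pixel to a 3D point in the source camera frame.
    p_src = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Move the point into the destination camera frame.
    p_dst = T_src_to_dst[:3, :3] @ p_src + T_src_to_dst[:3, 3]
    # Project with the pinhole model and normalise by depth.
    q = K @ p_dst
    return q[:2] / q[2], p_dst[2]

def sync_noise(eps_src, eps_dst, matches):
    """Average noise values at matched pixels (illustrative stand-in only).

    matches is a list of ((u_src, v_src), (u_dst, v_dst)) integer pixel pairs.
    """
    eps_src, eps_dst = eps_src.copy(), eps_dst.copy()
    for (us, vs), (ud, vd) in matches:
        mean = 0.5 * (eps_src[vs, us] + eps_dst[vd, ud])
        eps_src[vs, us] = mean
        eps_dst[vd, ud] = mean
    return eps_src, eps_dst

# Usage with made-up values: a camera shifted 0.1 units along x.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
T[0, 3] = 0.1
uv_dst, z_dst = reproject((40, 20), 2.0, K, T)
```

In practice the correspondences would come from rendered depth maps and would need occlusion checks, which this sketch omits.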
 
      
        Related papers
        - 3DSwapping: Texture Swapping For 3D Object From Single Reference Image [21.454340647455236]
 3D texture swapping allows for the customization of 3D object textures.
No dedicated method exists, but adapted 2D editing and text-driven 3D editing approaches can serve this purpose.
We introduce 3DSwapping, a 3D texture swapping method that integrates progressive generation, view-consistency gradient guidance, and prompt-tuned gradient guidance.
 arXiv  Detail & Related papers  (2025-03-24T16:31:52Z)
- Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information [4.956066467858058]
 We present a novel framework for enhancing the visual fidelity and consistency of text-guided 3D Gaussian Splatting (3DGS) editing.
Our method demonstrates superior performance in rendering quality and view consistency compared to state-of-the-art approaches.
 arXiv  Detail & Related papers  (2025-03-14T17:15:26Z)
- CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction [25.468907201804093]
 Large Reconstruction Models (LRMs) have shown great promise in leveraging multi-view images generated by 2D diffusion models to extract 3D content.
However, 2D diffusion models often struggle to produce dense images with strong multi-view consistency.
We present CDI3D, a feed-forward framework designed for efficient, high-quality image-to-3D generation with view.
 arXiv  Detail & Related papers  (2025-03-11T03:08:43Z)
- 3DEgo: 3D Editing on the Go! [6.072473323242202]
 We introduce 3DEgo to address a novel problem of directly synthesizing 3D scenes from monocular videos guided by textual prompts.
Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow.
3DEgo demonstrates remarkable editing precision, speed, and adaptability across a variety of video sources.
 arXiv  Detail & Related papers  (2024-07-14T07:03:50Z)
- VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing [22.39760469467524]
 We propose VCD-Texture, a variance-alignment-based texture synthesis method, to address the modal gap between 2D and 3D diffusion models.
We present an inpainting module to improve details in conflicting regions.
 arXiv  Detail & Related papers  (2024-07-05T12:11:33Z)
- Generic 3D Diffusion Adapter Using Controlled Multi-View Editing [44.99706994361726]
 Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity.
This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images.
 MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the last timestep into a coherent 3D representation.
 arXiv  Detail & Related papers  (2024-03-18T17:59:09Z)
- Consolidating Attention Features for Multi-view Image Editing [126.19731971010475]
 We focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
We introduce QNeRF, a neural radiance field trained on the internal query features of the edited images.
We refine the process through a progressive, iterative method that better consolidates queries across the diffusion timesteps.
 arXiv  Detail & Related papers  (2024-02-22T18:50:18Z)
- FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF [77.94545888842883]
 This paper proposes a novel face video editing architecture built upon the dynamic face GAN-NeRF structure.
By editing the latent code, multi-view consistent editing on the face can be ensured, as validated by multiview stereo reconstruction.
We propose a stabilizer that maintains temporal coherence by preserving smooth changes of face expressions in consecutive frames.
 arXiv  Detail & Related papers  (2024-01-05T03:23:38Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
 We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
 arXiv  Detail & Related papers  (2022-11-28T18:59:52Z)
- 3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models [71.25937799010407]
 We equip text-guided diffusion models to achieve 3D-consistent generation.
We study 3D local editing and propose a two-step solution.
We extend our model to perform one-shot novel view synthesis.
 arXiv  Detail & Related papers  (2022-11-25T13:50:00Z)
- StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis [92.25145204543904]
 StyleNeRF is a 3D-aware generative model for high-resolution image synthesis with high multi-view consistency.
It integrates the neural radiance field (NeRF) into a style-based generator.
It can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality.
 arXiv  Detail & Related papers  (2021-10-18T02:37:01Z)