RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting
- URL: http://arxiv.org/abs/2404.10765v1
- Date: Tue, 16 Apr 2024 17:50:02 GMT
- Title: RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting
- Authors: Ashkan Mirzaei, Riccardo De Lutio, Seung Wook Kim, David Acuna, Jonathan Kelly, Sanja Fidler, Igor Gilitschenski, Zan Gojcic
- Abstract summary: RefFusion is a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view.
Our framework achieves state-of-the-art results for object removal while maintaining high controllability.
- Score: 63.567363455092234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural reconstruction approaches are rapidly emerging as the preferred representation for 3D scenes, but their limited editability is still posing a challenge. In this work, we propose an approach for 3D scene inpainting -- the task of coherently replacing parts of the reconstructed scene with desired content. Scene inpainting is an inherently ill-posed task as there exist many solutions that plausibly replace the missing content. A good inpainting method should therefore not only enable high-quality synthesis but also a high degree of control. Based on this observation, we focus on enabling explicit control over the inpainted content and leverage a reference image as an efficient means to achieve this goal. Specifically, we introduce RefFusion, a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view. The personalization effectively adapts the prior distribution to the target scene, resulting in a lower variance of the score distillation objective and hence significantly sharper details. Our framework achieves state-of-the-art results for object removal while maintaining high controllability. We further demonstrate the generality of our formulation on other downstream tasks such as object insertion, scene outpainting, and sparse view reconstruction.
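For context, the variance claim refers to the Monte Carlo gradient of a score distillation sampling (SDS) style objective. Below is a minimal sketch in standard DreamFusion-style notation; it is an illustration only, and RefFusion's exact mask- and reference-conditioned objective may differ.

```latex
% Standard SDS gradient, sketched in DreamFusion-style notation (an assumption;
% RefFusion's exact inpainting-conditioned objective may differ).
% x = g(theta) is a rendering of the 3D scene parameters theta, x_t is its
% noised version, and \hat{\epsilon}_\phi is the 2D diffusion denoiser.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta) =
  \mathbb{E}_{t,\epsilon}\!\left[
    w(t)\,\bigl(\hat{\epsilon}_\phi(\mathbf{x}_t;\, y,\, t) - \epsilon\bigr)\,
    \frac{\partial \mathbf{x}}{\partial \theta}
  \right],
\qquad
\mathbf{x} = g(\theta), \quad
\mathbf{x}_t = \alpha_t\,\mathbf{x} + \sigma_t\,\epsilon .
```

Personalizing the denoiser to the reference view concentrates its predictions on the desired content, so the bracketed residual, and hence the per-sample gradient, varies less across sampled (t, epsilon) pairs, which is consistent with the sharper details reported in the abstract.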
Related papers
- Improving Text-guided Object Inpainting with Semantic Pre-inpainting [95.17396565347936]
We decompose the typical single-stage object inpainting into two cascaded processes: semantic pre-inpainting and high-fidelity object generation.
To achieve this, we cascade a Transformer-based semantic inpainter and an object inpainting diffusion model, leading to a novel CAscaded Transformer-Diffusion framework.
arXiv Detail & Related papers (2024-09-12T17:55:37Z)
- Localized Gaussian Splatting Editing with Contextual Awareness [10.46087834880747]
We introduce an illumination-aware 3D scene editing pipeline for 3D Gaussian Splatting (3DGS) representation.
Inpainting with a state-of-the-art conditional 2D diffusion model produces content whose lighting is consistent with the background.
Our approach efficiently achieves local editing with global illumination consistency without explicitly modeling light transport.
arXiv Detail & Related papers (2024-07-31T18:00:45Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Salient Object-Aware Background Generation using Text-Guided Diffusion Models [4.747826159446815]
We present a model for adapting inpainting diffusion models to the salient object outpainting task using Stable Diffusion and ControlNet architectures.
Our proposed approach reduces object expansion by 3.6x on average with no degradation in standard visual metrics across multiple datasets.
arXiv Detail & Related papers (2024-04-15T22:13:35Z)
- PaletteNeRF: Palette-based Appearance Editing of Neural Radiance Fields [60.66412075837952]
We present PaletteNeRF, a novel method for appearance editing of neural radiance fields (NeRF) based on 3D color decomposition.
Our method decomposes the appearance of each 3D point into a linear combination of palette-based bases.
We extend our framework with compressed semantic features for semantic-aware appearance editing.
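Read literally, the per-point decomposition has the form sketched below; this is an illustration of the stated idea only, and the paper's full formulation may include additional view-dependent and per-point offset terms not shown here.

```latex
% Sketch of a palette-based appearance decomposition (illustrative only).
% p_i are a small set of globally shared palette colors,
% w_i(x) are per-point blending weights.
\mathbf{c}(\mathbf{x}) \approx \sum_{i=1}^{N} w_i(\mathbf{x})\,\mathbf{p}_i,
\qquad w_i(\mathbf{x}) \ge 0, \quad \sum_{i=1}^{N} w_i(\mathbf{x}) = 1 .
```

Editing a shared palette color then recolors the corresponding regions of the whole scene consistently, which is what makes the decomposition useful for appearance editing.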
arXiv Detail & Related papers (2022-12-21T00:20:01Z)
- SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields [26.296017756560467]
In 3D, solutions must be consistent across multiple views and geometrically valid.
We propose a novel 3D inpainting method that addresses these challenges.
We first demonstrate the superiority of our approach on multiview segmentation, comparing to NeRF-based methods and 2D segmentation approaches.
arXiv Detail & Related papers (2022-11-22T13:14:50Z)
- Perceptual Artifacts Localization for Inpainting [60.5659086595901]
We propose a new learning task of automatic segmentation of inpainting perceptual artifacts.
We train advanced segmentation networks on a dataset to reliably localize inpainting artifacts within inpainted images.
We also propose a new evaluation metric called Perceptual Artifact Ratio (PAR), which is the ratio of objectionable inpainted regions to the entire inpainted area.
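As a concrete reading of that definition, here is a minimal sketch (hypothetical function name and mask inputs, not the authors' implementation) that computes PAR from a predicted artifact mask and the inpainting mask:

```python
import numpy as np

def perceptual_artifact_ratio(artifact_mask: np.ndarray, inpaint_mask: np.ndarray) -> float:
    """PAR: fraction of the inpainted area flagged as perceptually objectionable.

    Both inputs are boolean (H, W) masks; `inpaint_mask` marks the inpainted
    region and `artifact_mask` marks pixels predicted as artifacts.
    """
    inpainted_area = inpaint_mask.sum()
    if inpainted_area == 0:
        return 0.0  # nothing was inpainted
    # Count only artifact pixels that lie inside the inpainted region.
    artifact_area = np.logical_and(artifact_mask, inpaint_mask).sum()
    return float(artifact_area) / float(inpainted_area)
```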
arXiv Detail & Related papers (2022-08-05T18:50:51Z)
- Holistic 3D Scene Understanding from a Single Image with Implicit Representation [112.40630836979273]
We present a new pipeline for holistic 3D scene understanding from a single image.
We propose an image-based local structured implicit network to improve the object shape estimation.
We also refine 3D object pose and scene layout via a novel implicit scene graph neural network.
arXiv Detail & Related papers (2021-03-11T02:52:46Z)
- Semantic-Guided Inpainting Network for Complex Urban Scenes Manipulation [19.657440527538547]
In this work, we propose a novel deep learning model to alter a complex urban scene by removing a user-specified portion of the image.
Inspired by recent works on image inpainting, our proposed method leverages the semantic segmentation to model the content and structure of the image.
To generate reliable results, we design a new decoder block that combines the semantic segmentation and generation tasks.
arXiv Detail & Related papers (2020-10-19T09:17:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.