MagicEraser: Erasing Any Objects via Semantics-Aware Control
- URL: http://arxiv.org/abs/2410.10207v1
- Date: Mon, 14 Oct 2024 07:03:14 GMT
- Title: MagicEraser: Erasing Any Objects via Semantics-Aware Control
- Authors: Fan Li, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao, Songcen Xu,
- Abstract summary: We introduce MagicEraser, a diffusion model-based framework tailored for the object erasure task.
MagicEraser achieves fine and effective control of content generation while mitigating undesired artifacts.
- Score: 40.683569840182926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The traditional image inpainting task aims to restore corrupted regions by referencing the surrounding background and foreground. However, the object erasure task, which is in increasing demand, aims to erase objects and generate a harmonious background in their place. Previous GAN-based inpainting methods struggle with intricate texture generation. Emerging diffusion model-based algorithms, such as Stable Diffusion Inpainting, can generate novel content, but they often produce incongruent results at the locations of the erased objects and require high-quality text prompts as input. To address these challenges, we introduce MagicEraser, a diffusion model-based framework tailored for the object erasure task. It consists of two phases: content initialization and controllable generation. In the latter phase, we develop two plug-and-play modules called prompt tuning and semantics-aware attention refocus. Additionally, we propose a data construction strategy that generates training data specially suited to this task. MagicEraser achieves fine and effective control of content generation while mitigating undesired artifacts. Experimental results highlight the advantages of our approach on the object erasure task.
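The abstract names the two plug-and-play modules but does not spell out their internals. As a rough, non-authoritative illustration of the family of mechanisms that "semantics-aware attention refocus" belongs to, the PyTorch sketch below biases cross-attention away from non-background text tokens inside the erase mask; every name in it (refocus_attention, background_token_ids, suppress_bias) is hypothetical and is not the authors' API.

```python
import torch

def refocus_attention(attn_logits: torch.Tensor,
                      erase_mask: torch.Tensor,
                      background_token_ids: list[int],
                      suppress_bias: float = 4.0) -> torch.Tensor:
    """Suppress cross-attention toward non-background text tokens inside the
    erased region, steering generation there toward background semantics.

    attn_logits: (B, n_pixels, n_text_tokens) pre-softmax attention scores.
    erase_mask:  (B, n_pixels) binary mask, 1 inside the region to erase.
    background_token_ids: indices of text tokens describing the background.
    """
    n_tokens = attn_logits.shape[-1]
    # Boolean column mask selecting the text tokens to suppress.
    suppress = torch.ones(n_tokens, dtype=torch.bool, device=attn_logits.device)
    suppress[background_token_ids] = False
    inside = erase_mask.bool().unsqueeze(-1)    # (B, n_pixels, 1)
    cond = inside & suppress.view(1, 1, -1)     # broadcasts to (B, n_pixels, n_tokens)
    biased = torch.where(cond, attn_logits - suppress_bias, attn_logits)
    return biased.softmax(dim=-1)
```

In practice, such a function would be installed as a hook in each cross-attention layer of the denoising U-Net (for example via the attention-processor mechanism in diffusers); the actual module in the paper may differ substantially.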
Related papers
- OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting [54.525583840585305]
We introduce OmniPaint, a unified framework that re-conceptualizes object removal and insertion as interdependent processes.
Our novel CFD metric offers a robust, reference-free evaluation of context consistency and object hallucination.
arXiv Detail & Related papers (2025-03-11T17:55:27Z)
- OmniEraser: Remove Objects and Their Effects in Images with Paired Video-Frame Data [21.469971783624402]
In this paper, we propose Video4Removal, a large-scale dataset comprising over 100,000 high-quality samples with realistic object shadows and reflections.
Constructing object-background pairs from video frames with off-the-shelf vision models significantly reduces the labor cost of data acquisition.
To avoid generating shape-like artifacts and unintended content, we propose Object-Background Guidance.
We present OmniEraser, a novel method that seamlessly removes objects and their visual effects using only object masks as input.
arXiv Detail & Related papers (2025-01-13T15:12:40Z)
- MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models [51.1034358143232]
We introduce component-controllable personalization, a novel task that pushes the boundaries of text-to-image (T2I) models.
To overcome these challenges, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics.
arXiv Detail & Related papers (2024-10-17T09:22:53Z)
- Improving Text-guided Object Inpainting with Semantic Pre-inpainting [95.17396565347936]
We decompose the typical single-stage object inpainting into two cascaded processes: semantic pre-inpainting and high-fidelity object generation.
To achieve this, we cascade a Transformer-based semantic inpainter and an object inpainting diffusion model, leading to a novel CAscaded Transformer-Diffusion framework.
arXiv Detail & Related papers (2024-09-12T17:55:37Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
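The DiffUHaul summary above describes interpolating attention features between the source and target passes during early denoising steps. A minimal sketch of that blending idea follows; the linear schedule and the names blend_weight and interpolate_features are assumptions for illustration, not the paper's implementation.

```python
import torch

def blend_weight(step: int, n_steps: int, early_frac: float = 0.4) -> float:
    """Source contribution: decays linearly over the early denoising steps,
    after which the target pass takes over completely."""
    cutoff = int(early_frac * n_steps)
    if step >= cutoff:
        return 0.0
    return 1.0 - step / cutoff

def interpolate_features(src_attn: torch.Tensor,
                         tgt_attn: torch.Tensor,
                         step: int, n_steps: int) -> torch.Tensor:
    """src_attn, tgt_attn: attention features of identical shape from the
    source-image and target-layout denoising passes at the current step."""
    w = blend_weight(step, n_steps)
    return torch.lerp(tgt_attn, src_attn, w)  # w=1 -> pure source, w=0 -> pure target
```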
- Salient Object-Aware Background Generation using Text-Guided Diffusion Models [4.747826159446815]
We present a model for adapting inpainting diffusion models to the salient object outpainting task using Stable Diffusion and ControlNet architectures.
Our proposed approach reduces object expansion by 3.6x on average with no degradation in standard visual metrics across multiple datasets.
arXiv Detail & Related papers (2024-04-15T22:13:35Z)
- Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models [6.408114351192012]
We present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions.
We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case scenario in medical image analysis.
arXiv Detail & Related papers (2022-12-29T13:51:54Z)
- ObjectStitch: Generative Object Compositing [43.206123360578665]
We propose a self-supervised framework for object compositing using conditional diffusion models.
Our framework can transform the viewpoint, geometry, color and shadow of the generated object while requiring no manual labeling.
Our method outperforms relevant baselines in both realism and faithfulness of the synthesized result images in a user study on various real-world images.
arXiv Detail & Related papers (2022-12-02T02:15:13Z)
- Context-Aware Layout to Image Generation with Enhanced Object Appearance [123.62597976732948]
A layout-to-image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against a natural background (stuff).
Existing L2I models have made great progress, but object-to-object and object-to-stuff relations are often broken.
We argue that these breakdowns stem from the lack of context-aware object and stuff feature encoding in their generators, and of location-sensitive appearance representation in their discriminators.
arXiv Detail & Related papers (2021-03-22T14:43:25Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images (a minimal sketch of the idea follows this list).
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
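To make the SceneFID idea concrete, here is a minimal sketch (assuming layout boxes in pixel coordinates and the torchmetrics FID implementation, which requires the torch-fidelity extra): Inception statistics are accumulated over per-object crops rather than whole images. The helpers object_crops and scene_fid are hypothetical names, and the crop preprocessing is an assumption; the original metric's exact protocol may differ.

```python
import torch
import torch.nn.functional as F
from torchmetrics.image.fid import FrechetInceptionDistance

def object_crops(images: torch.Tensor, boxes: list) -> torch.Tensor:
    """images: (B, 3, H, W) uint8 in [0, 255]; boxes[i] is a list of
    (x0, y0, x1, y1) pixel boxes for image i. Returns all object crops
    resized to 299x299, stacked into one uint8 batch."""
    crops = []
    for img, img_boxes in zip(images, boxes):
        for (x0, y0, x1, y1) in img_boxes:
            crop = img[:, y0:y1, x0:x1].float().unsqueeze(0)
            crop = F.interpolate(crop, size=(299, 299), mode="bilinear",
                                 align_corners=False)
            crops.append(crop.clamp(0, 255).to(torch.uint8))
    return torch.cat(crops, dim=0)

def scene_fid(real_imgs, real_boxes, fake_imgs, fake_boxes) -> float:
    """FID computed over per-object crops instead of whole images."""
    fid = FrechetInceptionDistance(feature=2048)
    fid.update(object_crops(real_imgs, real_boxes), real=True)
    fid.update(object_crops(fake_imgs, fake_boxes), real=False)
    return float(fid.compute())
```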
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.