OmniEraser: Remove Objects and Their Effects in Images with Paired Video-Frame Data
- URL: http://arxiv.org/abs/2501.07397v3
- Date: Tue, 11 Mar 2025 14:04:38 GMT
- Title: OmniEraser: Remove Objects and Their Effects in Images with Paired Video-Frame Data
- Authors: Runpu Wei, Zijin Yin, Shuo Zhang, Lanxiang Zhou, Xueyi Wang, Chao Ban, Tianwei Cao, Hao Sun, Zhongjiang He, Kongming Liang, Zhanyu Ma
- Abstract summary: In this paper, we propose Video4Removal, a large-scale dataset comprising over 100,000 high-quality samples with realistic object shadows and reflections. By constructing object-background pairs from video frames with off-the-shelf vision models, the labor costs of data acquisition can be significantly reduced. To avoid generating shape-like artifacts and unintended content, we propose Object-Background Guidance. We present OmniEraser, a novel method that seamlessly removes objects and their visual effects using only object masks as input.
- Score: 21.469971783624402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inpainting algorithms have achieved remarkable progress in removing objects from images, yet two challenges remain: 1) they struggle to handle the object's visual effects, such as shadows and reflections; 2) they easily generate shape-like artifacts and unintended content. In this paper, we propose Video4Removal, a large-scale dataset comprising over 100,000 high-quality samples with realistic object shadows and reflections. By constructing object-background pairs from video frames with off-the-shelf vision models, the labor costs of data acquisition can be significantly reduced. To avoid generating shape-like artifacts and unintended content, we propose Object-Background Guidance, an elaborated paradigm that takes both the foreground object and background images as input, guiding the diffusion process to harness richer contextual information. Based on these two designs, we present OmniEraser, a novel method that seamlessly removes objects and their visual effects using only object masks as input. Extensive experiments show that OmniEraser significantly outperforms previous methods, particularly in complex in-the-wild scenes, and that it generalizes well to anime-style images. Datasets, models, and code will be published.
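The dataset-construction idea above can be sketched concretely. The snippet below is a minimal, hypothetical illustration of mining object-background pairs from fixed-camera video with an off-the-shelf segmenter; `build_pair`, `segment_fn`, and the emptiness threshold are our assumptions for illustration, not the authors' released pipeline.

```python
# Hypothetical sketch of Video4Removal-style pairing: pick a frame where the
# target object is present (object image + mask) and a frame of the same
# static scene after it has left (clean background, shadows/reflections gone).
import numpy as np

def build_pair(frames, segment_fn, empty_thresh=0.01):
    """frames: list of HxWx3 uint8 arrays from a fixed-camera video.
    segment_fn: any off-the-shelf segmenter returning a boolean HxW mask
    of the target object (an all-False mask if the object is absent)."""
    masks = [segment_fn(f) for f in frames]
    # Frame where the object is most clearly present.
    obj_idx = int(np.argmax([m.sum() for m in masks]))
    # First frame where the object occupies (almost) no pixels.
    bg_idx = next(i for i, m in enumerate(masks) if m.mean() < empty_thresh)
    return frames[obj_idx], masks[obj_idx], frames[bg_idx]
```

Because both frames come from the same video, the background image naturally lacks the object's shadow and reflection, which is exactly the supervision an effects-aware eraser needs.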
Related papers
- OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models [31.48981364573974]
We present OmnimatteZero, a training-free approach that leverages off-the-shelf pre-trained video diffusion models for omnimatte.
It can remove objects from videos, extract individual object layers along with their effects, and composite those objects onto new videos.
We show that self-attention maps capture information about the object and its footprints and use them to inpaint the object's effects, leaving a clean background.
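As a rough sketch of that observation, one could aggregate how strongly each spatial location attends to known object tokens; shadows and reflections tend to attend to the object, so they light up too. The shapes, layer choice, and threshold below are our assumptions, not the paper's procedure.

```python
# Speculative sketch: turn a diffusion model's self-attention into a soft
# "object + effects" footprint mask.
import torch

def attention_to_mask(attn, obj_token_idx, h, w, thresh=0.3):
    """attn: (heads, query_tokens, key_tokens) self-attention of one layer,
    with query_tokens == h * w. obj_token_idx: tokens known to be on the object."""
    # How much each spatial query attends to the object tokens.
    score = attn[:, :, obj_token_idx].mean(dim=(0, 2))        # (query_tokens,)
    score = (score - score.min()) / (score.max() - score.min() + 1e-8)
    return (score.reshape(h, w) > thresh).float()             # footprint mask
```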
arXiv Detail & Related papers (2025-03-23T11:26:48Z)
- Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition.
It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects.
Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z)
- Generative Omnimatte: Learning to Decompose Video into Layers [29.098471541412113]
We present a novel generative layered video decomposition framework to address the omnimatte problem.
Our core idea is to train a video diffusion model to identify and remove scene effects caused by a specific object.
We show that this model can be finetuned from an existing video inpainting model with a small, carefully curated dataset.
arXiv Detail & Related papers (2024-11-25T18:59:57Z)
- MagicEraser: Erasing Any Objects via Semantics-Aware Control [40.683569840182926]
We introduce MagicEraser, a diffusion model-based framework tailored for the object erasure task.
MagicEraser achieves fine and effective control of content generation while mitigating undesired artifacts.
arXiv Detail & Related papers (2024-10-14T07:03:14Z)
- EraseDraw: Learning to Draw Step-by-Step via Erasing Objects from Images [24.55843674256795]
Prior works often fail by making global changes to the image, inserting objects in unrealistic spatial locations, and generating inaccurate lighting details. We observe that while state-of-the-art models perform poorly on object insertion, they can remove objects and erase the background in natural images very well. We show compelling results on diverse insertion prompts and images across various domains.
arXiv Detail & Related papers (2024-08-31T18:37:48Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
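A hedged sketch of those two steps (the function names, mask semantics, and linear schedule are illustrative, not the paper's exact code):

```python
# Sketch: (1) mask attention across different objects so they denoise more
# independently; (2) in early steps, blend cached source-pass attention
# features into the current pass to fuse the new layout with the old look.
import torch

def masked_interpolated_attention(q, k, v, same_obj_mask, src_feat,
                                  step, n_steps, early_frac=0.3):
    d = q.shape[-1]
    logits = (q @ k.transpose(-2, -1)) / d ** 0.5
    # (1) suppress attention between tokens belonging to different objects
    logits = logits.masked_fill(~same_obj_mask, float("-inf"))
    out = torch.softmax(logits, dim=-1) @ v
    # (2) early in denoising, interpolate toward the source-image features
    if step < early_frac * n_steps:
        t = step / (early_frac * n_steps)   # ramps 0 -> 1 over the early steps
        out = (1 - t) * src_feat + t * out
    return out
```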
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Retrieval Robust to Object Motion Blur [54.34823913494456]
We propose a method for object retrieval in images that are affected by motion blur.
We present the first large-scale datasets for blurred object retrieval.
Our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets.
arXiv Detail & Related papers (2024-04-27T23:22:39Z)
- ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion [34.29147907526832]
Diffusion models have revolutionized image editing but often generate images that violate physical laws.
We propose a practical solution centered on a counterfactual dataset.
By fine-tuning a diffusion model on this dataset, we are able to not only remove objects but also their effects on the scene.
arXiv Detail & Related papers (2024-03-27T17:59:52Z)
- OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields [53.32527220134249]
The emergence of Neural Radiance Fields (NeRF) for novel view synthesis has increased interest in 3D scene editing.
Current methods face challenges such as time-consuming object labeling, limited capability to remove specific targets, and compromised rendering quality after removal.
This paper proposes a novel object-removing pipeline, named OR-NeRF, that can remove objects from 3D scenes with user-given points or text prompts on a single view.
arXiv Detail & Related papers (2023-05-17T18:18:05Z)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
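A minimal sketch of that dual-discriminator idea, assuming a frozen ImageNet backbone stands in for the "pretrained visual features" (the paper's actual architecture may differ):

```python
# Sketch: a semantic discriminator that judges realism in a frozen pretrained
# feature space, plus object-level scoring on crops around each object.
import torch
import torch.nn as nn
import torchvision.models as models

class SemanticDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18(weights="IMAGENET1K_V1")
        self.backbone.fc = nn.Identity()          # expose 512-d features
        for p in self.backbone.parameters():      # keep features pretrained
            p.requires_grad = False
        self.head = nn.Linear(512, 1)             # real/fake logit

    def forward(self, img):                       # img: (B, 3, H, W)
        return self.head(self.backbone(img))

def object_level_logits(disc, img, boxes, size=64):
    # Score each object region separately (object-level discrimination).
    crops = [nn.functional.interpolate(img[:, :, y0:y1, x0:x1], size=(size, size))
             for (x0, y0, x1, y1) in boxes]
    return torch.cat([disc(c) for c in crops])
```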
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- Occlusion-Aware Video Object Inpainting [72.38919601150175]
This paper presents occlusion-aware video object inpainting, which recovers both the complete shape and appearance for occluded objects in videos.
Our technical contribution VOIN jointly performs video object shape completion and occluded texture generation.
For more realistic results, VOIN is optimized using both T-PatchGAN and a new spatio-temporal attention-based multi-class discriminator.
arXiv Detail & Related papers (2021-08-15T15:46:57Z)
- Omnimatte: Associating Objects and Their Effects in Video [100.66205249649131]
Scene effects related to objects in video are typically overlooked by computer vision.
In this work, we take a step towards solving this novel problem of automatically associating objects with their effects in video.
Our model is trained only on the input video in a self-supervised manner, without any manual labels, and is generic: it produces omnimattes automatically for arbitrary objects and a variety of effects.
arXiv Detail & Related papers (2021-05-14T17:57:08Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to support the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)