ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
- URL: http://arxiv.org/abs/2403.18818v1
- Date: Wed, 27 Mar 2024 17:59:52 GMT
- Title: ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
- Authors: Daniel Winter, Matan Cohen, Shlomi Fruchter, Yael Pritch, Alex Rav-Acha, Yedid Hoshen,
- Abstract summary: Diffusion models have revolutionized image editing but often generate images that violate physical laws.
We propose a practical solution centered on a "counterfactual" dataset.
By fine-tuning a diffusion model on this dataset, we are able to not only remove objects but also their effects on the scene.
- Score: 34.29147907526832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have revolutionized image editing but often generate images that violate physical laws, particularly the effects of objects on the scene, e.g., occlusions, shadows, and reflections. By analyzing the limitations of self-supervised approaches, we propose a practical solution centered on a "counterfactual" dataset. Our method involves capturing a scene before and after removing a single object, while minimizing other changes. By fine-tuning a diffusion model on this dataset, we are able to not only remove objects but also their effects on the scene. However, we find that applying this approach for photorealistic object insertion requires an impractically large dataset. To tackle this challenge, we propose bootstrap supervision; leveraging our object removal model trained on a small counterfactual dataset, we synthetically expand this dataset considerably. Our approach significantly outperforms prior methods in photorealistic object removal and insertion, particularly at modeling the effects of objects on the scene.
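The bootstrap-supervision idea from the abstract can be sketched as a simple data-expansion loop: the trained removal model turns unpaired photos of objects into (scene-without-object, scene-with-object) training pairs for insertion. The sketch below is illustrative only — images are NumPy arrays and `remove_object` is a stand-in for the fine-tuned removal diffusion model, not the paper's actual code.

```python
import numpy as np

def remove_object(image, mask):
    """Stand-in for the fine-tuned removal diffusion model:
    here it simply fills masked pixels with the mean of the background."""
    out = image.copy()
    out[mask] = image[~mask].mean()
    return out

def bootstrap_insertion_pairs(images, masks):
    """Expand a small counterfactual dataset: for each photo containing
    an object, synthesize its 'object removed' counterpart, yielding
    (scene_without_object, scene_with_object) supervision for training
    an insertion model in the reverse direction."""
    pairs = []
    for img, mask in zip(images, masks):
        clean = remove_object(img, mask)
        # The insertion model is trained to map clean -> img given the mask.
        pairs.append((clean, img))
    return pairs

# Tiny demo: one 8x8 grayscale "scene" with a bright square object.
scene = np.zeros((8, 8))
scene[2:5, 2:5] = 1.0
mask = scene > 0.5
pairs = bootstrap_insertion_pairs([scene], [mask])
clean, with_obj = pairs[0]
```

The key asymmetry the paper exploits is visible even in this toy: removal only needs to hallucinate background, so a small counterfactual dataset suffices, while insertion supervision can then be manufactured at scale by running removal on ordinary photos.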
Related papers
- OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting [54.525583840585305]
We introduce OmniPaint, a unified framework that re-conceptualizes object removal and insertion as interdependent processes.
Our novel CFD metric offers a robust, reference-free evaluation of context consistency and object hallucination.
arXiv Detail & Related papers (2025-03-11T17:55:27Z)
- ObjectMover: Generative Object Movement with Video Prior [69.75281888309017]
We present ObjectMover, a generative model that can perform object movement in challenging scenes.
We show that with this approach, our model is able to adjust to complex real-world scenarios.
We propose a multi-task learning strategy that enables training on real-world video data to improve the model generalization.
arXiv Detail & Related papers (2025-03-11T04:42:59Z)
- OmniEraser: Remove Objects and Their Effects in Images with Paired Video-Frame Data [21.469971783624402]
In this paper, we propose Video4Removal, a large-scale dataset comprising over 100,000 high-quality samples with realistic object shadows and reflections.
By constructing object-background pairs from video frames with off-the-shelf vision models, the labor costs of data acquisition can be significantly reduced.
To avoid generating shape-like artifacts and unintended content, we propose Object-Background Guidance.
We present OmniEraser, a novel method that seamlessly removes objects and their visual effects using only object masks as input.
arXiv Detail & Related papers (2025-01-13T15:12:40Z)
- Generative Location Modeling for Spatially Aware Object Insertion [35.62317512925592]
Generative models have become a powerful tool for image editing tasks, including object insertion.
In this paper, we focus on creating a location model dedicated to identifying realistic object locations.
Specifically, we train an autoregressive model that generates bounding box coordinates, conditioned on the background image and the desired object class.
This formulation allows us to effectively handle sparse placement annotations and to incorporate implausible locations into a preference dataset via direct preference optimization.
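The autoregressive formulation above — emitting bounding-box coordinates as a short sequence of discrete tokens conditioned on the image and class — can be illustrated with the coordinate quantization step alone. The bin count and token layout below are assumptions for illustration, not the paper's actual implementation.

```python
NUM_BINS = 256  # discretization resolution (assumed)

def bbox_to_tokens(bbox, width, height):
    """Quantize an absolute (x_min, y_min, x_max, y_max) box into four
    discrete tokens that an autoregressive model can emit one at a time."""
    x0, y0, x1, y1 = bbox
    norm = [x0 / width, y0 / height, x1 / width, y1 / height]
    return [min(int(v * NUM_BINS), NUM_BINS - 1) for v in norm]

def tokens_to_bbox(tokens, width, height):
    """Invert the quantization, mapping each token to its bin center."""
    scale = [width, height, width, height]
    return [(t + 0.5) / NUM_BINS * s for t, s in zip(tokens, scale)]

toks = bbox_to_tokens((64, 32, 192, 160), 256, 256)
box = tokens_to_bbox(toks, 256, 256)
```

Casting placement as next-token prediction is what makes sparse annotations workable: each image contributes a short target sequence, and implausible sampled boxes can be collected as negatives for preference optimization.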
arXiv Detail & Related papers (2024-10-17T14:00:41Z)
- EraseDraw: Learning to Insert Objects by Erasing Them from Images [24.55843674256795]
Prior works often fail by making global changes to the image, inserting objects in unrealistic spatial locations, and generating inaccurate lighting details.
We observe that while state-of-the-art models perform poorly on object insertion, they can remove objects and erase the background in natural images very well.
We show compelling results on diverse insertion prompts and images across various domains.
arXiv Detail & Related papers (2024-08-31T18:37:48Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
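The feature interpolation described — blending source-image attention features into the target generation during early denoising steps only — can be illustrated with a minimal array sketch. The linear warmup schedule and feature shapes here are assumptions, not DiffUHaul's actual schedule.

```python
import numpy as np

def interpolate_features(src_feat, tgt_feat, step, num_steps, warmup=0.3):
    """Blend source attention features into the target generation.
    During the first `warmup` fraction of denoising steps, the weight
    on the source features decays linearly from 1 to 0; after that,
    the target features are used unchanged."""
    frac = step / num_steps
    w = max(0.0, 1.0 - frac / warmup)  # weight on the source features
    return w * src_feat + (1.0 - w) * tgt_feat

src = np.ones((4, 4))   # toy "source image" attention features
tgt = np.zeros((4, 4))  # toy "target layout" attention features
early = interpolate_features(src, tgt, step=0, num_steps=50)
late = interpolate_features(src, tgt, step=40, num_steps=50)
```

Restricting the blend to early steps matches the intuition in the summary: early denoising fixes global layout, so that is where the original appearance must be fused in, while later steps are left free to refine the new arrangement.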
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Retrieval Robust to Object Motion Blur [54.34823913494456]
We propose a method for object retrieval in images that are affected by motion blur.
We present the first large-scale datasets for blurred object retrieval.
Our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets.
arXiv Detail & Related papers (2024-04-27T23:22:39Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.