VDOR: A Video-based Dataset for Object Removal via Sequence Consistency
- URL: http://arxiv.org/abs/2501.07397v2
- Date: Fri, 31 Jan 2025 06:41:24 GMT
- Title: VDOR: A Video-based Dataset for Object Removal via Sequence Consistency
- Authors: Runpu Wei, Shuo Zhang, Zhonghao Yan, Zijin Yin, Xueyi Wang, Kongming Liang, Zhanyu Ma
- Abstract summary: Existing datasets related to object removal serve as a valuable foundation for model validation and optimization. We propose a novel video-based annotation pipeline for constructing a realistic illumination-aware object removal dataset. By leveraging continuous real-world video frames, we minimize distribution gaps and accurately capture realistic lighting and shadow variations.
- Score: 19.05827956984347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object removal, as a sub-task of image inpainting, has garnered significant attention in recent years. Existing datasets related to object removal serve as a valuable foundation for model validation and optimization. However, they mainly rely on inpainting techniques to generate pseudo-removed results, leading to distribution gaps between synthetic and real-world data. While some real-world datasets mitigate these issues, they face challenges such as limited scalability, high annotation costs, and unrealistic representations of lighting and shadows. To address these limitations, we propose a novel video-based annotation pipeline for constructing a realistic illumination-aware object removal dataset. Leveraging this pipeline, we introduce VDOR, a dataset specifically designed for object removal tasks, which comprises triplets of original frame images with objects, background images without objects, and corresponding masks. By leveraging continuous real-world video frames, we minimize distribution gaps and accurately capture realistic lighting and shadow variations, ensuring close alignment with real-world scenarios. Our approach significantly reduces annotation effort while providing a robust foundation for advancing object removal research.
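The abstract describes each VDOR example as a triplet: a frame with the object, a background frame without it, and a mask. A minimal sketch of that structure, with illustrative field and function names that are assumptions rather than the paper's actual interface:

```python
# Sketch of a VDOR-style triplet: (frame with object, clean background, mask).
# Names and the compositing helper are illustrative, not the paper's API.
from dataclasses import dataclass

import numpy as np


@dataclass
class RemovalTriplet:
    frame: np.ndarray       # H x W x 3, scene with the object present
    background: np.ndarray  # H x W x 3, same scene with the object absent
    mask: np.ndarray        # H x W, 1 inside the object region, 0 elsewhere

    def validate(self) -> bool:
        # All three components must share spatial resolution, and the mask
        # must be binary for a well-formed removal supervision signal.
        h, w = self.mask.shape
        return (self.frame.shape[:2] == (h, w)
                and self.background.shape[:2] == (h, w)
                and set(np.unique(self.mask)) <= {0, 1})


def composite_target(t: RemovalTriplet) -> np.ndarray:
    # Outside the mask the target equals the original frame; inside it,
    # the clean background supplies the "removed" pixels.
    m = t.mask[..., None].astype(np.float32)
    return (1 - m) * t.frame + m * t.background
```

A supervision target built this way inherits real lighting and shadows from the video background frame, which is the property the pipeline is said to preserve.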
Related papers
- ObjectClear: Complete Object Removal via Object-Effect Attention [56.2893552300215]
We introduce a new dataset for OBject-Effect Removal, named OBER, which provides paired images with and without object effects, along with precise masks for both objects and their associated visual artifacts. We propose a novel framework, ObjectClear, which incorporates an object-effect attention mechanism to guide the model toward the foreground removal regions by learning attention masks. Experiments demonstrate that ObjectClear outperforms existing methods, achieving improved object-effect removal quality and background fidelity, especially in complex scenarios.
arXiv Detail & Related papers (2025-05-28T17:51:17Z) - OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models [31.48981364573974]
We present OmnimatteZero, a training-free approach that leverages off-the-shelf pre-trained video diffusion models for omnimatte.
It can remove objects from videos, extract individual object layers along with their effects, and composite those objects onto new videos.
We show that self-attention maps capture information about the object and its footprints and use them to inpaint the object's effects, leaving a clean background.
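The OmnimatteZero summary says self-attention maps localize an object and its footprints, and that these are used to drive inpainting. A hedged stand-in for that idea, thresholding an attention map into a binary region; the quantile rule is an illustrative assumption, not the paper's procedure:

```python
# Illustrative sketch: turn a self-attention map into a binary object-region
# mask by keeping the most attended locations. Not the paper's actual method.
import numpy as np


def attention_to_mask(attn: np.ndarray, quantile: float = 0.75) -> np.ndarray:
    # Keep locations whose attention exceeds the given quantile of the map;
    # these are treated as the object and its footprint to be inpainted.
    thresh = np.quantile(attn, quantile)
    return (attn > thresh).astype(np.uint8)
```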
arXiv Detail & Related papers (2025-03-23T11:26:48Z) - Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition.
It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects.
Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z) - Generative Omnimatte: Learning to Decompose Video into Layers [29.098471541412113]
We present a novel generative layered video decomposition framework to address the omnimatte problem.
Our core idea is to train a video diffusion model to identify and remove scene effects caused by a specific object.
We show that this model can be finetuned from an existing video inpainting model with a small, carefully curated dataset.
arXiv Detail & Related papers (2024-11-25T18:59:57Z) - MagicEraser: Erasing Any Objects via Semantics-Aware Control [40.683569840182926]
We introduce MagicEraser, a diffusion model-based framework tailored for the object erasure task.
MagicEraser achieves fine and effective control of content generation while mitigating undesired artifacts.
arXiv Detail & Related papers (2024-10-14T07:03:14Z) - EraseDraw: Learning to Draw Step-by-Step via Erasing Objects from Images [24.55843674256795]
Prior works often fail by making global changes to the image, inserting objects in unrealistic spatial locations, and generating inaccurate lighting details. We observe that while state-of-the-art models perform poorly on object insertion, they can remove objects and erase the background in natural images very well. We show compelling results on diverse insertion prompts and images across various domains.
arXiv Detail & Related papers (2024-08-31T18:37:48Z) - DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
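The DiffUHaul summary describes interpolating attention features between source and target images during the early denoising steps. A minimal sketch of one plausible schedule; the linear ramp and the notion of an "early window" are assumptions for illustration:

```python
# Sketch of early-step attention-feature interpolation: blend source and
# target features within an early denoising window, then use target features
# unchanged. The linear schedule is an assumption, not the paper's schedule.
import numpy as np


def lerp_features(src: np.ndarray, tgt: np.ndarray,
                  step: int, early_steps: int) -> np.ndarray:
    # Inside the early window, ramp from pure source (step 0) toward the
    # target layout; after the window, the target features take over fully.
    if step >= early_steps:
        return tgt
    alpha = step / early_steps
    return (1 - alpha) * src + alpha * tgt
```

Blending only in the early steps lets the new layout settle while later steps refine appearance from the original image, which matches the stated goal of fusing new layouts with the original appearance.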
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - Retrieval Robust to Object Motion Blur [54.34823913494456]
We propose a method for object retrieval in images that are affected by motion blur.
We present the first large-scale datasets for blurred object retrieval.
Our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets.
arXiv Detail & Related papers (2024-04-27T23:22:39Z) - ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion [34.29147907526832]
Diffusion models have revolutionized image editing but often generate images that violate physical laws.
We propose a practical solution centered on a counterfactual dataset.
By fine-tuning a diffusion model on this dataset, we are able to not only remove objects but also their effects on the scene.
arXiv Detail & Related papers (2024-03-27T17:59:52Z) - OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields [53.32527220134249]
The emergence of Neural Radiance Fields (NeRF) for novel view synthesis has increased interest in 3D scene editing.
Current methods face challenges such as time-consuming object labeling, limited capability to remove specific targets, and compromised rendering quality after removal.
This paper proposes a novel object-removing pipeline, named OR-NeRF, that can remove objects from 3D scenes with user-given points or text prompts on a single view.
arXiv Detail & Related papers (2023-05-17T18:18:05Z) - DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with the global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z) - Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z) - Occlusion-Aware Video Object Inpainting [72.38919601150175]
This paper presents occlusion-aware video object inpainting, which recovers both the complete shape and appearance for occluded objects in videos.
Our technical contribution VOIN jointly performs video object shape completion and occluded texture generation.
For more realistic results, VOIN is optimized using both T-PatchGAN and a new spatio-temporal attention-based multi-class discriminator.
arXiv Detail & Related papers (2021-08-15T15:46:57Z) - Omnimatte: Associating Objects and Their Effects in Video [100.66205249649131]
Scene effects related to objects in video are typically overlooked by computer vision.
In this work, we take a step towards solving this novel problem of automatically associating objects with their effects in video.
Our model is trained only on the input video in a self-supervised manner, without any manual labels, and is generic: it produces omnimattes automatically for arbitrary objects and a variety of effects.
arXiv Detail & Related papers (2021-05-14T17:57:08Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to sustain the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.