VDOR: A Video-based Dataset for Object Removal via Sequence Consistency
- URL: http://arxiv.org/abs/2501.07397v2
- Date: Fri, 31 Jan 2025 06:41:24 GMT
- Title: VDOR: A Video-based Dataset for Object Removal via Sequence Consistency
- Authors: Runpu Wei, Shuo Zhang, Zhonghao Yan, Zijin Yin, Xueyi Wang, Kongming Liang, Zhanyu Ma,
- Abstract summary: Existing datasets related to object removal serve a valuable foundation for model validation and optimization.
We propose a novel video-based annotation pipeline for constructing a realistic illumination-aware object removal dataset.
By leveraging continuous real-world video frames, we minimize distribution gaps and accurately capture realistic lighting and shadow variations.
- Score: 19.05827956984347
- License:
- Abstract: Object removal, as a sub-task of image inpainting, has garnered significant attention in recent years. Existing datasets related to object removal serve a valuable foundation for model validation and optimization. However, they mainly rely on inpainting techniques to generate pseudo-removed results, leading to distribution gaps between synthetic and real-world data. While some real-world datasets mitigate these issues, they face challenges such as limited scalability, high annotation costs, and unrealistic representations of lighting and shadows. To address these limitations, we propose a novel video-based annotation pipeline for constructing a realistic illumination-aware object removal dataset. Leveraging this pipeline, we introduce VDOR, a dataset specifically designed for object removal tasks, which comprises triplets of original frame images with objects, background images without objects, and corresponding masks. By leveraging continuous real-world video frames, we minimize distribution gaps and accurately capture realistic lighting and shadow variations, ensuring close alignment with real-world scenarios. Our approach significantly reduces annotation effort while providing a robust foundation for advancing object removal research.
Related papers
- Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition.
It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects.
Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z) - MegaScenes: Scene-Level View Synthesis at Scale [69.21293001231993]
Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications.
We create a large-scale scene-level dataset from Internet photo collections, called MegaScenes, which contains over 100K structure from motion (SfM) reconstructions from around the world.
We analyze failure cases of state-of-the-art NVS methods and significantly improve generation consistency.
arXiv Detail & Related papers (2024-06-17T17:55:55Z) - DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion [34.29147907526832]
Diffusion models have revolutionized image editing but often generate images that violate physical laws.
We propose a practical solution centered on a qcounterfactual dataset.
By fine-tuning a diffusion model on this dataset, we are able to not only remove objects but also their effects on the scene.
arXiv Detail & Related papers (2024-03-27T17:59:52Z) - High-resolution Iterative Feedback Network for Camouflaged Object
Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms.
We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries.
We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z) - Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Detection and Segmentation of Custom Objects using High Distraction
Photorealistic Synthetic Data [0.5076419064097732]
We show a straightforward and useful methodology for performing instance segmentation using synthetic data.
The goal is to achieve high performance on manually-gathered and annotated real-world data of custom objects.
This white-paper provides strong evidence that photorealistic simulated data can be used in practical real world applications.
arXiv Detail & Related papers (2020-07-28T16:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.