Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to
Parcel Logistics
- URL: http://arxiv.org/abs/2210.09814v1
- Date: Tue, 18 Oct 2022 12:49:04 GMT
- Title: Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to
Parcel Logistics
- Authors: Alexander Naumann and Felix Hertlein and Benchun Zhou and Laura Dörr
and Kai Furmans
- Abstract summary: We present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps.
We first scrape images for the objects of interest from popular image search engines.
We compare three different methods for image selection: Object-agnostic pre-processing, manual image selection and CNN-based image selection.
- Score: 58.720142291102135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art approaches in computer vision heavily rely on sufficiently
large training datasets. For real-world applications, obtaining such a dataset
is usually a tedious task. In this paper, we present a fully automated pipeline
to generate a synthetic dataset for instance segmentation in four steps. In
contrast to existing work, our pipeline covers every step from data acquisition
to the final dataset. We first scrape images for the objects of interest from
popular image search engines; since we rely only on text-based queries, the
resulting data comprises a wide variety of images. Hence, image selection is
necessary as a second step. This approach of image scraping and selection
relaxes the need for a real-world domain-specific dataset that must be either
publicly available or created for this purpose. We employ an object-agnostic
background removal model and compare three different methods for image
selection: Object-agnostic pre-processing, manual image selection and CNN-based
image selection. In the third step, we generate random arrangements of the
object of interest and distractors on arbitrary backgrounds. Finally, the
composition of the images is done by pasting the objects using four different
blending methods. We present a case study for our dataset generation approach
by considering parcel segmentation. For the evaluation, we created a dataset of
parcel photos that were annotated automatically. We find that (1) our dataset
generation pipeline allows a successful transfer to real test images (Mask AP
86.2), (2) a very accurate image selection process - in contrast to human
intuition - is not crucial and a broader category definition can help to bridge
the domain gap, (3) the usage of blending methods is beneficial compared to
simple copy-and-paste. We made our full code for scraping, image composition
and training publicly available at https://a-nau.github.io/parcel2d.
Related papers
- Adapt Anything: Tailor Any Image Classifiers across Domains And
Categories Using Text-to-Image Diffusion Models [82.95591765009105]
We aim to study if a modern text-to-image diffusion model can tailor any task-adaptive image classifier across domains and categories.
We utilize only one off-the-shelf text-to-image model to synthesize images with category labels derived from the corresponding text prompts.
arXiv Detail & Related papers (2023-10-25T11:58:14Z)
- Beyond Generation: Harnessing Text to Image Models for Object Detection
and Segmentation [29.274362919954218]
We propose a new paradigm to automatically generate training data with accurate labels at scale.
The proposed approach decouples training data generation into foreground object generation, and contextually coherent background generation.
We demonstrate the advantages of our approach on five object detection and segmentation datasets.
arXiv Detail & Related papers (2023-09-12T04:41:45Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- Complex Scene Image Editing by Scene Graph Comprehension [17.72638225034884]
We propose a two-stage method, SGC-Net, for complex scene image editing by Scene Graph Comprehension.
In the first stage, we train a Region of Interest (RoI) prediction network that uses scene graphs to predict the locations of the target objects.
The second stage uses a conditional diffusion model to edit the image based on our RoI predictions.
arXiv Detail & Related papers (2022-03-24T05:12:54Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort [117.41383937100751]
Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets.
We show how the GAN latent code can be decoded to produce a semantic segmentation of the image.
These generated datasets can then be used for training any computer vision architecture just as real datasets are.
arXiv Detail & Related papers (2021-04-13T20:08:29Z)
- COTR: Correspondence Transformer for Matching Across Images [31.995943755283786]
We propose a novel framework for finding correspondences in images based on a deep neural network.
By doing so, one has the option to query only the points of interest and retrieve sparse correspondences, or to query all points in an image and obtain dense mappings.
arXiv Detail & Related papers (2021-03-25T22:47:02Z)
- Six-channel Image Representation for Cross-domain Object Detection [17.854940064699985]
Deep learning models are data-driven, and their performance depends heavily on abundant and diverse datasets.
Image-to-image translation techniques are employed to generate fake data for specific scenes to train the models.
We propose to fuse the original 3-channel images with their corresponding GAN-generated fake images to form 6-channel representations of the dataset.
arXiv Detail & Related papers (2021-01-03T04:50:03Z)
- OneGAN: Simultaneous Unsupervised Learning of Conditional Image
Generation, Foreground Segmentation, and Fine-Grained Clustering [100.32273175423146]
We present a method for simultaneously learning, in an unsupervised manner, a conditional image generator, foreground extraction and segmentation, and object removal and background completion.
The method combines a Generative Adversarial Network and a Variational Auto-Encoder, with multiple encoders, generators and discriminators, and benefits from solving all tasks at once.
arXiv Detail & Related papers (2019-12-31T18:15:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.