Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to
Parcel Logistics
- URL: http://arxiv.org/abs/2210.09814v1
- Date: Tue, 18 Oct 2022 12:49:04 GMT
- Title: Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to
Parcel Logistics
- Authors: Alexander Naumann and Felix Hertlein and Benchun Zhou and Laura Dörr
and Kai Furmans
- Abstract summary: We present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps.
We first scrape images for the objects of interest from popular image search engines.
We compare three different methods for image selection: Object-agnostic pre-processing, manual image selection and CNN-based image selection.
- Score: 58.720142291102135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art approaches in computer vision heavily rely on sufficiently
large training datasets. For real-world applications, obtaining such a dataset
is usually a tedious task. In this paper, we present a fully automated pipeline
to generate a synthetic dataset for instance segmentation in four steps. In
contrast to existing work, our pipeline covers every step from data acquisition
to the final dataset. We first scrape images for the objects of interest from
popular image search engines; since we rely only on text-based queries, the
resulting data comprises a wide variety of images. Hence, image selection is
necessary as a second step. This approach of image scraping and selection
relaxes the need for a real-world domain-specific dataset that must be either
publicly available or created for this purpose. We employ an object-agnostic
background removal model and compare three different methods for image
selection: Object-agnostic pre-processing, manual image selection and CNN-based
image selection. In the third step, we generate random arrangements of the
object of interest and distractors on arbitrary backgrounds. Finally, the
composition of the images is done by pasting the objects using four different
blending methods. We present a case study for our dataset generation approach
by considering parcel segmentation. For the evaluation, we created a dataset of
parcel photos that were annotated automatically. We find that (1) our dataset
generation pipeline allows a successful transfer to real test images (Mask AP
86.2), (2) a very accurate image selection process - in contrast to human
intuition - is not crucial and a broader category definition can help to bridge
the domain gap, (3) the usage of blending methods is beneficial compared to
simple copy-and-paste. We made our full code for scraping, image composition
and training publicly available at https://a-nau.github.io/parcel2d.
Related papers
- Adapt Anything: Tailor Any Image Classifiers across Domains And
Categories Using Text-to-Image Diffusion Models [82.95591765009105]
We aim to study if a modern text-to-image diffusion model can tailor any task-adaptive image classifier across domains and categories.
We utilize only one off-the-shelf text-to-image model to synthesize images with category labels derived from the corresponding text prompts.
arXiv Detail & Related papers (2023-10-25T11:58:14Z)
- Beyond Generation: Harnessing Text to Image Models for Object Detection
and Segmentation [29.274362919954218]
We propose a new paradigm to automatically generate training data with accurate labels at scale.
The proposed approach decouples training data generation into foreground object generation, and contextually coherent background generation.
We demonstrate the advantages of our approach on five object detection and segmentation datasets.
arXiv Detail & Related papers (2023-09-12T04:41:45Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- Complex Scene Image Editing by Scene Graph Comprehension [17.72638225034884]
We propose a two-stage method, SGC-Net, for complex scene image editing by Scene Graph Comprehension.
In the first stage, we train a Region of Interest (RoI) prediction network that uses scene graphs to predict the locations of the target objects.
The second stage uses a conditional diffusion model to edit the image based on our RoI predictions.
arXiv Detail & Related papers (2022-03-24T05:12:54Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort [117.41383937100751]
Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets.
We show how the GAN latent code can be decoded to produce a semantic segmentation of the image.
These generated datasets can then be used for training any computer vision architecture just as real datasets are.
arXiv Detail & Related papers (2021-04-13T20:08:29Z)
- COTR: Correspondence Transformer for Matching Across Images [31.995943755283786]
We propose a novel framework for finding correspondences in images based on a deep neural network.
By doing so, one has the option to query only the points of interest and retrieve sparse correspondences, or to query all points in an image and obtain dense mappings.
arXiv Detail & Related papers (2021-03-25T22:47:02Z)
- Six-channel Image Representation for Cross-domain Object Detection [17.854940064699985]
Deep learning models are data-driven, and their performance depends heavily on abundant and diverse datasets.
Image-to-image translation techniques are employed to generate fake data for specific scenes to train the models.
We propose to fuse the original 3-channel images with their corresponding GAN-generated fake images to form 6-channel representations of the dataset.
arXiv Detail & Related papers (2021-01-03T04:50:03Z)
- OneGAN: Simultaneous Unsupervised Learning of Conditional Image
Generation, Foreground Segmentation, and Fine-Grained Clustering [100.32273175423146]
We present a method for simultaneously learning, in an unsupervised manner, a conditional image generator, foreground extraction and segmentation, and object removal and background completion.
The method combines a Generative Adversarial Network and a Variational Auto-Encoder, with multiple encoders, generators and discriminators, and benefits from solving all tasks at once.
arXiv Detail & Related papers (2019-12-31T18:15:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.