Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance Keying
- URL: http://arxiv.org/abs/2405.07653v1
- Date: Mon, 13 May 2024 11:28:58 GMT
- Title: Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance Keying
- Authors: Thomas Pöllabauer, Volker Knauthe, André Boller, Arjan Kuijper, Dieter Fellner
- Abstract summary: Deep Neural Networks (DNNs) require large amounts of annotated training data for good performance.
A fast and straightforward approach to acquiring the necessary training data would allow the adoption of deep learning by even the smallest of applications.
Our work demonstrates highly accurate training data acquisition that allows training of state-of-the-art networks to start within minutes.
- Score: 4.491665410263268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) require large amounts of annotated training data for good performance. Often this data is generated using manual labeling (error-prone and time-consuming) or rendering (requiring geometry and material information). Both approaches make it difficult or uneconomic to apply them to many small-scale applications. A fast and straightforward approach to acquiring the necessary training data would allow the adoption of deep learning by even the smallest of applications. Chroma keying is the process of replacing a color (usually blue or green) with another background. Instead of chroma keying, we propose luminance keying for fast and straightforward training image acquisition. We deploy a black screen with high light absorption (99.99%) to record roughly 1-minute-long videos of our target objects, circumventing typical problems of chroma keying, such as color bleeding or color overlap between background color and object color. Next, we automatically mask our objects using simple brightness thresholding, saving the need for manual annotation. Finally, we automatically place the objects on random backgrounds and train a 2D object detector. We extensively evaluate performance on the widely used YCB-V object set and compare favourably to conventional techniques such as rendering, without needing 3D meshes, materials, or any other information about our target objects, and in a fraction of the time needed by other approaches. Our work demonstrates highly accurate training data acquisition that allows training of state-of-the-art networks to start within minutes.
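The abstract amounts to a three-step recipe: threshold the near-black background away, composite the masked object onto random backgrounds, and reuse the resulting mask and bounding box as labels. The Python/OpenCV sketch below illustrates the two automated data-generation steps under stated assumptions; the threshold value, file paths, and the largest-component cleanup are illustrative choices, not details taken from the paper.

# Minimal sketch of the pipeline described in the abstract: mask objects
# recorded against a light-absorbing black screen via brightness
# thresholding, then composite them onto random backgrounds.
# The threshold, file names, and cleanup step are assumptions for illustration.
import glob
import random

import cv2
import numpy as np

BRIGHTNESS_THRESHOLD = 30  # assumed value; tune per camera and lighting setup

def mask_object(frame_bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask of pixels brighter than the black screen."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, BRIGHTNESS_THRESHOLD, 255, cv2.THRESH_BINARY)
    # Keep only the largest connected component to suppress sensor noise
    # (an illustrative cleanup, not prescribed by the paper).
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if num > 1:
        largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        mask = np.where(labels == largest, 255, 0).astype(np.uint8)
    return mask

def composite(frame_bgr, mask, background_bgr):
    """Paste the masked object onto a background; return image and bbox."""
    h, w = frame_bgr.shape[:2]
    out = cv2.resize(background_bgr, (w, h))
    m = mask > 0
    out[m] = frame_bgr[m]  # copy object pixels over the background
    ys, xs = np.nonzero(m)
    if xs.size == 0:
        return out, None
    return out, (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# Assumed file layout: one recorded video and a folder of background images.
backgrounds = glob.glob("backgrounds/*.jpg")
cap = cv2.VideoCapture("object_on_black_screen.mp4")
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = mask_object(frame)
    img, bbox = composite(frame, mask, cv2.imread(random.choice(backgrounds)))
    if bbox is not None:
        cv2.imwrite(f"train/{idx:06d}.jpg", img)
        idx += 1
cap.release()

The recovered bounding box and binary mask double as the detection and segmentation labels, which is what removes the manual annotation step.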
Related papers
- YCB-LUMA: YCB Object Dataset with Luminance Keying for Object Localization [0.0]
Localizing target objects in images is an important task in computer vision.
We extend previous work that presented luminance keying on the common YCB-V set of household objects by recording the remaining objects of the YCB superset.
The additional variety of objects demonstrates the usefulness of luminance keying and might be used to test the applicability of the approach on new 2D object detection and segmentation algorithms.
arXiv Detail & Related papers (2024-11-20T09:32:22Z)
- Pre-Training LiDAR-Based 3D Object Detectors Through Colorization [65.03659880456048]
We introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels.
GPC teaches the model to colorize LiDAR point clouds, equipping it with valuable semantic cues.
Experimental results on the KITTI and Waymo datasets demonstrate GPC's remarkable effectiveness.
arXiv Detail & Related papers (2023-10-23T06:00:24Z)
- Learning Higher-order Object Interactions for Keypoint-based Video Understanding [15.52736059969859]
We describe an action-localization method, KeyNet, that uses only the keypoint data for tracking and action recognition.
We find that KeyNet is able to track and classify human actions at just 5 FPS.
arXiv Detail & Related papers (2023-05-16T15:30:33Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- Few-Cost Salient Object Detection with Adversarial-Paced Learning [95.0220555274653]
This paper proposes to learn an effective salient object detection model based on manual annotation of only a few training images.
We name this task few-cost salient object detection and propose an adversarial-paced learning (APL)-based framework to facilitate the few-cost learning scenario.
arXiv Detail & Related papers (2021-04-05T14:15:49Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework with only RGB information as input for inference.
Our framework not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses RGBD-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)
- Visual Descriptor Learning from Monocular Video [25.082587246288995]
We propose a novel way to estimate dense correspondence on an RGB image by training a fully convolutional network.
Our method learns from RGB videos using contrastive loss, where relative labeling is estimated from optical flow.
Not only does the method perform well on test data with the same background, it also generalizes to situations with a new background.
arXiv Detail & Related papers (2020-04-15T11:19:57Z)
- Self-Supervised Object-in-Gripper Segmentation from Robotic Motions [27.915309216800125]
We propose a robust solution for learning to segment unknown objects grasped by a robot.
We exploit motion and temporal cues in RGB video sequences.
Our approach is fully self-supervised and independent of precise camera calibration, 3D models or potentially imperfect depth data.
arXiv Detail & Related papers (2020-02-11T15:44:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.