Small Object Detection for Near Real-Time Egocentric Perception in a
Manual Assembly Scenario
- URL: http://arxiv.org/abs/2106.06403v1
- Date: Fri, 11 Jun 2021 13:59:44 GMT
- Title: Small Object Detection for Near Real-Time Egocentric Perception in a
Manual Assembly Scenario
- Authors: Hooman Tavakoli, Snehal Walunj, Parsha Pahlevannejad, Christiane
Plociennik, and Martin Ruskowski
- Abstract summary: We describe a near real-time small object detection pipeline for egocentric perception in a manual assembly scenario.
First, the context is recognized, then the small object of interest is detected.
We evaluate our pipeline on the augmented reality device Microsoft HoloLens 2.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting small objects in video streams of head-worn augmented reality
devices in near real-time is a huge challenge: training data is typically
scarce, the input video stream can be of limited quality, and small objects are
notoriously hard to detect. In industrial scenarios, however, it is often
possible to leverage contextual knowledge for the detection of small objects.
Furthermore, CAD data of objects are typically available and can be used to
generate synthetic training data. We describe a near real-time small object
detection pipeline for egocentric perception in a manual assembly scenario: We
generate a training data set based on CAD data and realistic backgrounds in
Unity. We then train a YOLOv4 model for a two-stage detection process: First,
the context is recognized, then the small object of interest is detected. We
evaluate our pipeline on the augmented reality device Microsoft HoloLens 2.
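The two-stage process can be made concrete with a minimal sketch. This is an illustration of the context-then-object idea, not the authors' implementation: `context_detector` and `object_detector` are hypothetical callables standing in for the two trained YOLOv4 stages, and the padding heuristic is an assumption.

```python
# Minimal sketch of the two-stage idea: detect the context region first,
# then run the small-object detector only on that crop. The detector
# callables are hypothetical stand-ins for the two YOLOv4 stages.
import numpy as np

def two_stage_detect(frame, context_detector, object_detector, pad=20):
    """frame: (H, W, 3) image; detectors return [(x1, y1, x2, y2, score), ...]."""
    contexts = context_detector(frame)
    if not contexts:
        return []  # no context found, so skip the small-object pass entirely
    h, w = frame.shape[:2]
    detections = []
    for (x1, y1, x2, y2, _score) in contexts:
        # Pad the context box slightly so objects on its border survive the crop.
        cx1, cy1 = max(0, int(x1) - pad), max(0, int(y1) - pad)
        cx2, cy2 = min(w, int(x2) + pad), min(h, int(y2) + pad)
        crop = frame[cy1:cy2, cx1:cx2]
        # Relative to the detector's input resolution, objects in the crop
        # appear larger, which is what makes small objects tractable.
        for (ox1, oy1, ox2, oy2, s) in object_detector(crop):
            # Map crop-relative coordinates back into frame coordinates.
            detections.append((ox1 + cx1, oy1 + cy1, ox2 + cx1, oy2 + cy1, s))
    return detections
```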
Related papers
- Synthetica: Large Scale Synthetic Data for Robot Perception [21.415878105900187]
We present Synthetica, a method for large-scale synthetic data generation for training robust state estimators.
This paper focuses on the task of object detection, an important problem which can serve as the front-end for most state estimation problems.
We leverage data from a ray-tracing renderer, generating 2.7 million images, to train highly accurate real-time detection transformers.
We demonstrate state-of-the-art performance on the task of object detection, with detectors that run at 50-100 Hz, 9 times faster than the prior SOTA.
arXiv Detail & Related papers (2024-10-28T15:50:56Z)
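A sketch of the kind of domain-randomized generation loop such a pipeline implies; `render_scene` is a hypothetical stand-in for the ray-tracing renderer, and the randomized parameters are assumptions, not Synthetica's actual configuration.

```python
# Sketch of a domain-randomized synthetic-data loop; render_scene() is a
# hypothetical renderer call, not Synthetica's actual API.
import json
import random

def generate_dataset(render_scene, object_ids, n_images, out_path):
    records = []
    for i in range(n_images):
        # Randomize pose, lighting, and clutter per frame so the detector
        # cannot overfit to any single rendering configuration.
        params = {
            "object_id": random.choice(object_ids),
            "camera_azimuth": random.uniform(0.0, 360.0),
            "camera_elevation": random.uniform(-10.0, 60.0),
            "light_intensity": random.uniform(0.3, 1.5),
            "num_distractors": random.randint(0, 8),
        }
        image_file = f"img_{i:07d}.png"
        boxes = render_scene(image_file, **params)  # returns ground-truth boxes
        records.append({"image": image_file, "boxes": boxes, "params": params})
    with open(out_path, "w") as f:
        json.dump(records, f)  # labels come for free with synthetic rendering
```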
- PatchContrast: Self-Supervised Pre-training for 3D Object Detection [14.603858163158625]
We introduce PatchContrast, a novel self-supervised point cloud pre-training framework for 3D object detection.
We show that our method outperforms existing state-of-the-art models on three commonly-used 3D detection datasets.
arXiv Detail & Related papers (2023-08-14T07:45:54Z)
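PatchContrast's exact objective is not given in this summary; below is a generic InfoNCE-style contrastive loss of the family such pre-training frameworks build on, written in PyTorch as an illustrative sketch.

```python
# Generic InfoNCE contrastive loss: two embeddings of the same patch are
# pulled together, all other pairs in the batch are pushed apart. This is
# the family of objectives behind contrastive pre-training, not the
# paper's exact formulation.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.07):
    """anchor, positive: (N, D) embeddings of two views of the same N patches."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    logits = anchor @ positive.t() / temperature  # (N, N); diagonal = positives
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))  # smoke test
```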
- OVTrack: Open-Vocabulary Multiple Object Tracking [64.73379741435255]
OVTrack is an open-vocabulary tracker capable of tracking arbitrary object classes.
It sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark.
arXiv Detail & Related papers (2023-04-17T16:20:05Z)
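The open-vocabulary step can be illustrated with a small sketch: region embeddings are matched against text embeddings of arbitrary class names instead of a fixed label head. The embeddings are assumed to come from a vision-language model such as CLIP; this is not OVTrack's actual code.

```python
# Open-vocabulary classification sketch: score region embeddings against
# text embeddings of arbitrary class names by cosine similarity. The
# embeddings are stand-ins for a vision-language model's output.
import numpy as np

def classify_regions(region_embs, text_embs, class_names, threshold=0.25):
    """region_embs: (R, D); text_embs: (C, D), one row per class name."""
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = r @ t.T  # (R, C) cosine similarities
    labels = []
    for i, c in enumerate(sims.argmax(axis=1)):
        # Below the threshold the region is left unlabeled rather than
        # forced into the closest class.
        labels.append((class_names[c], sims[i, c]) if sims[i, c] >= threshold
                      else (None, sims[i, c]))
    return labels
```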
- CrowdSim2: an Open Synthetic Benchmark for Object Detectors [0.7223361655030193]
This paper presents and publicly releases CrowdSim2, a new synthetic collection of images suitable for people and vehicle detection.
It consists of thousands of images gathered from various synthetic scenarios resembling the real world, where we varied some factors of interest.
We exploited this new benchmark as a testing ground for some state-of-the-art detectors, showing that our simulated scenarios can be a valuable tool for measuring their performance in a controlled environment.
arXiv Detail & Related papers (2023-04-11T09:35:57Z)
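Because a synthetic benchmark ships exact ground truth, per-factor detection quality can be measured directly; here is a minimal IoU-and-recall sketch of that kind of controlled evaluation (the metric choice is an assumption, not the paper's protocol).

```python
# Minimal evaluation sketch: IoU matching of predictions against the exact
# ground truth a synthetic benchmark provides.
import numpy as np

def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def recall_at_iou(gt_boxes, pred_boxes, thresh=0.5):
    # Fraction of ground-truth boxes matched by at least one prediction;
    # computing this per varied factor (crowd density, weather, ...) gives
    # the controlled comparison the benchmark is built for.
    hits = sum(any(iou(g, p) >= thresh for p in pred_boxes) for g in gt_boxes)
    return hits / max(len(gt_boxes), 1)
```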
- Generalized Few-Shot 3D Object Detection of LiDAR Point Cloud for Autonomous Driving [91.39625612027386]
We propose a novel task, called generalized few-shot 3D object detection, where we have a large amount of training data for common (base) objects, but only a few samples for rare (novel) classes.
Specifically, we analyze in-depth differences between images and point clouds, and then present a practical principle for the few-shot setting in the 3D LiDAR dataset.
To solve this task, we propose an incremental fine-tuning method to extend existing 3D detection models to recognize both common and rare objects.
arXiv Detail & Related papers (2023-02-08T07:11:36Z)
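A minimal PyTorch sketch of the incremental fine-tuning idea, under the assumption that "extending" a detector means freezing the shared backbone and widening the classification head to cover novel classes; the paper's actual procedure may differ.

```python
# Incremental fine-tuning sketch: freeze the backbone trained on base
# classes and widen the classification head so it also covers the novel
# classes, keeping the base-class weights intact.
import torch
import torch.nn as nn

def extend_for_novel_classes(backbone, old_head, n_base, n_novel, feat_dim):
    for p in backbone.parameters():
        p.requires_grad = False  # preserve base-class knowledge
    new_head = nn.Linear(feat_dim, n_base + n_novel)
    with torch.no_grad():
        # Copy base-class rows so base performance is not lost at init.
        new_head.weight[:n_base] = old_head.weight
        new_head.bias[:n_base] = old_head.bias
    return new_head  # only the head is fine-tuned on the few novel samples
```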
- Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z)
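The training signal the summary describes can be sketched in a few lines: decode object slots back into the feature space of a frozen self-supervised encoder and minimize a reconstruction loss. Module names and the MSE choice are illustrative assumptions.

```python
# Feature-reconstruction sketch: reconstruct frozen self-supervised
# features (not pixels) from object slots; the reconstruction error is
# the only training signal.
import torch
import torch.nn.functional as F

def feature_reconstruction_loss(frozen_encoder, slot_model, decoder, images):
    with torch.no_grad():
        targets = frozen_encoder(images)  # (B, N_patches, D), no gradients
    slots = slot_model(images)            # (B, K, D_slot) object slots
    recon = decoder(slots)                # (B, N_patches, D) predicted features
    return F.mse_loss(recon, targets)
```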
- IFOR: Iterative Flow Minimization for Robotic Object Rearrangement [92.97142696891727]
IFOR, Iterative Flow Minimization for Robotic Object Rearrangement, is an end-to-end method for the problem of object rearrangement for unknown objects.
We show that our method applies to cluttered scenes and to the real world, while training only on synthetic data.
arXiv Detail & Related papers (2022-02-01T20:03:56Z)
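A loose sketch of the iterative-flow idea: estimate flow between the current and goal images and keep moving the object with the largest residual flow. Every callable here (`flow_fn`, `segment_fn`, `move_object`) is a hypothetical placeholder, not IFOR's interface.

```python
# Iterative flow-minimization sketch: repeatedly move the object whose
# region still shows the largest flow toward the goal image. All callables
# are hypothetical placeholders.
import numpy as np

def rearrange(flow_fn, segment_fn, move_object, current, goal,
              tol=2.0, max_steps=10):
    for _ in range(max_steps):
        flow = flow_fn(current, goal)   # (H, W, 2) displacement field
        masks = segment_fn(current)     # one boolean (H, W) mask per object
        if not masks:
            return current
        residuals = [np.linalg.norm(flow[m], axis=1).mean() if m.any() else 0.0
                     for m in masks]
        worst = int(np.argmax(residuals))
        if residuals[worst] < tol:
            return current              # scene already matches the goal
        current = move_object(current, masks[worst], flow)
    return current
```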
- 3D Annotation Of Arbitrary Objects In The Wild [0.0]
We propose a data annotation pipeline based on SLAM, 3D reconstruction, and 3D-to-2D geometry.
The pipeline allows creating 3D and 2D bounding boxes, along with per-pixel annotations of arbitrary objects.
Our results showcase almost 90% Intersection-over-Union (IoU) agreement on both semantic segmentation and 2D bounding box detection.
arXiv Detail & Related papers (2021-09-15T09:00:56Z)
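The 3D-to-2D geometry step such a pipeline relies on is compact enough to sketch: project the eight corners of a 3D box through the camera intrinsics and take the 2D bounding box of the projections (assuming all corners lie in front of the camera).

```python
# 3D-to-2D sketch: project a 3D box's corners with the pinhole intrinsics
# and wrap a 2D box around them. Assumes all corners have positive depth.
import numpy as np

def project_box(corners_cam, K):
    """corners_cam: (8, 3) corners in camera coordinates; K: (3, 3) intrinsics."""
    uvw = (K @ corners_cam.T).T       # (8, 3) homogeneous image points
    uv = uvw[:, :2] / uvw[:, 2:3]     # perspective divide by depth
    x1, y1 = uv.min(axis=0)
    x2, y2 = uv.max(axis=0)
    return x1, y1, x2, y2             # tight 2D box around projected corners
```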
- Analysis of voxel-based 3D object detection methods efficiency for real-time embedded systems [93.73198973454944]
Two popular voxel-based 3D object detection methods are studied in this paper.
Our experiments show that these methods mostly fail to detect distant small objects due to the sparsity of the input point clouds at large distances.
Our findings suggest that a considerable part of the computation of existing methods is spent on locations of the scene that do not contribute to successful detections.
arXiv Detail & Related papers (2021-05-21T12:40:59Z)
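The sparsity effect behind that finding is easy to quantify; a small sketch that bins lidar points by range shows how quickly point density, and with it the chance of detecting distant small objects, falls off (the binning scheme is an assumption).

```python
# Sparsity sketch: count lidar returns per range bin; distant objects fall
# in bins with very few points, which is why detectors miss them.
import numpy as np

def points_per_range_bin(points, bin_width=10.0, max_range=100.0):
    """points: (N, 3) lidar returns in sensor coordinates."""
    rng = np.linalg.norm(points[:, :2], axis=1)  # ground-plane range
    bins = np.arange(0.0, max_range + bin_width, bin_width)
    counts, _ = np.histogram(rng, bins=bins)
    return bins[:-1], counts  # bin starts and point counts per bin
```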
- Weakly Supervised 3D Object Detection from Lidar Point Cloud [182.67704224113862]
It is laborious to manually label point cloud data for training high-quality 3D object detectors.
This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes.
Using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 85-95% of the performance of current top-leading, fully supervised detectors.
arXiv Detail & Related papers (2020-07-23T10:12:46Z)
- Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data [0.0]
3D object detection based on monocular camera data is a key enabler for autonomous driving.
Recent deep learning methods show promising results to recover depth information from single images.
In this paper, we evaluate the performance of a 3D object detection pipeline which is parameterizable with different depth estimation configurations.
arXiv Detail & Related papers (2020-05-15T09:05:17Z)
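The "parameterizable with different depth estimation configurations" design can be sketched as a pipeline that takes the depth estimator as an argument; both callables below are hypothetical placeholders, not the paper's interface.

```python
# Pipeline sketch: the depth estimator is an injected parameter, so 3D
# detection quality can be compared across depth configurations while the
# rest of the pipeline stays fixed. Callables are placeholders.
def detect_3d(image, depth_estimator, lift_and_detect):
    depth = depth_estimator(image)        # (H, W) predicted depth map
    return lift_and_detect(image, depth)  # 3D boxes from image + depth

# Comparing several depth configurations against the same detector:
# results = {name: detect_3d(img, est, lift_and_detect)
#            for name, est in estimators.items()}
```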
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.