ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object
Detection
- URL: http://arxiv.org/abs/2402.03235v1
- Date: Mon, 5 Feb 2024 17:52:58 GMT
- Title: ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object
Detection
- Authors: Ahmed Ghita, Bjørk Antoniussen, Walter Zimmer, Ross Greer,
Christian Creß, Andreas Møgelmose, Mohan M. Trivedi, Alois C. Knoll
- Abstract summary: We propose ActiveAnno3D, an active learning framework to select data samples for labeling.
We perform experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection dataset.
We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling.
- Score: 15.885344033374393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The curation of large-scale datasets is still costly and requires much time
and resources. Data is often manually labeled, and the challenge of creating
high-quality datasets remains. In this work, we fill the research gap using
active learning for multi-modal 3D object detection. We propose ActiveAnno3D,
an active learning framework to select data samples for labeling that are of
maximum informativeness for training. We explore various continuous training
methods and integrate the most efficient method regarding computational demand
and detection performance. Furthermore, we perform extensive experiments and
ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic
Intersection datasets. We show that PV-RCNN with the entropy-based query
strategy achieves almost the same performance when trained on only half of
the TUM Traffic Intersection data (77.25 mAP compared to 83.50 mAP).
BEVFusion achieved an mAP of 64.31 when using half of the training
data and 75.0 mAP when using the complete nuScenes dataset. We integrate our
active learning framework into the proAnno labeling tool to enable AI-assisted
data selection and labeling and minimize the labeling costs. Finally, we
provide code, weights, and visualization results on our website:
https://active3d-framework.github.io/active3d-framework.
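The abstract highlights an entropy-based query strategy for picking the most informative samples to label next. Below is a minimal sketch of such a strategy, assuming the detector exposes per-box class probabilities for each unlabeled frame; the names (entropy, select_most_informative, pool_probs) are illustrative and not taken from the ActiveAnno3D codebase.
```python
# Minimal sketch of an entropy-based query strategy for active learning.
# Assumes per-box class probabilities are available for each unlabeled frame;
# all names here are hypothetical, not ActiveAnno3D's actual API.
import numpy as np

def entropy(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy of a single class-probability vector."""
    p = np.clip(probs, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def select_most_informative(pool_probs: dict[str, np.ndarray], budget: int) -> list[str]:
    """Rank unlabeled frames by the mean entropy of their detections and
    return the `budget` most uncertain frame ids for labeling."""
    scores = {
        frame_id: float(np.mean([entropy(p) for p in probs_per_box]))
        for frame_id, probs_per_box in pool_probs.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]

# Example: two frames, each with per-box class probabilities from the detector.
pool = {
    "frame_001": np.array([[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]),
    "frame_002": np.array([[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]]),
}
print(select_most_informative(pool, budget=1))  # -> ['frame_002'] (more uncertain)
```
In a workflow like the one described in the abstract, the selected frames would presumably be passed to the labeling tool (here, proAnno) for annotation before the next continuous-training round.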
Related papers
- STONE: A Submodular Optimization Framework for Active 3D Object Detection [20.54906045954377]
A key requirement for training an accurate 3D object detector is the availability of a large amount of LiDAR-based point cloud data.
This paper proposes a unified active 3D object detection framework that greatly reduces the labeling cost of training 3D object detectors.
arXiv Detail & Related papers (2024-10-04T20:45:33Z) - The Why, When, and How to Use Active Learning in Large-Data-Driven 3D
Object Detection for Safe Autonomous Driving: An Empirical Exploration [1.2815904071470705]
Our findings suggest that entropy querying is a promising strategy for selecting data that enhances model learning in resource-constrained environments.
arXiv Detail & Related papers (2024-01-30T00:14:13Z) - AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud
Registration [69.21282992341007]
AutoSynth automatically generates 3D training data for point cloud registration.
We replace the point cloud registration network with a much smaller surrogate network, leading to a 4056.43× speedup.
Our results on TUD-L, LINEMOD and Occluded-LINEMOD evidence that a neural network trained on our searched dataset yields consistently better performance than the same one trained on the widely used ModelNet40 dataset.
arXiv Detail & Related papers (2023-09-20T09:29:44Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Self-Supervised Human Activity Recognition with Localized Time-Frequency
Contrastive Representation Learning [16.457778420360537]
We propose a self-supervised learning solution for human activity recognition with smartphone accelerometer data.
We develop a model that learns strong representations from accelerometer signals, while reducing the model's reliance on class labels.
We evaluate the performance of the proposed solution on three datasets, namely MotionSense, HAPT, and HHAR.
arXiv Detail & Related papers (2022-08-26T22:47:18Z) - Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z) - Single-Modal Entropy based Active Learning for Visual Question Answering [75.1682163844354]
We address Active Learning in the multi-modal setting of Visual Question Answering (VQA).
In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition.
Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks.
arXiv Detail & Related papers (2021-10-21T05:38:45Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - PointContrast: Unsupervised Pre-training for 3D Point Cloud
Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z) - ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize
Daily Activities of the Elderly [6.597705088139007]
We introduce a new dataset called ETRI-Activity3D, focusing on the daily activities of the elderly in robot-view.
The proposed dataset contains 112,620 samples including RGB videos, depth maps, and skeleton sequences.
We also propose a novel network called the four-stream adaptive CNN (FSA-CNN).
arXiv Detail & Related papers (2020-03-04T07:30:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.