ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection
- URL: http://arxiv.org/abs/2402.03235v2
- Date: Sun, 08 Dec 2024 15:15:15 GMT
- Title: ActiveAnno3D -- An Active Learning Framework for Multi-Modal 3D Object Detection
- Authors: Ahmed Ghita, Bjørk Antoniussen, Walter Zimmer, Ross Greer, Christian Creß, Andreas Møgelmose, Mohan M. Trivedi, Alois C. Knoll
- Abstract summary: We propose ActiveAnno3D, an active learning framework to select data samples for labeling.
We perform experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection datasets.
We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling.
- Score: 15.22941019659832
- License:
- Abstract: The curation of large-scale datasets is still costly and requires significant time and resources. Data is often labeled manually, and creating high-quality datasets remains a challenge. In this work, we fill this research gap by applying active learning to multi-modal 3D object detection. We propose ActiveAnno3D, an active learning framework that selects the data samples for labeling that are of maximum informativeness for training. We explore various continuous training methods and integrate the one that is most efficient in terms of computational demand and detection performance. Furthermore, we perform extensive experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection datasets. We show that we can achieve almost the same performance with PV-RCNN and the entropy-based query strategy when using only half of the training data of the TUM Traffic Intersection dataset (77.25 mAP compared to 83.50 mAP). BEVFusion achieved 64.31 mAP when using half of the training data and 75.0 mAP when using the complete nuScenes dataset. We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling and to minimize labeling costs. Finally, we provide code, weights, and visualization results on our website: https://active3d-framework.github.io/active3d-framework.
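The entropy-based query strategy mentioned in the abstract can be sketched as follows. This is a minimal illustration of entropy-based sample selection in general, not the authors' implementation; the function name, array shapes, and per-sample (rather than per-box) aggregation are assumptions made for the example.

```python
import numpy as np

def entropy_query(probs: np.ndarray, budget: int) -> np.ndarray:
    """Rank unlabeled samples by predictive entropy and return the
    indices of the `budget` most uncertain ones.

    probs: (num_samples, num_classes) softmax outputs of the detector.
    """
    eps = 1e-12  # avoid log(0) for one-hot-like predictions
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Highest entropy first: the most uncertain samples are assumed
    # to be the most informative to label next.
    return np.argsort(entropy)[::-1][:budget]

# Example: three samples, two classes.
probs = np.array([[0.50, 0.50],    # maximally uncertain
                  [0.90, 0.10],
                  [0.99, 0.01]])   # nearly certain
selected = entropy_query(probs, budget=1)
# selected == [0]: the uniform prediction has the highest entropy.
```

In an active learning loop, the selected indices would be sent to annotators, and the detector retrained on the enlarged labeled set before the next query round.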
Related papers
- Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems [0.6267574471145215]
We propose a novel active learning framework with significant potential for application in modern AI systems.
Unlike traditional active learning methods, which focus only on determining which data points should be labeled, our framework also introduces an innovative perspective on incorporating different query schemes.
Our proposed active learning framework exhibits higher accuracy and lower loss compared to other methods.
arXiv Detail & Related papers (2024-12-31T05:12:51Z)
- TSceneJAL: Joint Active Learning of Traffic Scenes for 3D Object Detection [26.059907173437114]
TSceneJAL framework can efficiently sample the balanced, diverse, and complex traffic scenes from both labeled and unlabeled data.
Our approach outperforms existing state-of-the-art methods on 3D object detection tasks with up to 12% improvements.
arXiv Detail & Related papers (2024-12-25T11:07:04Z)
- AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration [69.21282992341007]
AutoSynth automatically generates 3D training data for point cloud registration.
We replace the point cloud registration network with a much smaller surrogate network, leading to a 4056.43× speedup.
Our results on TUD-L, LINEMOD and Occluded-LINEMOD evidence that a neural network trained on our searched dataset yields consistently better performance than the same one trained on the widely used ModelNet40 dataset.
arXiv Detail & Related papers (2023-09-20T09:29:44Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Self-Supervised Human Activity Recognition with Localized Time-Frequency Contrastive Representation Learning [16.457778420360537]
We propose a self-supervised learning solution for human activity recognition with smartphone accelerometer data.
We develop a model that learns strong representations from accelerometer signals, while reducing the model's reliance on class labels.
We evaluate the performance of the proposed solution on three datasets, namely MotionSense, HAPT, and HHAR.
arXiv Detail & Related papers (2022-08-26T22:47:18Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
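The sample-weighting idea described above can be sketched as follows. This is only an illustration of weighting unlabeled samples by prediction confidence so that likely out-of-distribution data contributes less to the loss; the sigmoid weighting function and its parameters are assumptions for the example, not the paper's actual scheme.

```python
import numpy as np

def confidence_weights(probs: np.ndarray, temperature: float = 10.0) -> np.ndarray:
    """Down-weight unlabeled samples whose predictions look out-of-distribution.

    probs: (num_samples, num_classes) softmax outputs on unlabeled data.
    Returns per-sample weights in (0, 1): confidently predicted samples
    get weights near 1, uncertain (possibly out-of-set) samples near 0.
    """
    max_conf = probs.max(axis=1)  # peak class probability per sample
    # Sigmoid ramp centered at 0.5 confidence; temperature controls sharpness.
    return 1.0 / (1.0 + np.exp(-temperature * (max_conf - 0.5)))

def weighted_ssl_loss(losses: np.ndarray, weights: np.ndarray) -> float:
    """Weighted mean of per-sample unsupervised losses."""
    return float(np.sum(weights * losses) / (np.sum(weights) + 1e-12))

weights = confidence_weights(np.array([[0.95, 0.05], [0.55, 0.45]]))
# weights[0] > weights[1]: the confident prediction is trusted more.
```

With this shape of weighting, samples the model cannot classify confidently (a proxy for out-of-set data) are effectively excluded from the semi-supervised objective.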
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- Single-Modal Entropy based Active Learning for Visual Question Answering [75.1682163844354]
We address active learning in the multi-modal setting of Visual Question Answering (VQA).
In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition.
Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks.
arXiv Detail & Related papers (2021-10-21T05:38:45Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
- ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly [6.597705088139007]
We introduce a new dataset called ETRI-Activity3D, focusing on the daily activities of the elderly in robot-view.
The proposed dataset contains 112,620 samples including RGB videos, depth maps, and skeleton sequences.
We also propose a novel network called four-stream adaptive CNN (FSA-CNN).
arXiv Detail & Related papers (2020-03-04T07:30:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.