Active Data Acquisition in Autonomous Driving Simulation
- URL: http://arxiv.org/abs/2306.13923v1
- Date: Sat, 24 Jun 2023 10:07:35 GMT
- Title: Active Data Acquisition in Autonomous Driving Simulation
- Authors: Jianyu Lai, Zexuan Jia, Boao Li
- Abstract summary: This paper proposes the concept of an active data-collecting strategy.
Increasing the collection density of high-quality data can improve the overall quality of the dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Autonomous driving algorithms rely heavily on learning-based models, which
require large datasets for training. However, these datasets often contain a
large amount of redundant information, and collecting and processing them is
time-consuming and expensive. To address this issue, this paper proposes the
concept of an active data-collecting strategy: increasing the collection
density of high-quality data can improve the overall quality of the dataset,
ultimately achieving similar or even better results than the original dataset
with lower labeling costs and a smaller dataset size. In this paper, we design
experiments to verify the quality of the collected dataset and to demonstrate
that this strategy can significantly reduce labeling costs and dataset size
while improving the overall quality of the dataset, leading to better
performance of autonomous driving systems. The source code implementing the
proposed approach is publicly available at
https://github.com/Th1nkMore/carla_dataset_tools.
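As a rough illustration of the kind of active data-collecting strategy described above, here is a minimal Python sketch, not the released carla_dataset_tools code; the ActiveCollector class and the novelty score are hypothetical stand-ins. It keeps every frame whose quality/novelty score clears a threshold and only a small random fraction of the rest, so collection density concentrates on high-value data.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ActiveCollector:
    """Keep frames at a density proportional to their estimated value.

    score_fn maps a frame to a quality/novelty score in [0, 1]; frames
    scoring above `threshold` are always kept, the rest are kept with
    probability `base_rate` (uniform background sampling).
    """
    score_fn: Callable[[object], float]
    threshold: float = 0.7
    base_rate: float = 0.1
    kept: List[object] = field(default_factory=list)

    def offer(self, frame: object) -> bool:
        keep = (self.score_fn(frame) >= self.threshold
                or random.random() < self.base_rate)
        if keep:
            self.kept.append(frame)
        return keep

# Usage: stream frames from a simulator into the collector. The score here
# is a placeholder for any learned or heuristic quality signal, e.g. model
# uncertainty on the frame.
collector = ActiveCollector(score_fn=lambda frame: random.random())
for frame in range(1000):  # stand-in for simulator frames
    collector.offer(frame)
print(f"kept {len(collector.kept)} of 1000 frames")
```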
Related papers
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, negatively impacting training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
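As a hedged illustration of CLIP-based sample scoring (a generic baseline, not this paper's actual framework), one can rank image-caption pairs by CLIP image-text similarity and keep the best-aligned fraction; the model and processor names below are the public Hugging Face checkpoints.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Public Hugging Face checkpoint; any CLIP variant works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_scores(images, captions):
    """Cosine similarity between each image and its own caption."""
    inputs = processor(text=captions, images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1)  # one score per (image, caption) pair

def select_best_aligned(images, captions, keep_frac=0.5):
    """Return indices of the best-aligned fraction of the dataset."""
    scores = alignment_scores(images, captions)
    k = max(1, int(len(captions) * keep_frac))
    return scores.topk(k).indices.tolist()
```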
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
- TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data [29.45013725650798]
It is essential to extract a subset of instruction datasets that achieves comparable performance to the full dataset.
We propose Task-Agnostic Gradient Clustered COreset Selection (TAGCOS).
Specifically, we leverage sample gradients as the data representations, perform clustering to group similar data, and apply an efficient greedy algorithm for coreset selection.
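A simplified sketch of this recipe follows (an approximation, not the authors' implementation; the cluster count, per-cluster quota, and greedy matching rule are illustrative choices): cluster per-sample gradient features, then greedily pick, per cluster, the samples whose summed gradients best match the cluster's total gradient.

```python
import numpy as np
from sklearn.cluster import KMeans

def gradient_clustered_coreset(grads, n_clusters=10, per_cluster=5):
    """grads: (n_samples, d) array of per-sample gradient features."""
    n_clusters = min(n_clusters, len(grads))
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(grads)
    selected = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        target = grads[idx].sum(axis=0)     # cluster's total gradient
        residual, chosen = target.copy(), []
        for _ in range(min(per_cluster, len(idx))):
            scores = grads[idx] @ residual  # alignment with the residual
            scores[chosen] = -np.inf        # never pick a sample twice
            best = int(np.argmax(scores))
            chosen.append(best)
            residual = residual - grads[idx[best]]
        selected.extend(idx[chosen].tolist())
    return selected

# Usage with random features standing in for real per-sample gradients:
subset = gradient_clustered_coreset(np.random.randn(500, 64))
```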
arXiv Detail & Related papers (2024-07-21T17:59:20Z)
- Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach [36.47860223750303]
We consider the problem of automatic curation of high-quality datasets for self-supervised pre-training.
We propose a clustering-based approach for building datasets that satisfy these criteria.
Our method involves successive and hierarchical applications of $k$-means on a large and diverse data repository.
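A minimal sketch of successive $k$-means curation follows (an approximation of the described approach, not the released code; the level sizes and per-cluster quota are illustrative): repeatedly cluster the embeddings and resample a fixed number of points per cluster, flattening the distribution toward balanced concepts.

```python
import numpy as np
from sklearn.cluster import KMeans

def curate(embeddings, levels=(50, 10), per_cluster=20, rng=None):
    """Successively cluster and resample to flatten the distribution."""
    rng = rng or np.random.default_rng(0)
    idx = np.arange(len(embeddings))
    for k in levels:  # successive k-means with decreasing cluster counts
        k = min(k, len(idx))
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(embeddings[idx])
        keep = []
        for c in range(k):
            members = idx[labels == c]
            take = min(per_cluster, len(members))
            keep.extend(rng.choice(members, size=take, replace=False))
        idx = np.array(sorted(keep))
    return idx  # indices of the curated subset

# Usage: curate 2,000 random 32-d embeddings down to a balanced subset.
subset = curate(np.random.randn(2000, 32))
```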
arXiv Detail & Related papers (2024-05-24T14:58:51Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods using far fewer computational resources.
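For context, the underlying distribution-matching idea can be sketched in a few lines: synthetic samples are optimized so that their mean features match those of the real data under randomly re-initialized encoders. This is a minimal sketch of the generic DM formulation, not this paper's improved method; the encoder architecture and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

def condense(real, n_syn=100, steps=500, lr=0.1):
    """real: (n, d) tensor; returns n_syn learned synthetic samples."""
    d = real.shape[1]
    syn = torch.randn(n_syn, d, requires_grad=True)
    opt = torch.optim.SGD([syn], lr=lr)
    for _ in range(steps):
        # Fresh randomly initialized encoder each step, kept frozen so
        # gradients only update the synthetic samples.
        enc = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 64))
        for p in enc.parameters():
            p.requires_grad_(False)
        loss = (enc(real).mean(0) - enc(syn).mean(0)).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return syn.detach()

# Usage: condense 1,000 real feature vectors into 100 synthetic ones.
synthetic = condense(torch.randn(1000, 32))
```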
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- RLBoost: Boosting Supervised Models using Deep Reinforcement Learning [0.0]
We present RLBoost, an algorithm that uses deep reinforcement learning strategies to evaluate a particular dataset and obtain a model capable of estimating the quality of any new data.
The experiments show that this model obtains better and more stable results than other state-of-the-art algorithms such as LOO, DataShapley, or DVRL.
arXiv Detail & Related papers (2023-05-23T14:38:33Z)
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
- A Proposal to Study "Is High Quality Data All We Need?" [8.122270502556374]
We propose an empirical study that examines how to select a subset of existing benchmark data and/or create new high-quality benchmark data.
We seek to answer if big datasets are truly needed to learn a task, and whether a smaller subset of high quality data can replace big datasets.
arXiv Detail & Related papers (2022-03-12T10:50:13Z)
- LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation [63.20765930558542]
Active learning (AL) has regained attention recently as a way to reduce annotation costs and dataset size.
This paper performs a principled evaluation of AL-based dataset distillation on 1/4th of the large Semantic-KITTI dataset.
We observe that data augmentation achieves full-dataset accuracy using only 60% of the samples from the selected dataset configuration.
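A schematic of the kind of uncertainty-driven AL loop evaluated here (not the paper's pipeline; `train_fn`, `predict_fn`, and `augment_fn` are hypothetical callables supplied by the user):

```python
import numpy as np

def entropy(probs):
    """Predictive entropy per sample; probs: (n, n_classes)."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def active_learning_loop(train_fn, predict_fn, augment_fn,
                         X, y, seed_size=100, batch=50, rounds=5):
    labeled = list(range(seed_size))
    pool = list(range(seed_size, len(X)))
    for _ in range(rounds):
        Xa, ya = augment_fn(X[labeled], y[labeled])    # augment labeled set
        model = train_fn(Xa, ya)
        scores = entropy(predict_fn(model, X[pool]))   # pool uncertainty
        order = np.argsort(scores)[::-1][:batch]       # most uncertain first
        picked = [pool[i] for i in order]
        labeled += picked
        picked_set = set(picked)
        pool = [i for i in pool if i not in picked_set]
    return labeled  # indices chosen for annotation
```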
arXiv Detail & Related papers (2022-02-06T00:04:21Z)
- Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and pre-training on a dataset filtered from a larger-scale dataset.
arXiv Detail & Related papers (2020-11-20T06:16:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.