LiDAR dataset distillation within Bayesian active learning framework:
Understanding the effect of data augmentation
- URL: http://arxiv.org/abs/2202.02661v1
- Date: Sun, 6 Feb 2022 00:04:21 GMT
- Title: LiDAR dataset distillation within Bayesian active learning framework:
Understanding the effect of data augmentation
- Authors: Ngoc Phuong Anh Duong, Alexandre Almin, Léo Lemarié, and B Ravi Kiran
- Abstract summary: Active learning (AL) has regained attention recently as a way to reduce annotation costs and dataset size.
This paper performs a principled evaluation of AL-based dataset distillation on a quarter (1/4th) of the large Semantic-KITTI dataset.
We observe that with data augmentation, full-dataset accuracy is achieved using only 60% of the samples from the selected dataset configuration.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous driving (AD) datasets have progressively grown in size in the past
few years to enable better deep representation learning. Active learning (AL)
has regained attention recently as a way to reduce annotation costs and
dataset size. AL has remained relatively unexplored for AD datasets, especially
on point cloud data from LiDARs. This paper performs a principled evaluation of
AL-based dataset distillation on a quarter (1/4th) of the large Semantic-KITTI
dataset. Furthermore, the gains in model performance due to data augmentation
(DA) are demonstrated across different subsets of the AL loop. We also
demonstrate how DA improves the selection of informative samples to annotate.
We observe that with data augmentation, full-dataset accuracy is achieved using
only 60% of the samples from the selected dataset configuration. This yields
faster training and subsequent savings in annotation costs.
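The Bayesian AL loop evaluated here scores unlabeled pool samples by epistemic uncertainty, typically the BALD heuristic computed over several stochastic (MC-dropout) forward passes, and sends the highest-scoring samples for annotation. A minimal NumPy sketch of BALD scoring and batch selection (the function names and the toy pool below are illustrative, not the paper's implementation):

```python
import numpy as np

def bald_scores(mc_probs):
    """BALD mutual information per sample.

    mc_probs: (T, N, C) array of softmax outputs from T stochastic
    (e.g. MC-dropout) forward passes over N pool samples, C classes.
    """
    eps = 1e-12
    mean_probs = mc_probs.mean(axis=0)  # (N, C) averaged prediction
    # Entropy of the mean prediction (total uncertainty).
    h_mean = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)
    # Mean entropy of the individual passes (aleatoric part).
    h_each = -(mc_probs * np.log(mc_probs + eps)).sum(axis=2).mean(axis=0)
    return h_mean - h_each  # epistemic (model-disagreement) part

def select_batch(mc_probs, k):
    """Indices of the k pool samples with the highest BALD score."""
    return np.argsort(bald_scores(mc_probs))[::-1][:k]

# Toy pool: 3 samples, 4 MC passes, 2 classes.
confident = np.tile([[0.95, 0.05]], (4, 1))       # passes agree, sure
disagreeing = np.array([[0.9, 0.1], [0.1, 0.9],
                        [0.8, 0.2], [0.2, 0.8]])  # passes disagree
uniform = np.tile([[0.5, 0.5]], (4, 1))           # passes agree, unsure
pool = np.stack([confident, disagreeing, uniform], axis=1)  # (4, 3, 2)
picked = select_batch(pool, k=1)  # BALD prefers the disagreeing sample
```

Note that BALD is near zero both for the confident sample and for the uniformly unsure one: it targets disagreement between passes (epistemic uncertainty), not raw predictive entropy, which is what makes it a sample-selection rather than a noise-seeking criterion.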
Related papers
- ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs (2024-02-09)
  We introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations.
  Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases.
- Importance-Aware Adaptive Dataset Distillation (2024-01-29)
  Development of deep learning models is enabled by the availability of large-scale datasets.
  Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
  We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning (2023-11-14)
  We introduce a self-evolving mechanism that allows the model itself to actively sample subsets that are equally or even more effective.
  The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets.
  Extensive experiments across three datasets and benchmarks demonstrate the effectiveness of DiverseEvol.
- Embarrassingly Simple Dataset Distillation (2023-11-13)
  We tackle dataset distillation at its core by treating it directly as a bilevel optimization problem.
  A deeper dive into the nature of distilled data unveils pronounced intercorrelation.
  We devise a boosting mechanism that generates distilled datasets containing subsets with near-optimal performance across different data budgets.
- Evaluating the effect of data augmentation and BALD heuristics on distillation of Semantic-KITTI dataset (2023-02-21)
  Active learning has remained relatively unexplored for LiDAR perception tasks in autonomous driving datasets.
  We evaluate Bayesian active learning methods applied to the task of dataset distillation or core subset selection.
  We also study the effect of applying data augmentation within Bayesian AL-based dataset distillation.
- Dataset Distillation: A Comprehensive Review (2023-01-17)
  Dataset distillation (DD) aims to derive a much smaller dataset containing synthetic samples, based on which trained models yield performance comparable with models trained on the original dataset.
  This paper gives a comprehensive review and summary of recent advances in DD and its applications.
- LADA: Look-Ahead Data Acquisition via Augmentation for Active Learning (2020-11-09)
  This paper proposes Look-Ahead Data Acquisition via augmentation, or LADA, to integrate data acquisition and data augmentation.
  LADA considers both 1) the unlabeled data instance to be selected and 2) the virtual data instance to be generated by data augmentation.
  The performance of LADA shows a significant improvement over recent augmentation and acquisition baselines.
- Omni-supervised Facial Expression Recognition via Distilled Data (2020-05-18)
  We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
  We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
  To reduce the size of the created dataset, we apply a dataset distillation strategy to compress it into several informative class-wise images.
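Several of the papers above, like the main paper, apply data augmentation to LiDAR scans inside the AL loop; for point clouds this usually means random global geometric transforms. A minimal sketch, assuming an (N, 3) array of x/y/z points; the transform choices and magnitudes below are illustrative defaults, not any paper's exact pipeline:

```python
import numpy as np

def augment_point_cloud(points, rng):
    """Random geometric augmentation of an (N, 3) LiDAR point cloud:
    yaw rotation about the z axis, horizontal flip, Gaussian jitter."""
    theta = rng.uniform(0.0, 2.0 * np.pi)  # random yaw angle
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    out = points @ rot.T                    # rotate about z axis
    if rng.random() < 0.5:                  # random flip across the xz plane
        out[:, 1] = -out[:, 1]
    out += rng.normal(scale=0.01, size=out.shape)  # small coordinate jitter
    return out

rng = np.random.default_rng(42)
cloud = rng.uniform(-10.0, 10.0, size=(1000, 3))  # stand-in for a LiDAR scan
aug = augment_point_cloud(cloud, rng)
```

Because rotation and flipping are rigid transforms, each point keeps (up to jitter) its distance from the origin; per-point semantic labels carry over unchanged, which is why such augmentation is cheap to apply inside an AL loop.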
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.