LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation
- URL: http://arxiv.org/abs/2202.02661v1
- Date: Sun, 6 Feb 2022 00:04:21 GMT
- Title: LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation
- Authors: Ngoc Phuong Anh Duong, Alexandre Almin, Léo Lemarié and B Ravi Kiran
- Abstract summary: Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size.
This paper performs a principled evaluation of AL-based dataset distillation on one quarter (1/4) of the large Semantic-KITTI dataset.
We observe that data augmentation achieves full-dataset accuracy using only 60% of the samples from the selected dataset configuration.
- Score: 63.20765930558542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous driving (AD) datasets have progressively grown in size in the past few years to enable better deep representation learning. Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size. AL remains relatively unexplored for AD datasets, especially for point cloud data from LiDAR sensors. This paper performs a principled evaluation of AL-based dataset distillation on one quarter (1/4) of the large Semantic-KITTI dataset. We further demonstrate the gains in model performance due to data augmentation (DA) across the different subsets selected within the AL loop, and show how DA improves the selection of informative samples to annotate. We observe that data augmentation achieves full-dataset accuracy using only 60% of the samples from the selected dataset configuration, which yields faster training and further savings in annotation costs.
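To make the pipeline concrete, below is a minimal sketch of one round of Bayesian active learning with a BALD acquisition score and a simple point-cloud augmentation, assuming Monte-Carlo-dropout class probabilities are already available. The function names (`bald_scores`, `augment_scan`), array shapes, and constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bald_scores(mc_probs):
    """BALD acquisition: mutual information between predictions and model
    parameters, estimated from T Monte-Carlo-dropout forward passes.
    mc_probs has shape (T, N, C): T passes, N unlabeled scans, C classes."""
    eps = 1e-12
    mean_p = mc_probs.mean(axis=0)                                    # (N, C)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)
    mean_entropy = -(mc_probs * np.log(mc_probs + eps)).sum(axis=-1).mean(axis=0)
    return entropy_of_mean - mean_entropy        # higher = more informative

def augment_scan(points, rng):
    """Illustrative LiDAR augmentation: random yaw rotation plus small jitter,
    applied to the labeled subset before each retraining round."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
    return points @ rot.T + rng.normal(0.0, 0.01, size=points.shape)

# One AL round with stand-in numbers: score the unlabeled pool and pick the
# top-k scans to send for annotation (19 classes as in Semantic-KITTI).
rng = np.random.default_rng(0)
T, N, C, k = 20, 1000, 19, 100
mc_probs = rng.dirichlet(np.ones(C), size=(T, N))   # placeholder model outputs
to_annotate = np.argsort(-bald_scores(mc_probs))[:k]

scan = rng.normal(size=(4096, 3))                   # dummy (x, y, z) point cloud
augmented = augment_scan(scan, rng)
```

In the setting evaluated by the paper, the newly annotated scans would be added to the labeled subset and the model retrained (with DA applied) before the next acquisition round.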
Related papers
- DRUPI: Dataset Reduction Using Privileged Information [20.59889438709671]
Dataset reduction (DR) seeks to select or distill samples from large datasets into smaller subsets while preserving performance on target tasks.
We introduce Dataset Reduction Using Privileged Information (DRUPI), which enriches DR by synthesizing privileged information alongside the reduced dataset.
Our findings reveal that effective feature labels must strike a balance between being overly discriminative and excessively diverse, with a moderate level proving optimal for improving the reduced dataset's efficacy.
arXiv Detail & Related papers (2024-10-02T14:49:05Z) - ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs [60.81649785463651]
We introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations.
Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases.
arXiv Detail & Related papers (2024-02-09T11:23:14Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - Embarrassingly Simple Dataset Distillation [0.0]
We tackle dataset distillation at its core by treating it directly as a bilevel optimization problem.
A deeper dive into the nature of distilled data unveils pronounced intercorrelation.
We devise a boosting mechanism that generates distilled datasets containing subsets with near-optimal performance across different data budgets.
arXiv Detail & Related papers (2023-11-13T02:14:54Z) - Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By reformulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the samples that contribute most based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z) - Evaluating the effect of data augmentation and BALD heuristics on distillation of Semantic-KITTI dataset [63.20765930558542]
Active Learning has remained relatively unexplored for LiDAR perception tasks in autonomous driving datasets.
We evaluate Bayesian active learning methods applied to the task of dataset distillation or core subset selection.
We also study the effect of application of data augmentation within Bayesian AL based dataset distillation.
arXiv Detail & Related papers (2023-02-21T13:56:47Z) - Dataset Distillation: A Comprehensive Review [76.26276286545284]
Dataset distillation (DD) aims to derive a much smaller dataset containing synthetic samples, such that models trained on it yield performance comparable to models trained on the original dataset.
This paper gives a comprehensive review and summary of recent advances in DD and its application.
arXiv Detail & Related papers (2023-01-17T17:03:28Z) - LADA: Look-Ahead Data Acquisition via Augmentation for Active Learning [24.464022706979886]
This paper proposes Look-Ahead Data Acquisition via augmentation, or LADA, to integrate data acquisition and data augmentation.
LADA considers both 1) the unlabeled data instances to be selected and 2) the virtual data instances to be generated by data augmentation.
The performance of LADA shows a significant improvement over the recent augmentation and acquisition baselines.
arXiv Detail & Related papers (2020-11-09T05:21:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.