Evaluating the effect of data augmentation and BALD heuristics on distillation of Semantic-KITTI dataset
- URL: http://arxiv.org/abs/2302.10679v1
- Date: Tue, 21 Feb 2023 13:56:47 GMT
- Title: Evaluating the effect of data augmentation and BALD heuristics on distillation of Semantic-KITTI dataset
- Authors: Anh Duong, Alexandre Almin, Léo Lemarié, B Ravi Kiran
- Abstract summary: Active Learning has remained relatively unexplored for LiDAR perception tasks in autonomous driving datasets.
We evaluate Bayesian active learning methods applied to the task of dataset distillation or core subset selection.
We also study the effect of applying data augmentation within Bayesian AL-based dataset distillation.
- Score: 63.20765930558542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active Learning (AL) has remained relatively unexplored for LiDAR perception
tasks in autonomous driving datasets. In this study we evaluate Bayesian active
learning methods applied to the task of dataset distillation, or core subset
selection (finding a subset with near-equivalent performance to the full dataset).
We also study the effect of applying data augmentation (DA) within Bayesian AL-based
dataset distillation. We perform these experiments on the full Semantic-KITTI dataset,
extending our existing work, which covered only a quarter of the same dataset.
Adding DA and BALD has a negative impact on labeling efficiency and thus on the
capacity to distill datasets. We demonstrate key issues in designing a functional
AL framework and conclude with a review of challenges in real-world active learning.
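The BALD heuristic evaluated in the paper scores unlabeled samples by the mutual information between the predicted label and the model parameters, typically approximated with multiple stochastic forward passes (e.g. MC dropout). Below is a minimal illustrative sketch of that acquisition score; the function name, array layout, and usage snippet are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def bald_scores(mc_probs: np.ndarray) -> np.ndarray:
    """Illustrative BALD acquisition scores (not the paper's code).

    mc_probs: softmax probabilities of shape (T, N, C) from T stochastic
    forward passes over N candidate points with C classes.
    Returns an array of shape (N,); higher means more informative.
    """
    eps = 1e-12
    # Entropy of the mean predictive distribution: H[ E_t p(y|x, w_t) ]
    mean_probs = mc_probs.mean(axis=0)                                        # (N, C)
    entropy_of_mean = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)  # (N,)
    # Expected entropy over passes: E_t H[ p(y|x, w_t) ]
    per_pass_entropy = -np.sum(mc_probs * np.log(mc_probs + eps), axis=-1)     # (T, N)
    expected_entropy = per_pass_entropy.mean(axis=0)                           # (N,)
    # BALD = mutual information between prediction and model parameters
    return entropy_of_mean - expected_entropy

# Hypothetical usage: select the k highest-scoring scans to label next.
# scores = bald_scores(mc_probs)
# query_indices = np.argsort(-scores)[:k]
```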
Related papers
- Practical Dataset Distillation Based on Deep Support Vectors [27.16222034423108]
In this paper, we focus on dataset distillation in practical scenarios with access to only a fraction of the entire dataset.
We introduce a novel distillation method that augments the conventional process by incorporating general model knowledge via the addition of Deep KKT (DKKT) loss.
In practical settings, our approach showed improved performance compared to the baseline distribution matching distillation method on the CIFAR-10 dataset.
arXiv Detail & Related papers (2024-05-01T06:41:27Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness [8.432686179800543]
We conduct extensive experiments to evaluate current state-of-the-art dataset distillation methods.
We successfully use membership inference attacks to show that privacy risks still remain.
This work offers a large-scale benchmarking framework for dataset distillation evaluation.
arXiv Detail & Related papers (2023-05-05T08:19:27Z)
- LiDAR dataset distillation within bayesian active learning framework: Understanding the effect of data augmentation [63.20765930558542]
Active learning (AL) has recently regained attention as a means of reducing annotation costs and dataset size.
This paper performs a principled evaluation of AL-based dataset distillation on a quarter (1/4th) of the large Semantic-KITTI dataset.
We observe that data augmentation achieves full dataset accuracy using only 60% of samples from the selected dataset configuration.
arXiv Detail & Related papers (2022-02-06T00:04:21Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)