LADA: Look-Ahead Data Acquisition via Augmentation for Active Learning
- URL: http://arxiv.org/abs/2011.04194v3
- Date: Tue, 17 Nov 2020 02:41:16 GMT
- Title: LADA: Look-Ahead Data Acquisition via Augmentation for Active Learning
- Authors: Yoon-Yeong Kim, Kyungwoo Song, JoonHo Jang, Il-Chul Moon
- Abstract summary: This paper proposes Look-Ahead Data Acquisition via augmentation, or LADA, to integrate data acquisition and data augmentation.
LADA considers both 1) the unlabeled data instance to be selected and 2) the virtual data instance to be generated by data augmentation.
LADA shows a significant improvement over recent augmentation and acquisition baselines.
- Score: 24.464022706979886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active learning effectively collects data instances for training deep
learning models when the labeled dataset is limited and the annotation cost is
high. Besides active learning, data augmentation is also an effective technique
to expand the limited set of labeled instances. However, the potential gain
from virtual instances generated by data augmentation has not yet been
considered in the acquisition process of active learning. Looking ahead to the
effect of data augmentation during acquisition would allow selecting and
generating the data instances that are informative for training the model.
Hence, this paper proposes Look-Ahead Data Acquisition via augmentation, or
LADA, to integrate data acquisition and data augmentation. LADA considers both
1) the unlabeled data instance to be selected and 2) the virtual data instance
to be generated by data augmentation, in advance of the acquisition process.
Moreover, to enhance the informativeness of the virtual data instances, LADA
optimizes the data augmentation policy to maximize the predictive acquisition
score, resulting in the proposal of InfoMixup and InfoSTN. As LADA is a
generalizable framework, we experiment with various combinations of
acquisition and augmentation methods. LADA shows a significant improvement
over recent augmentation and acquisition baselines that were independently
applied to the benchmark datasets.
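As a concrete illustration of the look-ahead idea, here is a minimal sketch in PyTorch. It assumes an image classifier `model` and an unlabeled pool tensor `x_pool`; predictive entropy as the acquisition score and the per-instance mixing coefficient `lam` are illustrative stand-ins for the paper's InfoMixup policy, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits):
    # Acquisition score: entropy of the model's predictive distribution.
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

def look_ahead_scores(model, x_pool, steps=10, lr=0.1):
    # Score each pool instance by jointly considering the instance itself
    # and a mixup-style virtual instance whose mixing coefficient is tuned
    # to maximize the acquisition score (an InfoMixup-like policy).
    # x_pool is assumed to be a 4-D image batch (N, C, H, W).
    partner = x_pool[torch.randperm(x_pool.size(0))]  # random mixup partners
    lam = torch.full((x_pool.size(0), 1, 1, 1), 0.5, requires_grad=True)
    opt = torch.optim.Adam([lam], lr=lr)  # only lam is updated, not the model
    for _ in range(steps):
        lam_c = lam.clamp(0.0, 1.0)
        x_virtual = lam_c * x_pool + (1.0 - lam_c) * partner
        # Gradient ascent on the acquisition score of the virtual instances.
        loss = -predictive_entropy(model(x_virtual)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        lam_c = lam.clamp(0.0, 1.0)
        x_virtual = lam_c * x_pool + (1.0 - lam_c) * partner
        # Look-ahead: the score reflects both the real and the virtual instance.
        return predictive_entropy(model(x_pool)) + predictive_entropy(model(x_virtual))

# Usage: acquire the top-k pool indices by look-ahead score.
# query_idx = look_ahead_scores(model, x_pool).topk(k=32).indices
```

Labeling an acquired instance then yields both the real example and its optimized virtual counterpart for training, which is the integration of acquisition and augmentation the abstract describes.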
Related papers
- Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain.
This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation.
We present Learn from the Learnt (LFTL), a novel paradigm for SFADA that leverages the knowledge learnt from the source-pretrained model and actively iterated models, without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z)
- A Comprehensive Survey on Data Augmentation [55.355273602421384]
Data augmentation is a technique that generates high-quality artificial data by manipulating existing data samples.
Existing literature surveys focus only on specific data modalities.
We propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities.
arXiv Detail & Related papers (2024-05-15T11:58:08Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application (see the sketch after this list).
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- A Pre-trained Data Deduplication Model based on Active Learning [13.495903601474819]
"dirty data" problems can significantly limit the effective application of big data.
We propose a pre-trained deduplication model based on active learning.
Our proposed model outperforms previous state-of-the-art (SOTA) for deduplicated data identification.
arXiv Detail & Related papers (2023-07-31T03:56:46Z)
- Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory [8.584932159968002]
We present an IML pipeline for image captioning which allows us to incrementally adapt a pre-trained model to a new data distribution based on user input.
We find that data augmentation worsens results, even when relatively small amounts of data are available, whereas episodic memory is an effective strategy for retaining knowledge from previously seen clusters.
arXiv Detail & Related papers (2023-06-06T08:38:10Z)
- LiDAR dataset distillation within bayesian active learning framework: Understanding the effect of data augmentation [63.20765930558542]
Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size.
This paper performs a principled evaluation of AL-based dataset distillation on one fourth of the large Semantic-KITTI dataset.
We observe that data augmentation achieves full-dataset accuracy using only 60% of the samples from the selected dataset configuration.
arXiv Detail & Related papers (2022-02-06T00:04:21Z)
- Data Shapley Valuation for Efficient Batch Active Learning [21.76249748709411]
Active Data Shapley (ADS) is a filtering layer for batch active learning.
We show that ADS is particularly effective when the pool of unlabeled data exhibits real-world caveats.
arXiv Detail & Related papers (2021-04-16T18:53:42Z)
- Adaptive Weighting Scheme for Automatic Time-Series Data Augmentation [79.47771259100674]
We present two sample-adaptive automatic weighting schemes for data augmentation.
We validate our proposed methods on a large, noisy financial dataset and on time-series datasets from the UCR archive.
On the financial dataset, we show that the methods, in combination with a trading strategy, lead to improvements in annualized returns of over 50%; on the time-series data, we outperform state-of-the-art models on over half of the datasets and achieve similar accuracy on the others.
arXiv Detail & Related papers (2021-02-16T17:50:51Z)
- Improving the Performance of Fine-Grain Image Classifiers via Generative Data Augmentation [0.5161531917413706]
We develop Data Augmentation from Proficient Pre-Training of Robust Generative Adversarial Networks (DAPPER GAN).
DAPPER GAN is an ML analytics support tool that automatically generates novel views of training images.
We experimentally evaluate this technique on the Stanford Cars dataset, demonstrating improved vehicle make and model classification accuracy.
arXiv Detail & Related papers (2020-08-12T15:29:11Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model, but its scale makes training costly.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
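Read as a selection recipe, the LESS entry above amounts to ranking candidate examples by gradient similarity in a low-rank space. The following is a hedged sketch under the assumption that per-example gradients are already available as flat vectors; the random projection and the names (`project`, `select_top_k`) are illustrative, not the LESS codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(grad, proj):
    # Random low-rank projection of a flattened gradient, so the
    # similarity search stays cheap in high dimensions.
    return proj @ grad

def select_top_k(candidate_grads, target_grad, proj, k):
    # Indices of the k candidates whose projected gradients align
    # best (cosine similarity) with the target-task gradient.
    t = project(target_grad, proj)
    t = t / np.linalg.norm(t)
    scores = []
    for g in candidate_grads:
        z = project(g, proj)
        scores.append(float(z @ t / (np.linalg.norm(z) + 1e-12)))
    return np.argsort(scores)[::-1][:k]

# Toy usage: d-dimensional "gradients" projected down to r dimensions.
d, r, n = 10_000, 64, 500
proj = rng.standard_normal((r, d)) / np.sqrt(r)
candidates = [rng.standard_normal(d) for _ in range(n)]
target = rng.standard_normal(d)
print(select_top_k(candidates, target, proj, k=25))
```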
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.