Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
- URL: http://arxiv.org/abs/2105.13608v1
- Date: Fri, 28 May 2021 06:32:32 GMT
- Title: Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
- Authors: Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi
- Abstract summary: MiniMax-kNN is a sample efficient data augmentation strategy.
We exploit a semi-supervised approach based on knowledge distillation to train a model on augmented data.
- Score: 7.680863481076596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation in Natural Language Processing (NLP) often yields examples
that are less human-interpretable. Recently, leveraging kNN such that augmented
examples are retrieved from large repositories of unlabelled sentences has made
a step toward interpretable augmentation. Inspired by this paradigm, we
introduce MiniMax-kNN, a sample efficient data augmentation strategy. We
exploit a semi-supervised approach based on knowledge distillation to train a
model on augmented data. In contrast to existing kNN augmentation techniques
that blindly incorporate all samples, our method dynamically selects a subset
of augmented samples with respect to the maximum KL-divergence of the training
loss. This step aims to extract the most efficient samples to ensure our
augmented data covers regions in the input space with maximum loss value. These
maximum loss regions are shrunk in our minimization step using augmented
samples. We evaluated our technique on several text classification tasks and
demonstrated that MiniMax-kNN consistently outperforms strong baselines. Our
results show that MiniMax-kNN requires fewer augmented examples and less
computation to achieve superior performance over the state-of-the-art kNN-based
augmentation techniques.
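To make the selection step concrete, here is a minimal, hypothetical PyTorch sketch of the maximization step, assuming HuggingFace-style teacher and student classifiers (function and variable names are illustrative, not the authors' code): the kNN-retrieved neighbours are scored by the teacher-student KL divergence, and only the highest-scoring ones are kept for the distillation (minimization) step.

```python
import torch
import torch.nn.functional as F

def select_augmented_samples(student, teacher, neighbour_batch, top_m=2):
    """Maximization step (sketch): among kNN-retrieved augmented sentences,
    keep those on which the student diverges most from the teacher.
    `neighbour_batch` is assumed to be a dict of tokenized tensors."""
    with torch.no_grad():                                   # scoring only
        t_probs = F.softmax(teacher(**neighbour_batch).logits, dim=-1)
        s_logp = F.log_softmax(student(**neighbour_batch).logits, dim=-1)
        kl = F.kl_div(s_logp, t_probs, reduction="none").sum(dim=-1)
    return kl.topk(min(top_m, kl.numel())).indices          # max-loss regions

# Minimization step (sketch): the selected neighbours are then added to the
# training batch and the student is trained with the usual distillation loss.
```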
Related papers
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER on the CN-Celeb evaluation set.
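The summary does not spell out DASA's exact recipe; as a rough, hypothetical illustration of augmentation in the speaker-embedding space, the sketch below perturbs an embedding with class-conditional Gaussian noise scaled by a per-sample difficulty score (the names and the scaling rule are assumptions, not the paper's method).

```python
import torch

def embedding_space_augment(emb, class_cov, difficulty, base_scale=0.1):
    """Hypothetical sketch: perturb a speaker embedding along directions of
    within-class variation, scaled by an estimated difficulty score.
    emb: [dim], class_cov: [dim, dim] covariance of the speaker's class,
    difficulty: scalar in [0, 1] (e.g., derived from the current loss)."""
    noise = torch.distributions.MultivariateNormal(
        torch.zeros_like(emb), covariance_matrix=class_cov).sample()
    return emb + base_scale * difficulty * noise   # harder samples move more
```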
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
- Abstractive Summarization as Augmentation for Document-Level Event Detection [0.0]
We bridge the performance gap between shallow and deep models on document-level event detection by using abstractive text summarization as an augmentation method.
We use four decoding methods for text generation, namely beam search, top-k sampling, top-p sampling, and contrastive search.
Our results show that using the document title offers 2.04% and 3.19% absolute improvement in macro F1-score for linear SVM and RoBERTa, respectively.
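The four decoding strategies named above map directly onto Hugging Face's `generate` API; the following sketch shows one plausible configuration for each (the model name and hyperparameter values are illustrative, not taken from the paper).

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

document_text = "Document text to be summarized goes here."
tok = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")   # example model
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
inputs = tok(document_text, return_tensors="pt", truncation=True)

summaries = {
    # Beam search: keep the `num_beams` most probable partial sequences.
    "beam": model.generate(**inputs, num_beams=4),
    # Top-k sampling: sample from the k most probable next tokens.
    "top_k": model.generate(**inputs, do_sample=True, top_k=50),
    # Top-p (nucleus) sampling: sample from the smallest token set whose
    # cumulative probability exceeds p.
    "top_p": model.generate(**inputs, do_sample=True, top_p=0.9, top_k=0),
    # Contrastive search: balance model confidence against degeneration.
    "contrastive": model.generate(**inputs, penalty_alpha=0.6, top_k=4),
}
augmented = {name: tok.decode(ids[0], skip_special_tokens=True)
             for name, ids in summaries.items()}
```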
arXiv Detail & Related papers (2023-05-29T11:28:26Z)
- Practical Knowledge Distillation: Using DNNs to Beat DNNs [8.121769391666547]
We explore data and model distillation, as well as data denoising.
These techniques improve both gradient-boosting models and a specialized DNN architecture.
For an industrial end-to-end real-time ML platform serving 4M production inferences per second, we develop a model-training workflow based on data sampling.
Empirical evaluation shows that the proposed combination of methods consistently improves model accuracy over prior best models across several production applications deployed worldwide.
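The model-distillation idea referenced above (using a DNN to improve a gradient-boosting model) can be sketched generically; this is a soft-label distillation toy example in scikit-learn, an assumption about the setup rather than the paper's pipeline.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Teacher DNN trained on hard labels.
teacher = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300,
                        random_state=0).fit(X_tr, y_tr)

# Student GBM regresses the teacher's soft probabilities (distillation),
# transferring the DNN's decision boundary to a cheaper model.
soft_labels = teacher.predict_proba(X_tr)[:, 1]
student = GradientBoostingRegressor(random_state=0).fit(X_tr, soft_labels)

student_acc = ((student.predict(X_te) > 0.5) == y_te).mean()
print(f"distilled GBM accuracy: {student_acc:.3f}")
```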
arXiv Detail & Related papers (2023-02-23T22:53:02Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
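AutoSMOTE itself learns its sampling decisions with hierarchical reinforcement learning, which is beyond a few lines; as background for the over-sampling idea it builds on, here is a minimal SMOTE-style interpolation sketch (a generic illustration, not the paper's algorithm).

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Generate synthetic minority-class samples by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                        # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)
```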
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models [51.46732511844122]
Powerful pre-trained language models (PLMs) can be fooled by small perturbations or intentional attacks.
We present Virtual Data Augmentation (VDA), a general framework for robustly fine-tuning PLMs.
Our approach is able to improve the robustness of PLMs and alleviate the performance degradation under adversarial attacks.
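The summary does not detail VDA's construction; one common way to realize robust fine-tuning against small perturbations is consistency regularization over noisy embeddings, sketched below as an assumption-labelled illustration rather than the paper's method (assumes a HuggingFace-style classifier).

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, input_ids, attention_mask, noise_std=0.01):
    """Hypothetical sketch: encourage the classifier to give similar
    predictions on clean and Gaussian-perturbed token embeddings."""
    emb = model.get_input_embeddings()(input_ids)
    clean = model(inputs_embeds=emb, attention_mask=attention_mask).logits
    noisy_emb = emb + noise_std * torch.randn_like(emb)
    noisy = model(inputs_embeds=noisy_emb, attention_mask=attention_mask).logits
    return F.kl_div(F.log_softmax(noisy, dim=-1),
                    F.softmax(clean, dim=-1).detach(),
                    reduction="batchmean")
```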
arXiv Detail & Related papers (2021-09-13T09:15:28Z)
- Reweighting Augmented Samples by Minimizing the Maximal Expected Loss [51.2791895511333]
We construct the maximal expected loss, which is the supremum over all reweightings of the loss on augmented samples.
Inspired by adversarial training, we minimize this maximal expected loss and obtain a simple and interpretable closed-form solution.
The proposed method can generally be applied on top of any data augmentation methods.
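The construction can be written out explicitly. With per-sample losses on the m augmented copies of an example and weights constrained to the probability simplex, the maximal expected loss and a regularized closed form look roughly as follows (the entropy-regularized softmax weighting is a standard derivation stated here as an illustration; the paper's exact regularizer may differ).

```latex
% Maximal expected loss over reweightings w of the m augmented copies of x:
\[
  \mathcal{L}_{\max}(x)
    = \sup_{w \in \Delta^{m-1}} \sum_{i=1}^{m} w_i \,\ell(x_i)
    = \max_{1 \le i \le m} \ell(x_i).
\]
% With an entropy regularizer at temperature \tau, the optimal weights take
% the closed form of a softmax over the per-sample losses:
\[
  w_i^{\star} = \frac{\exp\big(\ell(x_i)/\tau\big)}
                     {\sum_{j=1}^{m} \exp\big(\ell(x_j)/\tau\big)}.
\]
```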
arXiv Detail & Related papers (2021-03-16T09:31:04Z)
- Entropy Maximization and Meta Classification for Out-Of-Distribution Detection in Semantic Segmentation [7.305019142196585]
Detecting "out-of-distribution" (OoD) samples is crucial for many applications such as automated driving.
A natural baseline approach to OoD detection is to threshold on the pixel-wise softmax entropy.
We present a two-step procedure that significantly improves that approach.
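The baseline mentioned above is easy to make concrete: compute the softmax entropy at every pixel of the segmentation output and flag pixels whose entropy exceeds a threshold as OoD. A minimal NumPy sketch (the threshold value is an arbitrary placeholder):

```python
import numpy as np

def ood_mask_from_logits(logits, threshold=1.0):
    """logits: [C, H, W] segmentation scores. Returns a boolean [H, W] mask
    marking pixels whose softmax entropy exceeds `threshold` as OoD."""
    z = logits - logits.max(axis=0, keepdims=True)        # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)  # pixel-wise softmax
    entropy = -(p * np.log(p + 1e-12)).sum(axis=0)        # [H, W]
    return entropy > threshold
```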
arXiv Detail & Related papers (2020-12-09T11:01:06Z)
- Taming GANs with Lookahead-Minmax [63.90038365274479]
Experimental results on MNIST, SVHN, CIFAR-10, and ImageNet demonstrate a clear advantage of combining Lookahead-minmax with Adam or extragradient.
Using 30-fold fewer parameters and 16-fold smaller minibatches, we outperform the reported performance of the class-dependent BigGAN on CIFAR-10, obtaining an FID of 12.19 without using class labels.
arXiv Detail & Related papers (2020-06-25T17:13:23Z)