A new data augmentation method for intent classification enhancement and
its application on spoken conversation datasets
- URL: http://arxiv.org/abs/2202.10137v1
- Date: Mon, 21 Feb 2022 11:36:19 GMT
- Authors: Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli,
Ron Hoory, Brian Kingsbury
- Abstract summary: We present the Nearest Neighbors Scores Improvement (NNSI) algorithm for automatic data selection and labeling.
The NNSI reduces the need for manual labeling by automatically selecting highly-ambiguous samples and labeling them with high accuracy.
We demonstrated the use of NNSI on two large-scale, real-life voice conversation systems.
- Score: 23.495743195811375
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Intent classifiers are vital to the successful operation of virtual agent
systems. This is especially true in voice-activated systems, where the data
can be noisy and user intents ambiguous. Before operation begins, these
classifiers generally lack real-world training data. Active
learning is a common approach used to help label large amounts of collected
user input. However, this approach requires many hours of manual labeling work.
We present the Nearest Neighbors Scores Improvement (NNSI) algorithm for
automatic data selection and labeling. The NNSI reduces the need for manual
labeling by automatically selecting highly-ambiguous samples and labeling them
with high accuracy. This is done by integrating the classifier's output from a
semantically similar group of text samples. The labeled samples can then be
added to the training set to improve the accuracy of the classifier. We
demonstrated the use of NNSI on two large-scale, real-life voice conversation
systems. Evaluation of our results showed that our method was able to select
and label useful samples with high accuracy. Adding these new samples to the
training data significantly improved the classifiers and reduced error rates by
up to 10%.
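The abstract gives no pseudocode, so the sketch below is a hypothetical illustration of the core idea it describes: flag samples whose top two class scores are close (highly ambiguous), then label each one by averaging the classifier's scores over its k most semantically similar samples. The function name, the thresholds, and the choices of cosine similarity and mean pooling are assumptions, not the authors' actual formulation.

```python
import numpy as np

def nnsi_select_and_label(embeddings, class_scores, k=5,
                          ambiguity_margin=0.2, agreement_threshold=0.8):
    """Toy nearest-neighbour score integration in the spirit of NNSI.

    embeddings   -- (n, d) array of sentence embeddings for unlabeled samples
    class_scores -- (n, c) array of classifier probabilities per sample
    Returns (index, label) pairs for samples that are ambiguous on their own
    but confidently labeled by the pooled scores of their neighbourhood.
    """
    # Cosine similarity between all pairs of samples.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    selected = []
    for i in range(embeddings.shape[0]):
        top_two = np.sort(class_scores[i])[-2:]
        if top_two[1] - top_two[0] >= ambiguity_margin:
            continue  # classifier is already confident; nothing to gain
        # k most similar samples, excluding the sample itself.
        neighbours = [j for j in np.argsort(-sim[i]) if j != i][:k]
        # "Integrate" the classifier's output over the neighbourhood.
        pooled = class_scores[neighbours].mean(axis=0)
        label = int(pooled.argmax())
        if pooled[label] >= agreement_threshold:
            selected.append((i, label))
    return selected
```

In this sketch, a sample near a cluster of confidently classified neighbours inherits their consensus label, while ambiguous samples in ambiguous neighbourhoods are left for manual review.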
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
arXiv Detail & Related papers (2023-06-08T04:04:47Z)
- ActiveLab: Active Learning with Re-Labeling by Multiple Annotators [19.84626033109009]
ActiveLab is a method to decide what to label next in batch active learning.
It automatically estimates when it is more informative to re-label examples vs. labeling entirely new ones.
It reliably trains more accurate classifiers with far fewer annotations than a wide variety of popular active learning methods.
arXiv Detail & Related papers (2023-01-27T17:00:11Z)
- Learning to Detect Noisy Labels Using Model-Based Features [16.681748918518075]
We propose Selection-Enhanced Noisy label Training (SENT)
SENT does not rely on meta learning while having the flexibility of being data-driven.
It improves performance over strong baselines under the settings of self-training and label corruption.
arXiv Detail & Related papers (2022-12-28T10:12:13Z)
- Context-based Virtual Adversarial Training for Text Classification with Noisy Labels [1.9508698179748525]
We propose context-based virtual adversarial training (ConVAT) to prevent a text classifier from overfitting to noisy labels.
Unlike the previous works, the proposed method performs the adversarial training at the context level rather than the inputs.
We conduct extensive experiments on four text classification datasets with two types of label noises.
arXiv Detail & Related papers (2022-05-29T14:19:49Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach, called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on the CIFAR-10LT, CIFAR-100LT and WebVision datasets, observing that Prototypical obtains substantial improvements over the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z)
- A Unified Generative Adversarial Network Training via Self-Labeling and Self-Attention [38.31735499785227]
We propose a novel GAN training scheme that can handle any level of labeling in a unified manner.
Our scheme introduces a form of artificial labeling that can incorporate manually defined labels, when available.
We evaluate our approach on CIFAR-10, STL-10 and SVHN, and show that both self-labeling and self-attention consistently improve the quality of generated data.
arXiv Detail & Related papers (2021-06-18T04:40:26Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
- Deep Active Learning via Open Set Recognition [0.0]
In many applications, data is easy to acquire but expensive and time-consuming to label.
We formulate active learning as an open-set recognition problem.
Unlike current active learning methods, our algorithm can learn tasks without the need for task labels.
arXiv Detail & Related papers (2020-07-04T22:09:17Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.