Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning
- URL: http://arxiv.org/abs/2103.07552v1
- Date: Fri, 12 Mar 2021 22:07:35 GMT
- Title: Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning
- Authors: Jason Wei, Chengyu Huang, Soroush Vosoughi, Yu Cheng, Shiqi Xu
- Abstract summary: Few-shot text classification is a fundamental NLP task in which a model aims to classify text into a large number of categories.
This paper explores data augmentation -- a technique particularly suitable for training with limited data.
We find that common data augmentation techniques can improve the performance of triplet networks by up to 3.0% on average.
- Score: 11.66053357388062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot text classification is a fundamental NLP task in which a model aims
to classify text into a large number of categories, given only a few training
examples per category. This paper explores data augmentation -- a technique
particularly suitable for training with limited data -- for this few-shot,
highly-multiclass text classification setting. On four diverse text
classification tasks, we find that common data augmentation techniques can
improve the performance of triplet networks by up to 3.0% on average.
To further boost performance, we present a simple training strategy called
curriculum data augmentation, which leverages curriculum learning by first
training on only original examples and then introducing augmented data as
training progresses. We explore a two-stage and a gradual schedule, and find
that, compared with standard single-stage training, curriculum data
augmentation trains faster, improves performance, and remains robust to high
amounts of noising from augmentation.
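As a concrete illustration, the sketch below implements the two-stage schedule described in the abstract: a triplet network is first trained on original examples only, with simple word-level noising switched on partway through training. The encoder interface, the word-swap `augment` function, and all hyperparameters are illustrative assumptions rather than the authors' exact implementation; the `gradual_n_swaps` helper hints at how the gradual schedule could instead ramp the noising amount up over epochs.

```python
# Minimal sketch of curriculum data augmentation for a triplet network.
# Everything here (the word-swap noising, the two-stage switch, the
# hyperparameters) is an illustrative assumption, not the paper's exact setup.
import random
import torch
import torch.nn as nn

def augment(text: str, n_swaps: int = 1) -> str:
    """Cheap noising: randomly swap n_swaps pairs of words (EDA-style)."""
    words = text.split()
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

def gradual_n_swaps(epoch: int, epochs: int, max_swaps: int = 3) -> int:
    """Gradual schedule (assumed form): ramp the noising amount over training."""
    return round(max_swaps * epoch / max(epochs - 1, 1))

def train(encoder: nn.Module, triplets, epochs: int = 10, stage_one: int = 5):
    """Two-stage schedule: original triplets only, then augmented triplets too.

    `encoder` is assumed to map a raw string to an embedding tensor (e.g., a
    sentence-encoder wrapper); `triplets` yields (anchor, positive, negative)
    text triples sharing / differing in class label.
    """
    loss_fn = nn.TripletMarginLoss(margin=1.0)
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
    for epoch in range(epochs):
        use_augmentation = epoch >= stage_one  # stage two begins here
        for anchor, positive, negative in triplets:
            if use_augmentation:
                anchor, positive = augment(anchor), augment(positive)
            a, p, n = encoder(anchor), encoder(positive), encoder(negative)
            loss = loss_fn(a, p, n)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Relative to training on a fixed mix of original and augmented data from the start, this ordering exposes the network to clean examples first, which is what the abstract credits for faster training and robustness to heavy noising.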
Related papers
- Open-Vocabulary Temporal Action Localization using Multimodal Guidance [67.09635853019005]
Open-vocabulary temporal action localization (OVTAL) enables a model to recognize any desired action category in videos without the need to explicitly curate training data for all categories.
This flexibility poses significant challenges, as the model must recognize not only the action categories seen during training but also novel categories specified at inference.
We introduce OVFormer, a novel open-vocabulary framework extending ActionFormer with three key contributions.
arXiv Detail & Related papers (2024-06-21T18:00:05Z)
- Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness [3.2925222641796554]
"pointer-guided segment ordering" (SO) is a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations.
Our experiments show that pointer-guided pre-training significantly enhances the model's ability to understand complex document structures.
arXiv Detail & Related papers (2024-06-06T15:17:51Z)
- Text generation for dataset augmentation in security classification tasks [55.70844429868403]
This study evaluates the application of natural language text generators to fill this data gap in multiple security-related text classification tasks.
We find substantial benefits for GPT-3 data augmentation strategies in situations with severe limitations on known positive-class samples.
arXiv Detail & Related papers (2023-10-22T22:25:14Z)
- WC-SBERT: Zero-Shot Text Classification via SBERT with Self-Training for Wikipedia Categories [5.652290685410878]
Our research focuses on solving the zero-shot text classification problem in NLP.
We propose a novel self-training strategy that uses labels rather than text for training.
Our method achieves state-of-the-art results on both the Yahoo Topic and AG News datasets.
arXiv Detail & Related papers (2023-07-28T04:17:41Z)
- Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
arXiv Detail & Related papers (2023-06-08T04:04:47Z)
- Boosting Visual-Language Models by Exploiting Hard Samples [126.35125029639168]
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z)
- Teacher Guided Training: An Efficient Framework for Knowledge Transfer [86.6784627427194]
We propose the teacher-guided training (TGT) framework for training a high-quality compact model.
TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain.
We find that TGT can improve accuracy on several image classification benchmarks and a range of text classification and retrieval tasks.
arXiv Detail & Related papers (2022-08-14T10:33:58Z)
- Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements (a minimal sketch of this style of augmentation follows this list).
arXiv Detail & Related papers (2021-11-17T12:10:03Z)
- ProtoDA: Efficient Transfer Learning for Few-Shot Intent Classification [21.933876113300897]
We adopt an alternative approach by transfer learning on an ensemble of related tasks using prototypical networks under the meta-learning paradigm.
Using intent classification as a case study, we demonstrate that increasing variability in training tasks can significantly improve classification performance.
arXiv Detail & Related papers (2021-01-28T00:19:13Z)
- Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text [1.6752182911522517]
We present a labeled dataset called MultiSenti for sentiment classification of code-switched informal short text.
We propose a deep learning-based model for sentiment classification of code-switched informal short text.
arXiv Detail & Related papers (2020-01-04T06:31:15Z)
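For context on the GPT-2 augmentation strategy referenced in the list above, here is a minimal sketch using the Hugging Face transformers text-generation pipeline; the prompt format and sampling settings are assumptions for illustration, not that paper's recipe.

```python
# Sketch of GPT-2-based data augmentation: prompt the model with a labeled
# seed example and keep generated continuations as synthetic instances.
# Prompt format and sampling settings are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def synthesize(seed_text: str, n: int = 3) -> list[str]:
    """Generate n synthetic variants conditioned on a labeled seed example."""
    outputs = generator(
        seed_text,
        max_new_tokens=40,
        num_return_sequences=n,
        do_sample=True,
        top_p=0.95,
    )
    # Keep only the generated continuation, stripping the seed prompt.
    return [o["generated_text"][len(seed_text):].strip() for o in outputs]

# Example: expand a positive-sentiment seed into extra training texts.
print(synthesize("The movie was a delight from start to finish because", n=2))
```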
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.