KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in
Few-Shot NLP
- URL: http://arxiv.org/abs/2206.10265v1
- Date: Tue, 21 Jun 2022 11:34:02 GMT
- Title: KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in
Few-Shot NLP
- Authors: Yufei Wang, Jiayi Zheng, Can Xu, Xiubo Geng, Tao Shen, Chongyang Tao,
Daxin Jiang
- Abstract summary: Existing data augmentation algorithms leverage task-independent rules or fine-tune general-purpose pre-trained language models.
These methods encode little task-specific knowledge and are thus limited to producing low-quality synthetic data that helps only weak baselines on simple tasks.
We propose the Knowledge Mixture Data Augmentation Model (KnowDA): an encoder-decoder LM pretrained on a mixture of diverse NLP tasks.
- Score: 68.43279384561352
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on text data augmentation for few-shot NLP tasks. The
existing data augmentation algorithms either leverage task-independent
heuristic rules (e.g., Synonym Replacement) or fine-tune general-purpose
pre-trained language models (e.g., GPT2) using a small training set to produce
new synthetic data. Consequently, these methods encode little task-specific
knowledge and are limited to producing low-quality synthetic data that helps
only weak baselines on simple tasks. To combat this issue, we propose the Knowledge
Mixture Data Augmentation Model (KnowDA): an encoder-decoder LM pretrained on a
mixture of diverse NLP tasks using Knowledge Mixture Training (KoMT). KoMT is a
training procedure that reformulates input examples from various heterogeneous
NLP tasks into a unified text-to-text format and employs denoising objectives
in different granularity to learn to generate partial or complete samples. With
the aid of KoMT, KnowDA could combine required task-specific knowledge
implicitly from the learned mixture of tasks and quickly grasp the inherent
synthesis law of the target task through a few given instances. To the best of
our knowledge, ours is the first attempt to scale the number of tasks to 100+ in
multi-task co-training for data augmentation. Extensive experiments show that
i) KnowDA successfully improves the performance of ALBERT and DeBERTa by a
large margin on the FewGLUE benchmark, outperforming previous state-of-the-art
data augmentation baselines; ii) KnowDA could also improve the model
performance on the few-shot NER tasks, a held-out task type not included in
KoMT.
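To make the KoMT recipe concrete, below is a minimal Python sketch of the unified text-to-text reformulation with denoising at two granularities, reconstructed only from the abstract's description; the key-value serialization, the T5-style sentinel tokens, and the field names are illustrative assumptions rather than the paper's actual implementation.

```python
import random

MASK = "<extra_id_{}>"  # T5-style sentinel token (an assumption, not from the paper)

def to_text_to_text(example: dict) -> str:
    """Flatten a labeled example from any NLP task into one key-value string."""
    return " | ".join(f"{key}: {value}" for key, value in example.items())

def denoise(serialized: str, granularity: str = "field"):
    """Corrupt the serialized example and return an (input, target) pair for an
    encoder-decoder LM. Coarser granularity forces the model to regenerate
    larger portions of the example, up to a complete sample."""
    fields = serialized.split(" | ")
    if granularity == "full":          # regenerate the whole sample from scratch
        return MASK.format(0), serialized
    # field-level denoising: mask one random field and learn to fill it in
    i = random.randrange(len(fields))
    target = f"{MASK.format(0)} {fields[i]}"
    fields[i] = MASK.format(0)
    return " | ".join(fields), target

# Example: an NLI instance reformulated into text-to-text form and corrupted
nli = {"task": "nli",
       "premise": "A man is playing a guitar.",
       "hypothesis": "A person is making music.",
       "label": "entailment"}
src, tgt = denoise(to_text_to_text(nli), granularity="field")
print(src)  # e.g. task: nli | premise: ... | <extra_id_0> | label: entailment
print(tgt)  # e.g. <extra_id_0> hypothesis: A person is making music.
```

At augmentation time, a model trained this way could be prompted with a handful of serialized target-task instances and asked to fill masked fields or generate complete samples, which mirrors how the abstract describes KnowDA grasping the synthesis law of a new task from a few instances.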
Related papers
- Pre-training Multi-task Contrastive Learning Models for Scientific
Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
- Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks [17.879087904904935]
Multi-Task Learning (MTL) is widely accepted in Natural Language Processing as a standard technique for learning multiple related tasks in one model.
Because systems usually evolve over time, adding a new task to an existing MTL model typically requires retraining the model from scratch on all tasks.
In this paper, we approach the problem of incrementally expanding MTL models' capability to solve new tasks over time by distilling the knowledge of an already trained model on n tasks into a new one for solving n+1 tasks.
arXiv Detail & Related papers (2023-02-22T00:18:25Z)
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- Pretraining Representations for Data-Efficient Reinforcement Learning [12.43475487724972]
We use unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data.
When limited to 100k steps of interaction on Atari games, our approach significantly surpasses prior work.
Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data.
arXiv Detail & Related papers (2021-06-09T04:14:27Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
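To illustrate the DAGA entry above, here is a minimal sketch of label-linearized sentences for low-resource tagging tasks, so a plain language model can be trained on (and later sample) labeled sentences; the exact tag placement and the choice to drop "O" tags are assumptions based on common descriptions of DAGA, not the authors' released code.

```python
def linearize(tokens, tags):
    """Interleave each non-O tag before its token, e.g.
    ['John', 'lives', 'in', 'Paris'], ['B-PER', 'O', 'O', 'B-LOC']
    -> 'B-PER John lives in B-LOC Paris' (assumed tag placement)."""
    out = []
    for token, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tag)
        out.append(token)
    return " ".join(out)

def delinearize(text):
    """Recover (tokens, tags) from a generated linearized sentence."""
    tokens, tags = [], []
    pending = "O"
    for piece in text.split():
        if piece.startswith(("B-", "I-")):
            pending = piece
        else:
            tokens.append(piece)
            tags.append(pending)
            pending = "O"
    return tokens, tags

print(linearize(["John", "lives", "in", "Paris"], ["B-PER", "O", "O", "B-LOC"]))
print(delinearize("B-PER John lives in B-LOC Paris"))
```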