MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning
- URL: http://arxiv.org/abs/2304.09402v2
- Date: Sat, 11 Nov 2023 15:15:26 GMT
- Title: MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning
- Authors: Bohan Li, Longxu Dou, Yutai Hou, Yunlong Feng, Honglin Mu, Qingfu Zhu,
Qinghua Sun, Wanxiang Che
- Abstract summary: We introduce MixPro, an augmentation method designed to augment both the vanilla input text and the templates.
Experiments show that MixPro outperforms other augmentation baselines, improving model performance by an average of 5.08%.
- Score: 53.185180119904174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt-based learning has shown considerable promise in reformulating various
downstream tasks as cloze problems by combining the original input with a
predetermined template. The approach is especially effective in few-shot
learning scenarios, where the model is trained on only a small amount of data.
Despite these successes, the limited templates and text available in few-shot
prompt-based learning leave significant room for performance improvement.
Moreover, existing methods sometimes resort to model ensembles, which, while
effective, can hamper model efficiency due to increased computational demands.
To address these issues, we introduce MixPro, an augmentation method designed
to augment both the vanilla input text and the templates. We implement this
through token-level, sentence-level, and template-level Mixup strategies.
Experimental results on five few-shot datasets show that MixPro outperforms
other augmentation baselines, improving model performance by an average of
5.08% compared to the un-augmented baseline.
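As a rough illustration of what token-, sentence-, and template-level Mixup can look like in practice, here is a minimal sketch on embedding sequences. The function names, the Beta(alpha, alpha) mixing coefficient, the mean pooling, and the toy dimensions are assumptions for illustration, not details taken from the MixPro paper.

```python
# Minimal sketch (not the authors' code): token-, sentence-, and template-level
# Mixup on embedding sequences. Names, pooling choice, and Beta(alpha, alpha)
# sampling are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mix(a: np.ndarray, b: np.ndarray, lam: float) -> np.ndarray:
    """Convex combination shared by all Mixup variants."""
    return lam * a + (1.0 - lam) * b

def token_level_mixup(x1, x2, alpha=0.5):
    """Interpolate two token-embedding sequences position by position."""
    lam = rng.beta(alpha, alpha)
    return mix(x1, x2, lam), lam

def sentence_level_mixup(x1, x2, alpha=0.5):
    """Interpolate mean-pooled sentence representations."""
    lam = rng.beta(alpha, alpha)
    return mix(x1.mean(axis=0), x2.mean(axis=0), lam), lam

def template_level_mixup(t1, t2, alpha=0.5):
    """Interpolate the embeddings of two prompt templates."""
    lam = rng.beta(alpha, alpha)
    return mix(t1, t2, lam), lam

# Toy data: 8-token inputs and 4-token templates with 16-dim embeddings.
x1, x2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
t1, t2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))

mixed_tokens, lam = token_level_mixup(x1, x2)
mixed_template, _ = template_level_mixup(t1, t2)

# Labels are interpolated with the same coefficient as the inputs.
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mixed_label = mix(y1, y2, lam)
```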
Related papers
- Analyzing Persuasive Strategies in Meme Texts: A Fusion of Language Models with Paraphrase Enrichment [0.23020018305241333]
This paper describes our approach to hierarchical multi-label detection of persuasion techniques in meme texts.
The scope of the study encompasses enhancing model performance through innovative training techniques and data augmentation strategies.
arXiv Detail & Related papers (2024-07-01T20:25:20Z)
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study how predictable model performance is as a function of the mixture proportions.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements [10.687101698324897]
Large language models demonstrate a remarkable capability for learning to solve new tasks from a few examples.
The prompt template, or the way the input examples are formatted to obtain the prompt, is an important yet often overlooked aspect of in-context learning.
We show that a poor choice of the template can reduce the performance of the strongest models and inference methods to a random guess level.
arXiv Detail & Related papers (2024-01-12T18:58:26Z)
- Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks [75.42002070547267]
We propose a self-evolution learning (SE) based Mixup approach for data augmentation in text classification.
We introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for mixing up.
arXiv Detail & Related papers (2023-05-22T23:43:23Z)
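As a hedged illustration of the instance-specific label smoothing summarized above, the sketch below linearly interpolates a model's predicted distribution with the one-hot label to form a soft target; the fixed interpolation weight is a placeholder, not the paper's formulation.

```python
# Minimal sketch (not the authors' code) of instance-specific label smoothing:
# the soft target is a linear interpolation of the model's predicted
# distribution and the one-hot label. The fixed weight is a placeholder.
import numpy as np

def instance_specific_soft_label(pred_probs: np.ndarray,
                                 one_hot: np.ndarray,
                                 weight: float) -> np.ndarray:
    """Interpolate the gold one-hot label with the model's prediction."""
    return weight * one_hot + (1.0 - weight) * pred_probs

pred = np.array([0.7, 0.2, 0.1])   # model output for one training sample
gold = np.array([1.0, 0.0, 0.0])   # its one-hot label
soft = instance_specific_soft_label(pred, gold, weight=0.8)
# `soft` then replaces the hard label when this sample is mixed with another.
```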
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
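As a hedged illustration of the per-sample ensembling idea summarized above, the sketch below uses a small gating network to assign sample-specific weights to source-model outputs; the module, its inputs, and the weighting scheme are illustrative assumptions rather than the SESoM architecture.

```python
# Minimal sketch (not the SESoM architecture): a gating network produces
# per-sample weights over several source models' outputs, and the ensemble
# is their weighted sum. Shapes and module names are illustrative.
import torch
import torch.nn as nn

class SampleSpecificEnsemble(nn.Module):
    def __init__(self, hidden_dim: int, num_sources: int):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, num_sources)

    def forward(self, sample_repr: torch.Tensor,
                source_logits: torch.Tensor) -> torch.Tensor:
        # sample_repr:   (batch, hidden_dim) representation of the target sample
        # source_logits: (batch, num_sources, num_classes) source-model outputs
        weights = torch.softmax(self.gate(sample_repr), dim=-1)  # per-sample weights
        return (weights.unsqueeze(-1) * source_logits).sum(dim=1)

ensemble = SampleSpecificEnsemble(hidden_dim=16, num_sources=3)
combined = ensemble(torch.randn(2, 16), torch.randn(2, 3, 5))  # -> (2, 5)
```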
- DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training example.
It then uses the perturbed data and the original data to carry out a two-step interpolation in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z)
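As a hedged illustration of the two-step hidden-space interpolation summarized above, the sketch below first combines perturbed variants of a sample and then interpolates the result with the original hidden state; the perturbation and mixing weights are placeholders rather than DoubleMix's exact procedure.

```python
# Minimal sketch (not DoubleMix itself) of a two-step hidden-space mix:
# step 1 combines perturbed variants of a sample, step 2 interpolates the
# result with the original hidden state. Weights and the noise-based
# perturbation are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def two_step_hidden_mix(h_orig: np.ndarray,
                        h_perturbed: np.ndarray,
                        lam: float = 0.7) -> np.ndarray:
    """h_orig: (dim,) hidden state; h_perturbed: (k, dim) perturbed variants."""
    w = np.full(h_perturbed.shape[0], 1.0 / h_perturbed.shape[0])
    h_mix = (w[:, None] * h_perturbed).sum(axis=0)  # step 1: combine perturbations
    return lam * h_orig + (1.0 - lam) * h_mix        # step 2: mix with the original

h = rng.normal(size=32)
h_pert = h + 0.1 * rng.normal(size=(3, 32))  # noise as a stand-in perturbation
augmented = two_step_hidden_mix(h, h_pert)
```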
- Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
- Multi-stage Pre-training over Simplified Multimodal Pre-training Models [35.644196343835674]
We propose a new Multi-stage Pre-training (MSP) method, which uses information at different granularities from word, phrase to sentence in both texts and images to pre-train the model in stages.
We also design several pre-training tasks suited to the information granularity of each stage, in order to efficiently capture diverse knowledge from a limited corpus.
Experimental results show that our method achieves performance comparable to the original LXMERT model on all downstream tasks, and even outperforms the original model on the Image-Text Retrieval task.
arXiv Detail & Related papers (2021-07-22T03:35:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.