Rethink the Effectiveness of Text Data Augmentation: An Empirical
Analysis
- URL: http://arxiv.org/abs/2306.07664v1
- Date: Tue, 13 Jun 2023 10:14:58 GMT
- Title: Rethink the Effectiveness of Text Data Augmentation: An Empirical
Analysis
- Authors: Zhengxiang Shi, Aldo Lipani
- Abstract summary: We evaluate the effectiveness of three different FT methods in conjunction with back-translation across an array of 7 diverse NLP tasks.
Our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of the downstream tasks.
Our findings highlight the potential of DA as a powerful tool for bolstering LMs' performance.
- Score: 4.771833920251869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, language models (LMs) have made remarkable progress in
advancing the field of natural language processing (NLP). However, the impact
of data augmentation (DA) techniques on the fine-tuning (FT) performance of
these LMs has been a topic of ongoing debate. In this study, we evaluate the
effectiveness of three different FT methods in conjunction with
back-translation across an array of 7 diverse NLP tasks, including
classification and regression types, covering single-sentence and sentence-pair
tasks. Contrary to prior assumptions that DA does not contribute to the
enhancement of LMs' FT performance, our findings reveal that continued
pre-training on augmented data can effectively improve the FT performance of
the downstream tasks. In the most favourable case, continued pre-training
improves the performance of FT by more than 10% in the few-shot learning
setting. Our findings highlight the potential of DA as a powerful tool for
bolstering LMs' performance.
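The setup above pairs back-translation data augmentation with continued pre-training before fine-tuning. As a rough illustration of the augmentation step only, the sketch below back-translates English text through German using public MarianMT checkpoints from Hugging Face; the pivot language, checkpoint names, and decoding settings are assumptions for the sketch, not the paper's reported configuration. The augmented copies would then feed continued pre-training of the LM before task fine-tuning.

```python
# Minimal back-translation sketch (English -> German -> English) using public
# MarianMT checkpoints. Pivot language and model names are illustrative
# assumptions, not necessarily the configuration used in the paper.
from transformers import MarianMTModel, MarianTokenizer

def load(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

fwd_tok, fwd_model = load("Helsinki-NLP/opus-mt-en-de")   # English -> German
bwd_tok, bwd_model = load("Helsinki-NLP/opus-mt-de-en")   # German -> English

def back_translate(sentences):
    # Translate to the pivot language and back to obtain paraphrased variants.
    de = fwd_model.generate(**fwd_tok(sentences, return_tensors="pt", padding=True, truncation=True))
    de_text = fwd_tok.batch_decode(de, skip_special_tokens=True)
    en = bwd_model.generate(**bwd_tok(de_text, return_tensors="pt", padding=True, truncation=True))
    return bwd_tok.batch_decode(en, skip_special_tokens=True)

augmented = back_translate(["The movie was surprisingly good."])
print(augmented)  # augmented copies can be used for continued pre-training before FT
```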
Related papers
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
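The DDA summary above does not spell out its attribution mechanics, so the following is only a generic point of reference: a first-order, TracIn-style attribution score that dots the gradient of a training example with the gradient of a test example. It is a minimal baseline sketch under toy assumptions, not the DDA method itself.

```python
# Generic first-order attribution: score(train, test) = <grad_train, grad_test>.
# TracIn-style baseline for illustration only, not the DDA method above.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)          # toy stand-in for a language model
loss_fn = torch.nn.CrossEntropyLoss()

def grad_vector(x, y):
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

train_x, train_y = torch.randn(1, 8), torch.tensor([1])
test_x, test_y = torch.randn(1, 8), torch.tensor([0])

score = torch.dot(grad_vector(train_x, train_y), grad_vector(test_x, test_y))
print(score.item())  # a large positive score marks a "proponent" whose update also lowers the test loss
```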
- Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging [11.223074654129915]
Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks.
We propose to mitigate this imbalance by merging SFT models fine-tuned with different data orders.
arXiv Detail & Related papers (2024-10-01T08:44:31Z)
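The merging described above can be illustrated with plain uniform parameter averaging over fine-tuned checkpoints of the same architecture, as sketched below; the paper's selective merging strategy presumably weights or filters parameters rather than averaging everything, so treat this as a baseline illustration only.

```python
# Uniform parameter averaging of several fine-tuned checkpoints of the same
# architecture. Illustrates the general idea of merging SFT models; the paper's
# selective merging scheme may differ.
import torch

def merge_state_dicts(state_dicts):
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Toy demonstration with small modules standing in for SFT checkpoints
# fine-tuned with different data orders.
models = [torch.nn.Linear(4, 4) for _ in range(3)]
merged_model = torch.nn.Linear(4, 4)
merged_model.load_state_dict(merge_state_dicts([m.state_dict() for m in models]))
```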
- A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models [63.949883238901414]
We present a unique angle of gradient analysis of loss functions that simultaneously reward good examples and penalize bad ones in LMs.
We find that ExMATE serves as a superior surrogate for MLE, and that combining DPO with ExMATE instead of MLE further enhances both the statistical (5-7%) and generative (+18% win rate) performance.
arXiv Detail & Related papers (2024-08-29T17:46:18Z)
- Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge [15.553942864736989]
Two approaches to enhance the performance of LMs on low-frequency topics are Retrieval Augmented Generation (RAG) and fine-tuning (FT) over synthetic data.
This paper explores and evaluates the impact of RAG and FT on customizing LMs in handling low-frequency entities on question answering tasks.
Our findings indicate that while FT boosts the performance across entities of varying popularity, RAG surpasses FT by a large margin, particularly for the least popular factual knowledge.
arXiv Detail & Related papers (2024-03-03T08:07:55Z)
- Prompt Perturbation Consistency Learning for Robust Language Models [47.021022978847036]
Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks.
We show that fine-tuning sufficiently large LLMs can produce intent classification and slot filling (IC-SF) performance comparable to discriminative models.
We propose an efficient mitigation approach, Prompt Perturbation Consistency Learning (PPCL), which works by regularizing the divergence between losses from clean and perturbed samples.
arXiv Detail & Related papers (2024-02-24T15:00:58Z)
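The PPCL regularizer described above amounts to a consistency loss between clean and perturbed inputs. The sketch below implements a generic version: task loss on the clean logits plus a symmetric KL divergence between the two predictive distributions. The choice of divergence, the weight alpha, and the perturbation scheme are assumptions for illustration, not the authors' exact formulation.

```python
# Generic consistency regularization: task loss on the clean input plus a
# divergence between model outputs for clean and perturbed versions of it.
# The specific divergence and perturbations used in PPCL may differ.
import torch
import torch.nn.functional as F

def consistency_loss(clean_logits, perturbed_logits, labels, alpha=1.0):
    task = F.cross_entropy(clean_logits, labels)
    p = F.log_softmax(clean_logits, dim=-1)
    q = F.log_softmax(perturbed_logits, dim=-1)
    # symmetric KL between the two predictive distributions
    kl = 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean")
                + F.kl_div(q, p, log_target=True, reduction="batchmean"))
    return task + alpha * kl

clean = torch.randn(4, 10, requires_grad=True)      # logits for clean prompts
perturbed = clean + 0.1 * torch.randn(4, 10)        # logits for perturbed prompts
labels = torch.randint(0, 10, (4,))
print(consistency_loss(clean, perturbed, labels))
```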
- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning [53.74649778447903]
We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks.
We show its improvements in neural machine translation (NMT) and multi-lingual language modeling.
arXiv Detail & Related papers (2023-12-11T05:46:57Z)
- Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z)
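To make intermediate layer decoding concrete, the sketch below reuses a causal LM's final layer norm and LM head on an intermediate hidden state; GPT-2 and the layer index are arbitrary stand-ins, and this only illustrates the decoding idea, not the LITE instruction-tuning recipe that teaches intermediate layers to generate well.

```python
# Decode from an intermediate layer by reusing the model's final layer norm and
# LM head. GPT-2 and layer 6 are illustrative assumptions; this is not the LITE
# training recipe, only the intermediate-layer decoding idea.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

mid = out.hidden_states[6]                # hidden states after an intermediate block
mid = model.transformer.ln_f(mid)         # apply the model's final layer norm
logits = model.lm_head(mid)               # reuse the shared LM head for early decoding
next_token = logits[0, -1].argmax().item()
print(tok.decode(next_token))
```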
- Data Augmentation Approaches in Natural Language Processing: A Survey [28.91744006146676]
Data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail.
One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data.
We frame DA methods into three categories based on the diversity of augmented data, including paraphrasing, noising, and sampling.
arXiv Detail & Related papers (2021-10-05T07:35:32Z)
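Of the survey's three categories, noising is the easiest to show in a few lines. The sketch below applies random deletion and random swap to a whitespace-tokenized sentence, in the spirit of EDA-style noising; the probabilities and the tokenization are arbitrary choices for the sketch, not a method from the survey.

```python
# Minimal "noising" augmentation (random deletion and random swap), in the
# spirit of EDA-style methods; probabilities and tokenization are arbitrary.
import random

def random_delete(tokens, p=0.1):
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]   # never return an empty sentence

def random_swap(tokens, n_swaps=1):
    tokens = tokens[:]
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def augment(sentence, n_copies=3):
    tokens = sentence.split()
    return [" ".join(random_swap(random_delete(tokens))) for _ in range(n_copies)]

print(augment("data augmentation improves generalization on unseen test data"))
```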
- An Empirical Survey of Data Augmentation for Limited Data Learning in NLP [88.65488361532158]
The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks.
Data augmentation methods have been explored as a means of improving data efficiency in NLP.
We provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting.
arXiv Detail & Related papers (2021-06-14T15:27:22Z)
- Understanding Learning Dynamics for Neural Machine Translation [53.23463279153577]
We propose to understand the learning dynamics of NMT by using Loss Change Allocation (LCA) (Lan et al., 2019).
As LCA requires calculating the gradient on an entire dataset for each update, we instead present an approximation to put it into practice in the NMT scenario.
Our simulated experiment shows that such approximate calculation is efficient and empirically delivers consistent results.
arXiv Detail & Related papers (2020-04-05T13:32:58Z)
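LCA, as referenced above, allocates the loss change of each training update to individual parameters via a first-order term, grad · Δθ. The toy sketch below computes that allocation for a single SGD step; it is a minimal first-order illustration, not the efficient approximation the paper introduces for full NMT training.

```python
# First-order Loss Change Allocation for one SGD step: attribute the step's
# loss change to each parameter tensor as sum(grad * delta_theta).
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)                       # toy stand-in for an NMT model
x, y = torch.randn(32, 4), torch.randn(32, 1)
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

opt.zero_grad()
loss_fn(model(x), y).backward()
grads = {n: p.grad.detach().clone() for n, p in model.named_parameters()}
before = {n: p.detach().clone() for n, p in model.named_parameters()}
opt.step()

allocation = {
    n: (grads[n] * (p.detach() - before[n])).sum().item()
    for n, p in model.named_parameters()
}
print(allocation)                 # negative entries are parameters that reduced the loss
print(sum(allocation.values()))   # first-order estimate of the step's total loss change
```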