Related papers: Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

URL: http://arxiv.org/abs/2306.07664v1
Date: Tue, 13 Jun 2023 10:14:58 GMT
Title: Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis
Authors: Zhengxiang Shi, Aldo Lipani
Abstract summary: We evaluate the effectiveness of three different FT methods in conjugation with back-translation across an array of 7 diverse NLP tasks. Our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of the downstream tasks. Our finding highlights the potential of DA as a powerful tool for bolstering LMs' performance.
Score: 4.771833920251869
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has been a topic of ongoing debate. In this study, we evaluate the effectiveness of three different FT methods in conjugation with back-translation across an array of 7 diverse NLP tasks, including classification and regression types, covering single-sentence and sentence-pair tasks. Contrary to prior assumptions that DA does not contribute to the enhancement of LMs' FT performance, our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of the downstream tasks. In the most favourable case, continued pre-training improves the performance of FT by more than 10% in the few-shot learning setting. Our finding highlights the potential of DA as a powerful tool for bolstering LMs' performance.

Related papers

Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training [66.48331530995786]
We propose syMmetry-ENhanceD (MEND) Data Augmentation, a data-centric approach that improves the model's ability to extract useful information from context. Unlike existing methods that emphasize reasoning chain augmentation, our approach improves model robustness at the knowledge extraction stage. Experiments on both logical and arithmetic reasoning tasks show that MEND enhances reasoning performance across diverse query variations.
arXiv Detail & Related papers (2025-02-25T03:03:35Z)
Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more computation-efficient metric for performance estimation. We present FLP-M, a fundamental approach for performance prediction that addresses the practical need to integrate datasets from multiple sources during pre-training.
arXiv Detail & Related papers (2024-10-11T04:57:48Z)
Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA) Our method significantly outperforms existing approaches, achieving an averaged AUC of 91.64%. DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging [11.223074654129915]
Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks. We propose to mitigate this imbalance by merging SFT models fine-tuned with different data orders.
arXiv Detail & Related papers (2024-10-01T08:44:31Z)
A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models [63.949883238901414]
We present a unique angle of gradient analysis of loss functions that simultaneously reward good examples and penalize bad ones in LMs. We find that ExMATE serves as a superior surrogate for MLE, and that combining DPO with ExMATE instead of MLE further enhances both the statistical (5-7%) and generative (+18% win rate) performance.
arXiv Detail & Related papers (2024-08-29T17:46:18Z)
Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge [15.553942864736989]
Two approaches to enhance the performance of LMs on low-frequent topics are: Retrieval Augmented Generation (RAG) and fine-tuning (FT) over synthetic data. This paper explores and evaluates the impact of RAG and FT on customizing LMs in handling low-frequency entities on question answering tasks. Our findings indicate that while FT boosts the performance across entities of varying popularity, RAG surpasses FT by a large margin particularly for least popular factual knowledge.
arXiv Detail & Related papers (2024-03-03T08:07:55Z)
Prompt Perturbation Consistency Learning for Robust Language Models [47.021022978847036]
Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks. We show that fine-tuning sufficiently large LLMs can produce IC-SF performance comparable to discriminative models. We propose an efficient mitigation approach, Prompt Perturbation Consistency Learning (PPCL), which works by regularizing the divergence between losses from clean and perturbed samples.
arXiv Detail & Related papers (2024-02-24T15:00:58Z)
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning [53.74649778447903]
We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks. We show its improvements in neural machine translation (NMT) and multi-lingual language modeling.
arXiv Detail & Related papers (2023-12-11T05:46:57Z)
Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks. However, their large size makes their inference slow and computationally expensive. We show that it enables these layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z)
Data Augmentation Approaches in Natural Language Processing: A Survey [28.91744006146676]
Data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data. We frame DA methods into three categories based on the diversity of augmented data, including paraphrasing, noising, and sampling.
arXiv Detail & Related papers (2021-10-05T07:35:32Z)
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP [88.65488361532158]
dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks. Data augmentation methods have been explored as a means of improving data efficiency in NLP. We provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting.
arXiv Detail & Related papers (2021-06-14T15:27:22Z)
Understanding Learning Dynamics for Neural Machine Translation [53.23463279153577]
We propose to understand learning dynamics of NMT by using Loss Change Allocation (LCA)citeplan 2019-loss-change-allocation. As LCA requires calculating the gradient on an entire dataset for each update, we instead present an approximate to put it into practice in NMT scenario. Our simulated experiment shows that such approximate calculation is efficient and is empirically proved to deliver consistent results.
arXiv Detail & Related papers (2020-04-05T13:32:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.