A Survey of Data Augmentation Approaches for NLP
- URL: http://arxiv.org/abs/2105.03075v1
- Date: Fri, 7 May 2021 06:03:45 GMT
- Title: A Survey of Data Augmentation Approaches for NLP
- Authors: Steven Y. Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush
Vosoughi, Teruko Mitamura, Eduard Hovy
- Abstract summary: Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks.
Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data.
We present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation has recently seen increased interest in NLP due to more
work in low-resource domains, new tasks, and the popularity of large-scale
neural networks that require large amounts of training data. Despite this
recent upsurge, this area is still relatively underexplored, perhaps due to the
challenges posed by the discrete nature of language data. In this paper, we
present a comprehensive and unifying survey of data augmentation for NLP by
summarizing the literature in a structured manner. We first introduce and
motivate data augmentation for NLP, and then discuss major methodologically
representative approaches. Next, we highlight techniques that are used for
popular NLP applications and tasks. We conclude by outlining current challenges
and directions for future research. Overall, our paper aims to clarify the
landscape of existing literature in data augmentation for NLP and motivate
additional work in this area.
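As a concrete illustration of the rule-based, token-level techniques this survey covers (in the spirit of Easy Data Augmentation), below is a minimal Python sketch of random deletion and random swap. The function names and parameter values are illustrative assumptions, not taken from the paper.

```python
import random

def random_deletion(tokens, p=0.1, rng=random):
    """Drop each token independently with probability p.

    Always keeps at least one token so the example never becomes empty.
    """
    if len(tokens) <= 1:
        return tokens[:]
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n_swaps=1, rng=random):
    """Exchange the positions of two randomly chosen tokens, n_swaps times."""
    tokens = tokens[:]
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

if __name__ == "__main__":
    random.seed(0)
    sentence = "data augmentation helps low resource nlp tasks".split()
    print(" ".join(random_deletion(sentence, p=0.2)))
    print(" ".join(random_swap(sentence, n_swaps=2)))
```

Edits like these are cheap and often label-preserving for classification tasks, but aggressive edit rates can alter meaning, one instance of the challenge the abstract attributes to the discrete nature of language data.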
Related papers
- The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
Since 2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z)
- The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey [6.516561905186376]
The advent of Large Language Models (LLMs) represents a notable breakthrough in Natural Language Processing (NLP).
We study the inherent challenges associated with extending context length and present an organized overview of the existing strategies employed by researchers.
We explore whether there is a consensus within the research community regarding evaluation standards and identify areas where further agreement is needed.
arXiv Detail & Related papers (2024-01-15T18:07:21Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Surveying the Landscape of Text Summarization with Deep Learning: A Comprehensive Review [2.4185510826808487]
Deep learning has revolutionized natural language processing (NLP) by enabling the development of models that can learn complex representations of language data.
Deep learning models for NLP typically use large amounts of data to train deep neural networks, allowing them to learn the patterns and relationships in language data.
Applying deep learning to text summarization means using deep neural networks to generate or extract summaries.
arXiv Detail & Related papers (2023-10-13T21:24:37Z)
- Exploring the Landscape of Natural Language Processing Research [3.3916160303055567]
Several NLP-related approaches have been surveyed in the research community, but a comprehensive study that categorizes established topics, identifies trends, and outlines areas for future research remains absent.
To fill this gap, we present a structured overview of the research landscape, provide a taxonomy of fields of study in NLP, analyze recent developments, summarize our findings, and highlight directions for future work.
arXiv Detail & Related papers (2023-07-20T07:33:30Z)
- Efficient Methods for Natural Language Processing: A Survey [76.34572727185896]
This survey synthesizes and relates current methods and findings in efficient NLP.
We aim both to provide guidance for conducting NLP under limited resources and to point towards promising research directions for developing more efficient methods.
arXiv Detail & Related papers (2022-08-31T20:32:35Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time, and research gaps in the data augmentation literature.
We expect readers to come away understanding the potential of data augmentation and able to identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Meta Learning for Natural Language Processing: A Survey [88.58260839196019]
Deep learning has been the mainstream technique in natural language processing (NLP).
However, deep learning requires large amounts of labeled data and is less generalizable across domains.
Meta-learning is an emerging field of machine learning that studies approaches to learning better learning algorithms.
arXiv Detail & Related papers (2022-05-03T13:58:38Z)
- An Empirical Survey of Data Augmentation for Limited Data Learning in NLP [88.65488361532158]
A dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks.
Data augmentation methods have been explored as a means of improving data efficiency in NLP.
We provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting.
arXiv Detail & Related papers (2021-06-14T15:27:22Z)
- A Survey of Active Learning for Text Classification using Deep Neural Networks [1.2310316230437004]
Natural language processing (NLP) and neural networks (NNs) have both undergone significant changes in recent years.
For active learning (AL), however, NNs are less commonly used, despite their current popularity.
arXiv Detail & Related papers (2020-08-17T12:53:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.