A Survey of Data Augmentation Approaches for NLP
- URL: http://arxiv.org/abs/2105.03075v1
- Date: Fri, 7 May 2021 06:03:45 GMT
- Title: A Survey of Data Augmentation Approaches for NLP
- Authors: Steven Y. Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush
Vosoughi, Teruko Mitamura, Eduard Hovy
- Abstract summary: Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks.
Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data.
We present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation has recently seen increased interest in NLP due to more
work in low-resource domains, new tasks, and the popularity of large-scale
neural networks that require large amounts of training data. Despite this
recent upsurge, this area is still relatively underexplored, perhaps due to the
challenges posed by the discrete nature of language data. In this paper, we
present a comprehensive and unifying survey of data augmentation for NLP by
summarizing the literature in a structured manner. We first introduce and
motivate data augmentation for NLP, and then discuss major methodologically
representative approaches. Next, we highlight techniques that are used for
popular NLP applications and tasks. We conclude by outlining current challenges
and directions for future research. Overall, our paper aims to clarify the
landscape of existing literature in data augmentation for NLP and motivate
additional work in this area.
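As a concrete illustration of the rule-based, token-level techniques this survey covers (in the spirit of Easy Data Augmentation), below is a minimal Python sketch of random deletion and random swap. The function names and parameter values are illustrative assumptions, not taken from the paper.

```python
import random

def random_deletion(tokens, p=0.1, rng=random):
    """Drop each token independently with probability p.

    Always keeps at least one token so the example never becomes empty.
    """
    if len(tokens) <= 1:
        return tokens[:]
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n_swaps=1, rng=random):
    """Exchange the positions of two randomly chosen tokens, n_swaps times."""
    tokens = tokens[:]
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

if __name__ == "__main__":
    random.seed(0)
    sentence = "data augmentation helps low resource nlp tasks".split()
    print(" ".join(random_deletion(sentence, p=0.2)))
    print(" ".join(random_swap(sentence, n_swaps=2)))
```

Edits like these are cheap and often label-preserving for classification tasks, but aggressive edit rates can alter meaning, one instance of the challenge the abstract attributes to the discrete nature of language data.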
Related papers
- The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
Since 2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z)
- The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey [6.516561905186376]
The advent of Large Language Models (LLMs) represents a notable breakthrough in Natural Language Processing (NLP).
We study the inherent challenges associated with extending context length and present an organized overview of the existing strategies employed by researchers.
We explore whether there is a consensus within the research community regarding evaluation standards and identify areas where further agreement is needed.
arXiv Detail & Related papers (2024-01-15T18:07:21Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Surveying the Landscape of Text Summarization with Deep Learning: A Comprehensive Review [2.4185510826808487]
Deep learning has revolutionized natural language processing (NLP) by enabling the development of models that can learn complex representations of language data.
Deep learning models for NLP typically use large amounts of data to train deep neural networks, allowing them to learn the patterns and relationships in language data.
Applying deep learning to text summarization means using deep neural networks to generate or extract summaries.
arXiv Detail & Related papers (2023-10-13T21:24:37Z)
- Exploring the Landscape of Natural Language Processing Research [3.3916160303055567]
Several NLP-related approaches have been surveyed in the research community, but a comprehensive study that categorizes established topics, identifies trends, and outlines areas for future research remains absent.
To fill this gap, we present a structured overview of the research landscape, provide a taxonomy of fields of study in NLP, analyze recent developments, summarize our findings, and highlight directions for future work.
arXiv Detail & Related papers (2023-07-20T07:33:30Z)
- Efficient Methods for Natural Language Processing: A Survey [76.34572727185896]
This survey synthesizes and relates current methods and findings in efficient NLP.
We aim both to provide guidance for conducting NLP under limited resources and to point towards promising research directions for developing more efficient methods.
arXiv Detail & Related papers (2022-08-31T20:32:35Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time, and research gaps in the data augmentation literature.
We expect readers to come away understanding the potential of data augmentation and able to identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Meta Learning for Natural Language Processing: A Survey [88.58260839196019]
Deep learning has been the mainstream technique in natural language processing (NLP).
However, deep learning requires large amounts of labeled data and is less generalizable across domains.
Meta-learning is an emerging field of machine learning that studies approaches to learning better learning algorithms.
arXiv Detail & Related papers (2022-05-03T13:58:38Z)
- An Empirical Survey of Data Augmentation for Limited Data Learning in NLP [88.65488361532158]
A dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks.
Data augmentation methods have been explored as a means of improving data efficiency in NLP.
We provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting.
arXiv Detail & Related papers (2021-06-14T15:27:22Z)
- A Survey of Active Learning for Text Classification using Deep Neural Networks [1.2310316230437004]
Natural language processing (NLP) and neural networks (NNs) have both undergone significant changes in recent years.
For active learning (AL), however, NNs are less commonly used, despite their current popularity.
arXiv Detail & Related papers (2020-08-17T12:53:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.