The Nature of NLP: Analyzing Contributions in NLP Papers
- URL: http://arxiv.org/abs/2409.19505v2
- Date: Sun, 01 Jun 2025 23:12:08 GMT
- Title: The Nature of NLP: Analyzing Contributions in NLP Papers
- Authors: Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych
- Abstract summary: We propose a taxonomy of research contributions and introduce NLPContributions, a dataset of nearly $2k$ NLP research paper abstracts. We show that NLP research has taken a winding path -- with the focus on language and human-centric studies being prominent in the 1970s and 80s, tapering off in the 1990s and 2000s, and starting to rise again since the late 2010s. Our dataset and analyses offer a powerful lens for tracing research trends and offer potential for generating informed, data-driven literature surveys.
- Score: 77.31665252336157
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Natural Language Processing (NLP) is an established and dynamic field. Despite this, what constitutes NLP research remains debated. In this work, we address the question by quantitatively examining NLP research papers. We propose a taxonomy of research contributions and introduce NLPContributions, a dataset of nearly $2k$ NLP research paper abstracts, carefully annotated to identify scientific contributions and classify their types according to this taxonomy. We also introduce a novel task of automatically identifying contribution statements and classifying their types from research papers. We present experimental results for this task and apply our model to $\sim$$29k$ NLP research papers to analyze their contributions, aiding in the understanding of the nature of NLP research. We show that NLP research has taken a winding path -- with the focus on language and human-centric studies being prominent in the 1970s and 80s, tapering off in the 1990s and 2000s, and starting to rise again since the late 2010s. Alongside this revival, we observe a steady rise in dataset and methodological contributions since the 1990s, such that today, on average, individual NLP papers contribute in more ways than ever before. Our dataset and analyses offer a powerful lens for tracing research trends and offer potential for generating informed, data-driven literature surveys.
Related papers
- Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP [2.3499129784547663]
This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys.
Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks.
By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022.
arXiv Detail & Related papers (2024-07-13T12:01:52Z) - From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP [28.942812379900673]
Interpretability and analysis (IA) research is a growing subfield within NLP.
We seek to quantify the impact of IA research on the broader field of NLP.
arXiv Detail & Related papers (2024-06-18T13:45:07Z) - Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art [70.1063219524999]
The surge of interest in culturally aware and adapted Natural Language Processing has inspired much recent research.
The lack of common understanding of the concept of "culture" has made it difficult to evaluate progress in this emerging area.
We propose an extensive taxonomy of elements of culture that can provide a systematic framework for analyzing and understanding research progress.
arXiv Detail & Related papers (2024-06-06T10:16:43Z) - What Can Natural Language Processing Do for Peer Review? [173.8912784451817]
In modern science, peer review is widely used, yet it is hard, time-consuming, and prone to error.
Since the artifacts involved in peer review are largely text-based, Natural Language Processing has great potential to improve reviewing.
We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance.
arXiv Detail & Related papers (2024-05-10T16:06:43Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Ling-CL: Understanding NLP Models through Linguistic Curricula [17.44112549879293]
We employ a characterization of linguistic complexity from psycholinguistic and language acquisition research.
We develop data-driven curricula to understand the underlying linguistic knowledge that models learn to address NLP tasks.
arXiv Detail & Related papers (2023-10-31T01:44:33Z) - To Build Our Future, We Must Know Our Past: Contextualizing Paradigm Shifts in Natural Language Processing [14.15370310437262]
We study factors that shape NLP as a field, including culture, incentives, and infrastructure.
Our interviewees identify cyclical patterns in the field, as well as new shifts without historical parallel.
We conclude by discussing shared visions, concerns, and hopes for the future of NLP.
arXiv Detail & Related papers (2023-10-11T17:59:36Z) - Exploring the Landscape of Natural Language Processing Research [3.3916160303055567]
Several NLP-related approaches have been surveyed in the research community.
A comprehensive study that categorizes established topics, identifies trends, and outlines areas for future research remains absent.
As a result, we present a structured overview of the research landscape, provide a taxonomy of fields of study in NLP, analyze recent developments in NLP, summarize our findings, and highlight directions for future work.
arXiv Detail & Related papers (2023-07-20T07:33:30Z) - A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z) - Beyond Good Intentions: Reporting the Research Landscape of NLP for Social Good [115.1507728564964]
We introduce NLP4SG Papers, a scientific dataset with three associated tasks.
These tasks help identify NLP4SG papers and characterize the NLP4SG landscape.
We use state-of-the-art NLP models to address each of these tasks and apply them to the entire ACL Anthology.
arXiv Detail & Related papers (2023-05-09T14:16:25Z) - Application of Transformers based methods in Electronic Medical Records: A Systematic Literature Review [77.34726150561087]
This work presents a systematic literature review of state-of-the-art advances using transformer-based methods on electronic medical records (EMRs) in different NLP tasks.
arXiv Detail & Related papers (2023-04-05T22:19:42Z) - An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z) - State-of-the-art generalisation research in NLP: A taxonomy and review [87.1541712509283]
We present a taxonomy for characterising and understanding generalisation research in NLP.
Our taxonomy is based on an extensive literature review of generalisation research.
We use our taxonomy to classify over 400 papers that test generalisation.
arXiv Detail & Related papers (2022-10-06T16:53:33Z) - A Decade of Knowledge Graphs in Natural Language Processing: A Survey [3.3358633215849927]
Knowledge graphs (KGs) have attracted a surge of interest from both academia and industry.
As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing.
arXiv Detail & Related papers (2022-09-30T21:53:57Z) - Reproducibility Beyond the Research Community: Experience from NLP Beginners [6.957948096979098]
We conducted a study with 93 students in an introductory NLP course, where students reproduced results of recent NLP papers.
Surprisingly, our results suggest that their technical skill (i.e., programming experience) has limited impact on their effort spent completing the exercise.
We find accessibility efforts by research authors to be key to a successful experience, including thorough documentation and easy access to required models and datasets.
arXiv Detail & Related papers (2022-05-04T16:54:00Z) - Meta Learning for Natural Language Processing: A Survey [88.58260839196019]
Deep learning has been the mainstream technique in natural language processing (NLP).
However, deep learning requires large amounts of labeled data and is less generalizable across domains.
Meta-learning is an emerging field in machine learning studying approaches to learn better algorithms.
arXiv Detail & Related papers (2022-05-03T13:58:38Z) - An Empirical Survey of Data Augmentation for Limited Data Learning in NLP [88.65488361532158]
The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks.
Data augmentation methods have been explored as a means of improving data efficiency in NLP.
We provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting.
arXiv Detail & Related papers (2021-06-14T15:27:22Z) - A Survey of Data Augmentation Approaches for NLP [12.606206831969262]
Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks.
Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data.
We present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner.
arXiv Detail & Related papers (2021-05-07T06:03:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.