We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields
- URL: http://arxiv.org/abs/2310.14870v3
- Date: Tue, 16 Jul 2024 08:50:11 GMT
- Title: We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields
- Authors: Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad
- Abstract summary: Cross-field engagement of Natural Language Processing has declined.
Less than 8% of NLP citations are to linguistics.
Less than 3% of NLP citations are to math and psychology.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing these risks requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers. We show that, unlike most fields, the cross-field engagement of NLP, measured by our proposed Citation Field Diversity Index (CFDI), has declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low). In addition, we find that NLP has grown more insular: it cites increasingly more NLP papers and has fewer papers that act as bridges between fields. NLP citations are dominated by computer science; less than 8% of NLP citations are to linguistics, and less than 3% are to math and psychology. These findings underscore NLP's urgent need to reflect on its engagement with various fields.
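The abstract's Citation Field Diversity Index is not defined here, but diversity over a citation field distribution is commonly captured by a Gini-Simpson-style measure. The sketch below is an illustrative implementation under that assumption; the paper's actual CFDI formula may differ, and the field labels are hypothetical.

```python
from collections import Counter

def field_diversity_index(cited_fields):
    """Gini-Simpson diversity over the fields a paper cites.

    Returns 1 - sum(p_i^2), where p_i is the share of citations
    going to field i: 0.0 when all citations target one field,
    approaching 1.0 as citations spread evenly across fields.
    """
    counts = Counter(cited_fields)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Toy example: a paper citing mostly computer science work,
# echoing the skew the abstract describes (values are made up).
citations = ["cs"] * 90 + ["linguistics"] * 7 + ["math"] * 3
print(round(field_diversity_index(citations), 3))  # 0.184
```

A paper whose citations all stay within one field scores 0.0 on this measure, so a field-wide average drifting downward (as the abstract reports for NLP) indicates growing insularity.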
Related papers
- The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
Post-2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z) - Shoulders of Giants: A Look at the Degree and Utility of Openness in NLP Research [1.1650821883155187]
We observe that papers published in different NLP venues show different patterns related to artefact reuse.
More than 30% of the papers we analysed do not release their artefacts publicly, despite promising to do so.
We observe a wide language-wise disparity in publicly available NLP-related artefacts.
arXiv Detail & Related papers (2024-06-10T04:47:27Z) - Citation Amnesia: NLP and Other Academic Fields Are in a Citation Age Recession [32.77640515002326]
This study examines the tendency to cite older work across 20 fields of study over 43 years (1980--2023).
We term this decline a 'citation age recession', analogous to how economists define periods of reduced economic activity.
Our results suggest that citing more recent works is not directly driven by the growth in publication rates.
arXiv Detail & Related papers (2024-02-19T10:59:29Z) - Defining a New NLP Playground [85.41973504055588]
The recent explosion of performance of large language models has changed the field of Natural Language Processing more abruptly and seismically than any other shift in the field's 80-year history.
This paper proposes 20+ PhD-dissertation-worthy research directions, covering theoretical analysis, new and challenging problems, learning paradigms, and interdisciplinary applications.
arXiv Detail & Related papers (2023-10-31T17:02:33Z) - Forgotten Knowledge: Examining the Citational Amnesia in NLP [63.13508571014673]
We ask: how far back in time do we tend to go to cite papers? How has that changed over time, and what factors correlate with this citational attention/amnesia?
We show that around 62% of cited papers are from the immediate five years prior to publication, whereas only about 17% are more than ten years old.
We show that the median age and age diversity of cited papers were steadily increasing from 1990 to 2014, but since then, the trend has reversed, and current NLP papers have an all-time low temporal citation diversity.
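The statistics above (the share of cited papers from the immediate five years prior, the share older than ten years, and the median citation age) can be computed from publication years alone. The helper below is a hypothetical sketch of such a computation, not the paper's actual methodology, and the years in the example are invented.

```python
from statistics import median

def citation_age_stats(pub_year, cited_years):
    """Summarize how old a paper's citations are.

    Returns (median age, share aged <= 5 years, share aged > 10
    years), with age measured relative to the citing paper.
    """
    ages = [pub_year - y for y in cited_years]
    recent_share = sum(a <= 5 for a in ages) / len(ages)
    old_share = sum(a > 10 for a in ages) / len(ages)
    return median(ages), recent_share, old_share

# Toy example: a 2023 paper citing six earlier works.
med, recent, old = citation_age_stats(2023, [2022, 2021, 2020, 2019, 2010, 2005])
print(med)  # 3.5
```

Aggregating `recent_share` over a corpus of papers, year by year, is one way to reproduce the kind of trend the abstract describes (e.g. roughly 62% of citations falling within the previous five years).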
arXiv Detail & Related papers (2023-05-29T18:30:34Z) - Beyond Good Intentions: Reporting the Research Landscape of NLP for Social Good [115.1507728564964]
We introduce NLP4SG Papers, a scientific dataset with three associated tasks.
These tasks help identify NLP4SG papers and characterize the NLP4SG landscape.
We use state-of-the-art NLP models to address each of these tasks and apply them to the entire ACL Anthology.
arXiv Detail & Related papers (2023-05-09T14:16:25Z) - The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research [28.382353702576314]
We use a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors.
We find that industry presence among NLP authors was steady before a steep increase over the past five years.
A few companies account for most of the publications and provide funding to academic researchers through grants and internships.
arXiv Detail & Related papers (2023-05-04T12:57:18Z) - Geographic Citation Gaps in NLP Research [63.13508571014673]
This work asks a series of questions on the relationship between geographical location and publication success.
We first created a dataset of 70,000 papers from the ACL Anthology, extracted their meta-information, and generated their citation network.
We show that not only are there substantial geographical disparities in paper acceptance and citation but also that these disparities persist even when controlling for a number of variables such as venue of publication and sub-field of NLP.
arXiv Detail & Related papers (2022-10-26T02:25:23Z) - Examining Citations of Natural Language Processing Literature [31.87319293259599]
We show that only about 56% of the papers in the ACL Anthology (AA) are cited ten or more times.
The CL Journal has the most cited papers, but its citation dominance has lessened in recent years.
Papers on sentiment classification, anaphora resolution, and entity recognition have the highest median citations.
arXiv Detail & Related papers (2020-05-02T20:01:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.