Forgotten Knowledge: Examining the Citational Amnesia in NLP
- URL: http://arxiv.org/abs/2305.18554v2
- Date: Mon, 31 Jul 2023 17:38:03 GMT
- Title: Forgotten Knowledge: Examining the Citational Amnesia in NLP
- Authors: Janvijay Singh, Mukund Rungta, Diyi Yang, Saif M. Mohammad
- Abstract summary: How far back in time do we tend to go to cite papers? How has that changed over time, and what factors correlate with this citational attention/amnesia?
We show that around 62% of cited papers are from the immediate five years prior to publication, whereas only about 17% are more than ten years old.
We show that the median age and age diversity of cited papers were steadily increasing from 1990 to 2014, but since then, the trend has reversed, and current NLP papers have an all-time low temporal citation diversity.
- Score: 63.13508571014673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Citing papers is the primary method through which modern scientific writing
discusses and builds on past work. Collectively, citing a diverse set of papers
(in time and area of study) is an indicator of how widely the community is
reading. Yet, there is little work looking at broad temporal patterns of
citation. This work systematically and empirically examines: How far back in
time do we tend to go to cite papers? How has that changed over time, and what
factors correlate with this citational attention/amnesia? We chose NLP as our
domain of interest and analyzed approximately 71.5K papers to show and quantify
several key trends in citation. Notably, around 62% of cited papers are from
the immediate five years prior to publication, whereas only about 17% are more
than ten years old. Furthermore, we show that the median age and age diversity
of cited papers were steadily increasing from 1990 to 2014, but since then, the
trend has reversed, and current NLP papers have an all-time low temporal
citation diversity. Finally, we show that unlike the 1990s, the highly cited
papers in the last decade were also papers with the least citation diversity,
likely contributing to the intense (and arguably harmful) recency focus. Code,
data, and a demo are available on the project homepage.
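As a concrete illustration of the quantities discussed in the abstract, here is a minimal sketch (not the authors' released code) of how citation-age statistics could be computed from the publication years of a paper's references. The normalized-entropy diversity measure is an assumption for illustration; the paper's exact temporal-diversity metric may differ.

```python
from statistics import median
from collections import Counter
from math import log

def citation_age_stats(citing_year, cited_years):
    """Summarize how far back a paper's citations reach.

    citing_year: publication year of the citing paper.
    cited_years: publication years of the papers it cites.
    """
    ages = [citing_year - y for y in cited_years]
    within_5 = sum(a <= 5 for a in ages) / len(ages)   # share from the prior five years
    over_10 = sum(a > 10 for a in ages) / len(ages)    # share more than ten years old
    # Assumed "age diversity" measure: normalized Shannon entropy over
    # citation-age counts (the paper's actual metric may differ).
    counts = Counter(ages)
    n = len(ages)
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    diversity = entropy / log(len(counts)) if len(counts) > 1 else 0.0
    return median(ages), within_5, over_10, diversity

# Toy example: a 2023 paper citing work from various years.
print(citation_age_stats(2023, [2022, 2021, 2021, 2019, 2010, 1998]))
```

On this toy input, four of the six citations fall within the prior five years, mirroring the recency-heavy pattern the paper reports at scale.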
Related papers
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
- Position: AI/ML Influencers Have a Place in the Academic Process [82.2069685579588]
We investigate the role of social media influencers in enhancing the visibility of machine learning research.
We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023.
Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers.
arXiv Detail & Related papers (2024-01-24T20:05:49Z)
- Is there really a Citation Age Bias in NLP? [25.867690917154885]
There is a citation age bias in the Natural Language Processing (NLP) community, but all AI subfields show similar trends of citation amnesia.
Rather than diagnosing this as a citation age bias specific to the NLP community, we believe this pattern is an artefact of the dynamics of these research fields.
arXiv Detail & Related papers (2024-01-07T17:12:08Z)
- CausalCite: A Causal Formulation of Paper Citations [80.82622421055734]
CausalCite is a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers.
It is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings.
We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts.
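The summary above describes matching in embedding space; the following is a hedged sketch of that general idea (nearest-neighbor matching on text embeddings, then a treated-versus-matched-controls contrast). The function name, the outcome choice, and the k-nearest-neighbor rule are illustrative assumptions, not the actual TextMatch implementation.

```python
import numpy as np

def matched_causal_effect(treated_emb, treated_outcome,
                          control_embs, control_outcomes, k=5):
    """Estimate a paper's effect by comparing its outcome (e.g., a
    follow-up paper's citation count) against the k control papers
    whose text embeddings are most similar (cosine similarity)."""
    sims = control_embs @ treated_emb / (
        np.linalg.norm(control_embs, axis=1) * np.linalg.norm(treated_emb))
    nearest = np.argsort(-sims)[:k]                # k most similar controls
    counterfactual = np.mean(np.asarray(control_outcomes)[nearest])
    return treated_outcome - counterfactual       # treated-vs-matched contrast

# Toy example with random embeddings standing in for paper texts.
rng = np.random.default_rng(0)
effect = matched_causal_effect(
    treated_emb=rng.normal(size=64), treated_outcome=120.0,
    control_embs=rng.normal(size=(200, 64)),
    control_outcomes=rng.poisson(40, size=200), k=5)
print(f"estimated effect: {effect:+.1f} citations")
```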
arXiv Detail & Related papers (2023-11-05T23:09:39Z)
- NLLG Quarterly arXiv Report 06/23: What are the most influential current AI Papers? [15.830129136642755]
The objective is to offer a quick guide to the most relevant and widely discussed research, aiding both newcomers and established researchers in staying abreast of current trends.
We observe the dominance of papers related to Large Language Models (LLMs) and specifically ChatGPT during the first half of 2023.
NLP-related papers are the most influential (around 60% of top papers), even though there are twice as many ML-related papers in our data.
arXiv Detail & Related papers (2023-07-31T11:53:52Z)
- Geographic Citation Gaps in NLP Research [63.13508571014673]
This work asks a series of questions on the relationship between geographical location and publication success.
We first created a dataset of 70,000 papers from the ACL Anthology, extracted their meta-information, and generated their citation network.
We show that not only are there substantial geographical disparities in paper acceptance and citation but also that these disparities persist even when controlling for a number of variables such as venue of publication and sub-field of NLP.
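A minimal sketch of what "controlling for variables" can look like in this setting: regress (log) citation counts on a region indicator plus one-hot venue and sub-field controls, then inspect the region coefficient. The variables, levels, and model below are illustrative assumptions, not the authors' actual analysis.

```python
import numpy as np

# Toy data: log citation counts driven by a region indicator plus
# controls (venue, sub-field). A near-zero region coefficient after
# adding controls would suggest the gap is explained by them; here
# the gap persists by construction.
rng = np.random.default_rng(1)
n = 1000
region = rng.integers(0, 2, n)          # 1 = well-cited region (toy)
venue = rng.integers(0, 4, n)           # 4 hypothetical venues
subfield = rng.integers(0, 6, n)        # 6 hypothetical sub-fields
log_cites = (0.5 * region + 0.2 * venue + 0.1 * subfield
             + rng.normal(0, 1, n))

def one_hot(x):
    return np.eye(x.max() + 1)[x][:, 1:]  # drop first level as baseline

X = np.column_stack([np.ones(n), region, one_hot(venue), one_hot(subfield)])
beta, *_ = np.linalg.lstsq(X, log_cites, rcond=None)
print(f"region effect with controls: {beta[1]:+.2f} log-citations")
```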
arXiv Detail & Related papers (2022-10-26T02:25:23Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model achieves competitive performance compared with pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Utilizing Citation Network Structure to Predict Citation Counts: A Deep Learning Approach [0.0]
This paper proposes an end-to-end deep learning network, DeepCCP, which frames citation count prediction as an information cascade prediction problem.
According to experiments on 6 real data sets, DeepCCP is superior to the state-of-the-art methods in terms of the accuracy of citation count prediction.
arXiv Detail & Related papers (2020-09-06T05:27:50Z)
- Examining Citations of Natural Language Processing Literature [31.87319293259599]
We show that only about 56% of the papers in the ACL Anthology (AA) are cited ten or more times.
The CL journal has the most cited papers, but its citation dominance has lessened in recent years.
Papers on sentiment classification, anaphora resolution, and entity recognition have the highest median citations.
arXiv Detail & Related papers (2020-05-02T20:01:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.