Geographic Citation Gaps in NLP Research
- URL: http://arxiv.org/abs/2210.14424v1
- Date: Wed, 26 Oct 2022 02:25:23 GMT
- Title: Geographic Citation Gaps in NLP Research
- Authors: Mukund Rungta, Janvijay Singh, Saif M. Mohammad and Diyi Yang
- Abstract summary: This work asks a series of questions on the relationship between geographical location and publication success.
We first created a dataset of 70,000 papers from the ACL Anthology, extracted their meta-information, and generated their citation network.
We show that not only are there substantial geographical disparities in paper acceptance and citation but also that these disparities persist even when controlling for a number of variables such as venue of publication and sub-field of NLP.
- Score: 63.13508571014673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a fair world, people have equitable opportunities to education, to conduct
scientific research, to publish, and to get credit for their work, regardless
of where they live. However, it is common knowledge among researchers that a
vast number of papers accepted at top NLP venues come from a handful of western
countries and (lately) China; whereas, very few papers from Africa and South
America get published. Similar disparities are also believed to exist for paper
citation counts. In the spirit of "what we do not measure, we cannot improve",
this work asks a series of questions on the relationship between geographical
location and publication success (acceptance in top NLP venues and citation
impact). We first created a dataset of 70,000 papers from the ACL Anthology,
extracted their meta-information, and generated their citation network. We then
show that not only are there substantial geographical disparities in paper
acceptance and citation but also that these disparities persist even when
controlling for a number of variables such as venue of publication and
sub-field of NLP. Further, despite some steps taken by the NLP community to
improve geographical diversity, we show that the disparity in publication
metrics across locations is still on an increasing trend since the early 2000s.
We release our code and dataset here:
https://github.com/iamjanvijay/acl-cite-net
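The pipeline the abstract describes (paper metadata, a citation network, and comparisons that control for venue) can be illustrated with a minimal sketch. This is not the authors' code: the records, the field names `country` and `venue`, and both helper functions are hypothetical, and grouping by venue is only a crude stand-in for the controlled analysis in the paper.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy records; the released dataset's schema may differ.
papers = {
    "p1": {"country": "US", "venue": "ACL"},
    "p2": {"country": "US", "venue": "ACL"},
    "p3": {"country": "BR", "venue": "ACL"},
    "p4": {"country": "BR", "venue": "EMNLP"},
    "p5": {"country": "US", "venue": "EMNLP"},
}

# Directed citation edges as (citing paper, cited paper); invented for illustration.
citations = [
    ("p2", "p1"), ("p3", "p1"), ("p4", "p1"),
    ("p5", "p3"), ("p5", "p4"), ("p4", "p5"), ("p2", "p5"),
]


def citation_counts(papers, citations):
    """Count incoming citations (in-degree) for every known paper."""
    counts = Counter({pid: 0 for pid in papers})
    for citing, cited in citations:
        # Skip edges that touch unknown papers, and skip self-loops.
        if citing in papers and cited in papers and citing != cited:
            counts[cited] += 1
    return counts


def mean_citations_by_group(papers, counts, keys):
    """Average citation count per group, where a group is a tuple of metadata values."""
    groups = defaultdict(list)
    for pid, meta in papers.items():
        groups[tuple(meta[k] for k in keys)].append(counts[pid])
    return {group: mean(values) for group, values in groups.items()}


counts = citation_counts(papers, citations)
by_country = mean_citations_by_group(papers, counts, ["country"])
# Stratifying by venue compares countries only within the same venue.
by_country_venue = mean_citations_by_group(papers, counts, ["country", "venue"])

for group, avg in sorted(by_country_venue.items()):
    print(group, round(avg, 2))
```

Comparing `by_country` with `by_country_venue` shows whether a gap between countries remains once papers are compared only within the same venue, which is the kind of question the paper studies at much larger scale.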
Related papers
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
- Position: AI/ML Influencers Have a Place in the Academic Process [82.2069685579588]
We investigate the role of social media influencers in enhancing the visibility of machine learning research.
We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023.
Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers.
arXiv Detail & Related papers (2024-01-24T20:05:49Z)
- Is there really a Citation Age Bias in NLP? [25.867690917154885]
Recent work has noted a citation age bias in the Natural Language Processing (NLP) community.
All AI subfields have similar trends of citation amnesia.
Rather than diagnosing this as a citation age bias in the NLP community, we believe this pattern is an artefact of the dynamics of these research fields.
arXiv Detail & Related papers (2024-01-07T17:12:08Z)
- We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields [30.550895983110806]
Cross-field engagement of Natural Language Processing has declined.
Less than 8% of NLP citations are to linguistics.
Less than 3% of NLP citations are to math and psychology.
arXiv Detail & Related papers (2023-10-23T12:42:06Z)
- Forgotten Knowledge: Examining the Citational Amnesia in NLP [63.13508571014673]
We ask how far back in time we tend to go to cite papers, how that has changed over time, and what factors correlate with this citational attention/amnesia.
We show that around 62% of cited papers are from the immediate five years prior to publication, whereas only about 17% are more than ten years old.
We show that the median age and age diversity of cited papers were steadily increasing from 1990 to 2014, but since then, the trend has reversed, and current NLP papers have an all-time low temporal citation diversity.
arXiv Detail & Related papers (2023-05-29T18:30:34Z)
- Square One Bias in NLP: Towards a Multi-Dimensional Exploration of the Research Manifold [88.83876819883653]
We show through a manual classification of recent NLP research papers that this is indeed the case.
We observe that NLP research often goes beyond the square one setup, focusing not only on accuracy, but also on fairness or interpretability, but typically only along a single dimension.
arXiv Detail & Related papers (2022-06-20T13:04:23Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Examining Citations of Natural Language Processing Literature [31.87319293259599]
We show that only about 56% of the papers in the ACL Anthology (AA) are cited ten or more times.
CL Journal has the most cited papers, but its citation dominance has lessened in recent years.
Papers on sentiment classification, anaphora resolution, and entity recognition have the highest median citations.
arXiv Detail & Related papers (2020-05-02T20:01:59Z)
- The Demise of Single-Authored Publications in Computer Science: A Citation Network Analysis [0.0]
I analyze the DBLP database to study the role of single-author publications in the computer science literature between 1940 and 2019.
I examine their demographics and reception by computing the population fraction, citation statistics, and scores of single-author publications over the years.
arXiv Detail & Related papers (2020-01-02T07:47:44Z)