Causal evidence of racial and institutional biases in accessing paywalled articles and scientific data
- URL: http://arxiv.org/abs/2509.08299v1
- Date: Wed, 10 Sep 2025 05:39:08 GMT
- Title: Causal evidence of racial and institutional biases in accessing paywalled articles and scientific data
- Authors: Hazem Ibrahim, Fengyuan Liu, Khalid Mengal, Aaron R. Kaufman, Yasir Zaki, Talal Rahwan
- Abstract summary: We show that researchers in the Global South cite paywalled papers and upon-request datasets at significantly lower rates than their Global North counterparts. We find that racial identity more strongly predicts response rates to paywalled article requests than institutional affiliation does. These findings reveal how informal gatekeeping can perpetuate structural inequities in science.
- Score: 3.778678327105226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific progress fundamentally depends on researchers' ability to access and build upon the work of others. Yet, a majority of published work remains behind expensive paywalls, limiting access to universities that can afford subscriptions. Furthermore, even when articles are accessible, the underlying datasets could be restricted, available only through a "reasonable request" to the authors. One way researchers could overcome these barriers is by relying on informal channels, such as emailing authors directly, to obtain paywalled articles or restricted datasets. However, whether these informal channels are hindered by racial and/or institutional biases remains unknown. Here, we combine qualitative semi-structured interviews, large-scale observational analysis, and two randomized audit experiments to examine racial and institutional disparities in access to scientific knowledge. Our analysis of 250 million articles reveals that researchers in the Global South cite paywalled papers and upon-request datasets at significantly lower rates than their Global North counterparts, and that these access gaps are associated with reduced knowledge breadth and scholarly impact. To interrogate the mechanisms underlying this phenomenon, we conduct two randomized email audit studies in which fictional PhD students differing in racial background and institutional affiliation request access to paywalled articles (N = 18,000) and datasets (N = 11,840). We find that racial identity more strongly predicts response rate to paywalled article requests compared to institutional affiliation, whereas institutional affiliation played a larger role in shaping access to datasets. These findings reveal how informal gatekeeping can perpetuate structural inequities in science, highlighting the need for stronger data-sharing mandates and more equitable open access policies.
Related papers
- Research Integrity and Academic Authority in the Age of Artificial Intelligence: From Discovery to Curation? [0.0]
Artificial intelligence is reshaping the organization and practice of research. This article argues that these developments challenge research integrity and erode traditional bases of academic authority. Rather than competing with corporate laboratories at the technological frontier, universities can sustain their legitimacy by strengthening roles that cannot be readily automated or commercialized.
arXiv Detail & Related papers (2026-01-09T06:47:01Z) - The Role of Computing Resources in Publishing Foundation Model Research [84.20094600030092]
We evaluate the relationship between these resources and the scientific advancement of foundation models (FMs). We reviewed 6517 FM papers published between 2022 and 2024, and surveyed 229 first authors about the impact of computing resources on scientific output. We find that increased computing is correlated with national funding allocations and citations, but we do not observe a strong correlation with research environment.
arXiv Detail & Related papers (2025-10-15T14:50:45Z) - The Great Data Standoff: Researchers vs. Platforms Under the Digital Services Act [9.275892768167122]
We focus on the 2024 Romanian presidential election interference incident. This is the first event of its kind to trigger systemic risk investigations by the European Commission. Analysing this incident helps clarify election-related systemic risk and identify practical research tasks.
arXiv Detail & Related papers (2025-05-02T09:00:19Z) - Beyond authorship: Analyzing contributions in PLOS ONE and the challenges of appropriate attribution [0.0]
The study analyzes 81,823 publications from the journal PLOS ONE. 9.14% of articles feature at least one author with inappropriate authorship, affecting over 14,000 individuals. Inappropriate authorship is more concentrated in Asia, Africa, and specific European countries such as Italy.
arXiv Detail & Related papers (2025-04-08T06:47:52Z) - Leveraging Large Language Models to Democratize Access to Costly Datasets for Academic Research [0.0]
We develop and evaluate a novel methodology using GPT-4o-mini within a Retrieval-Augmented Generation (RAG) framework to collect data from corporate disclosures. Our approach achieves human-level accuracy in collecting CEO pay ratios from approximately 10,000 proxy statements and Critical Audit Matters (CAMs) from more than 12,000 10-K filings. This stands in stark contrast to the hundreds of hours needed for manual collection or the thousands of dollars required for commercial database subscriptions.
arXiv Detail & Related papers (2024-12-03T00:59:56Z) - Web Scraping for Research: Legal, Ethical, Institutional, and Scientific Considerations [11.851771490297693]
This paper proposes a comprehensive framework for web scraping in social science research, aimed at U.S.-based researchers. We present an overview of the current regulatory environment governing when and how researchers can access, collect, store, and share data via scraping. We then provide researchers with recommendations for conducting scraping in a scientifically legitimate and ethical manner.
arXiv Detail & Related papers (2024-10-30T20:20:44Z) - A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures [50.987594546912725]
Despite a growing corpus of research in AI privacy and explainability, little attention has been paid to privacy-preserving model explanations.
This article presents the first thorough survey about privacy attacks on model explanations and their countermeasures.
arXiv Detail & Related papers (2024-03-31T12:44:48Z) - Position: AI/ML Influencers Have a Place in the Academic Process [82.2069685579588]
We investigate the role of social media influencers in enhancing the visibility of machine learning research.
We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023.
Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers.
arXiv Detail & Related papers (2024-01-24T20:05:49Z) - Having your Privacy Cake and Eating it Too: Platform-supported Auditing
of Social Media Algorithms for Public Interest [70.02478301291264]
Social media platforms curate access to information and opportunities, and so play a critical role in shaping public discourse.
Prior studies have used black-box methods to show that these algorithms can lead to biased or discriminatory outcomes.
We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation.
arXiv Detail & Related papers (2022-07-18T17:32:35Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and the scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z) - Yes-Yes-Yes: Donation-based Peer Reviewing Data Collection for ACL Rolling Review and Beyond [58.71736531356398]
We present an in-depth discussion of peer reviewing data, outline the ethical and legal desiderata for peer reviewing data collection, and propose the first continuous, donation-based data collection workflow.
We report on the ongoing implementation of this workflow at the ACL Rolling Review and deliver the first insights obtained with the newly collected data.
arXiv Detail & Related papers (2022-01-27T11:02:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.