Structural shifts in institutional participation and collaboration within the AI arXiv preprint research ecosystem
- URL: http://arxiv.org/abs/2602.03969v2
- Date: Mon, 09 Feb 2026 00:03:44 GMT
- Title: Structural shifts in institutional participation and collaboration within the AI arXiv preprint research ecosystem
- Authors: Shama Maganur, Mayank Kejriwal
- Abstract summary: This paper examines structural changes in the AI research landscape using a dataset of arXiv preprints (cs.AI) from 2021 through 2025. Our results reveal an unprecedented surge in publication output following the introduction of ChatGPT. However, academic--industry collaboration is still suppressed, as measured by a Normalized Collaboration Index (NCI) that remains significantly below the random-mixing baseline.
- Score: 2.5782420501870296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of large language models (LLMs) represents a significant technological shift within the scientific ecosystem, particularly within the field of artificial intelligence (AI). This paper examines structural changes in the AI research landscape using a dataset of arXiv preprints (cs.AI) from 2021 through 2025. Given the rapid pace of AI development, the preprint ecosystem has become a critical barometer for real-time scientific shifts, often preceding formal peer-reviewed publication by months or years. By employing a multi-stage data collection and enrichment pipeline in conjunction with LLM-based institution classification, we analyze the evolution of publication volumes, author team sizes, and academic--industry collaboration patterns. Our results reveal an unprecedented surge in publication output following the introduction of ChatGPT, with academic institutions continuing to provide the largest volume of research. However, we observe that academic--industry collaboration is still suppressed, as measured by a Normalized Collaboration Index (NCI) that remains significantly below the random-mixing baseline across all major subfields. These findings highlight a continuing institutional divide and suggest that the capital-intensive nature of generative AI research may be reshaping the boundaries of scientific collaboration.
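The abstract does not spell out how the Normalized Collaboration Index is computed, so the following is a minimal sketch under one plausible reading: NCI as the observed share of papers mixing academic and industry authors, divided by the share expected when institution labels are shuffled at random across author slots (the random-mixing baseline). The function names and the paper representation (a list of per-author institution labels) are illustrative assumptions, not the authors' implementation.

```python
import random

def observed_cross_share(papers):
    # A paper counts as cross-institutional if it has at least one
    # academic and at least one industry author label.
    cross = sum(1 for p in papers if "academic" in p and "industry" in p)
    return cross / len(papers)

def random_mixing_baseline(papers, n_shuffles=1000, seed=0):
    # Expected cross share when institution labels are shuffled across
    # author slots while each paper keeps its original team size.
    rng = random.Random(seed)
    labels = [label for p in papers for label in p]
    sizes = [len(p) for p in papers]
    shares = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        it = iter(labels)
        shuffled = [[next(it) for _ in range(k)] for k in sizes]
        shares.append(observed_cross_share(shuffled))
    return sum(shares) / len(shares)

def normalized_collaboration_index(papers):
    # NCI < 1 means fewer mixed papers than random mixing would predict.
    return observed_cross_share(papers) / random_mixing_baseline(papers)

if __name__ == "__main__":
    # Toy example: each inner list holds one institution label per author.
    papers = [
        ["academic", "academic"],
        ["academic", "industry"],
        ["industry", "industry", "academic"],
        ["academic"],
    ]
    print(f"NCI = {normalized_collaboration_index(papers):.3f}")
```

Under this reading, an NCI significantly below 1 across all major subfields corresponds to the suppressed academic--industry collaboration the abstract reports.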
Related papers
- A Data-Driven Analysis for Engineering Conferences: The Institute of Industrial and Systems Engineering (IISE) Annual Conference Proceedings (2002-2025) [0.0]
This paper presents a computational analysis of IISE proceedings from 2002 to 2025. We map thematic evolution to identify dominant, emerging, and receding research topics. The findings illuminate the field's intellectual assets and provide a data-informed map to guide the future of ISE.
arXiv Detail & Related papers (2026-02-28T01:10:46Z) - PreScience: A Benchmark for Forecasting Scientific Contributions [32.63164451901248]
PreScience is a scientific forecasting benchmark that decomposes the research process into four interdependent generative tasks. We develop baselines and evaluations for each task, including LACERScore, a novel measure of contribution similarity. The resulting synthetic corpus is systematically less diverse and less novel than human-authored research from the same period.
arXiv Detail & Related papers (2026-02-24T01:37:53Z) - Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science [70.3658845234978]
Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS). Despite this rapid progress, the field still relies heavily on empirical trial-and-error. This bottleneck stems from the ambiguity of attribution. We propose a factor attribution paradigm to systematically identify collaboration-driving factors.
arXiv Detail & Related papers (2026-02-05T04:19:52Z) - Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints [15.070885964897734]
Generative large language models (LLMs) introduce a further potential disruption by altering how manuscripts are written. This paper addresses the gap through a large-scale analysis of more than 2.1 million preprints spanning 2016--2025 (115 months) across four major repositories. Our findings reveal that LLMs have accelerated submission and revision cycles, modestly increased linguistic complexity, and disproportionately expanded AI-related topics.
arXiv Detail & Related papers (2025-10-18T01:37:40Z) - The Role of Computing Resources in Publishing Foundation Model Research [84.20094600030092]
We evaluate the relationship between these resources and the scientific advancement of foundation models (FM). We reviewed 6517 FM papers published between 2022 and 2024, and surveyed 229 first authors on the impact of computing resources on scientific output. We find that increased computing is correlated with national funding allocations and citations, but we do not observe a comparably strong correlation with the research environment.
arXiv Detail & Related papers (2025-10-15T14:50:45Z) - A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers [251.23085679210206]
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research. This survey reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge.
arXiv Detail & Related papers (2025-08-28T18:30:52Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for the ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods, and designs experiments, while iteratively refining them. We experimentally validate ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z) - Analyzing the Impact of Companies on AI Research Based on Publications [1.450405446885067]
We compare academic- and company-authored AI publications published in the last decade.
We find that the citation count an individual publication receives is significantly higher when it is (co-)authored by a company.
arXiv Detail & Related papers (2023-10-31T13:27:04Z) - A Comprehensive Study of Groundbreaking Machine Learning Research: Analyzing highly cited and impactful publications across six decades [1.6442870218029522]
Machine learning (ML) has emerged as a prominent field of research in computer science and other related fields.
It is crucial to understand the landscape of highly cited publications to identify key trends, influential authors, and significant contributions made thus far.
arXiv Detail & Related papers (2023-08-01T21:43:22Z) - Characterising Research Areas in the field of AI [68.8204255655161]
We identified the main conceptual themes by performing clustering analysis on the co-occurrence network of topics.
The results highlight the growing academic interest in research themes like deep learning, machine learning, and internet of things.
arXiv Detail & Related papers (2022-05-26T16:30:30Z) - Studying the characteristics of scientific communities using individual-level bibliometrics: the case of Big Data research [2.208242292882514]
We study the academic age, production, and research focus of the community of authors active in Big Data research.
Results show that the academic realm of "Big Data" is a growing topic with an expanding community of authors.
arXiv Detail & Related papers (2021-06-10T08:17:09Z) - Scientometric engineering: Exploring citation dynamics via arXiv eprints [0.0]
We investigate the citation data of more than 1.5 million eprints on arXiv.
We find that the typical growth and obsolescence patterns vary across disciplines.
We derive a model consistent with the observed quantitative and temporal characteristics of citation growth and obsolescence.
arXiv Detail & Related papers (2021-06-09T12:38:44Z)