Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints
- URL: http://arxiv.org/abs/2510.17882v1
- Date: Sat, 18 Oct 2025 01:37:40 GMT
- Title: Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints
- Authors: Minfeng Qi, Zhongmin Cao, Qin Wang, Ningran Li, Tianqing Zhu,
- Abstract summary: Generative large language models (LLMs) introduce a further potential disruption by altering how manuscripts are written. This paper addresses the gap through a large-scale analysis of more than 2.1 million preprints spanning 2016--2025 (115 months) across four major repositories. Our findings reveal that LLMs have accelerated submission and revision cycles, modestly increased linguistic complexity, and disproportionately expanded AI-related topics.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Preprint repositories have become central infrastructures for scholarly communication. Their expansion transforms how research is circulated and evaluated before journal publication. Generative large language models (LLMs) introduce a further potential disruption by altering how manuscripts are written. While speculation abounds, systematic evidence of whether and how LLMs reshape scientific publishing remains limited. This paper addresses the gap through a large-scale analysis of more than 2.1 million preprints spanning 2016--2025 (115 months) across four major repositories (i.e., arXiv, bioRxiv, medRxiv, SocArXiv). We introduce a multi-level analytical framework that integrates interrupted time-series models, collaboration and productivity metrics, linguistic profiling, and topic modeling to assess changes in volume, authorship, style, and disciplinary orientation. Our findings reveal that LLMs have accelerated submission and revision cycles, modestly increased linguistic complexity, and disproportionately expanded AI-related topics, while computationally intensive fields benefit more than others. These results show that LLMs act less as universal disruptors than as selective catalysts, amplifying existing strengths and widening disciplinary divides. By documenting these dynamics, the paper provides the first empirical foundation for evaluating the influence of generative AI on academic publishing and highlights the need for governance frameworks that preserve trust, fairness, and accountability in an AI-enabled research ecosystem.
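The interrupted time-series models mentioned in the abstract can be illustrated with a minimal segmented-regression sketch. This is not the authors' actual specification: the breakpoint, coefficients, and synthetic monthly counts below are illustrative assumptions, and the fit is a plain ordinary-least-squares solve written in standard-library Python.

```python
# Minimal sketch of an interrupted time-series (ITS) regression of the kind
# the paper's framework alludes to: a level shift and a slope change at an
# assumed intervention month. All numbers here are synthetic.

def fit_ols(X, y):
    """Solve the normal equations X'X b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
         [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic 115-month series: baseline trend, then a level jump (+25) and a
# steeper slope (+2/month) after a hypothetical intervention month T0.
T0 = 60
months = range(115)
y = [100 + 1.0 * t + (25 + 2.0 * (t - T0) if t >= T0 else 0) for t in months]

# ITS design: intercept, time, post-intervention indicator, time since intervention
X = [[1.0, float(t), float(t >= T0), float(max(0, t - T0))] for t in months]
b0, b1, level_shift, slope_change = fit_ols(X, y)
print(round(level_shift, 2), round(slope_change, 2))  # recovers 25.0 and 2.0
```

The estimated coefficients on the post-intervention indicator and on time-since-intervention are what an ITS analysis reads as the immediate level shift and the sustained trend change after the break.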
Related papers
- Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science [70.3658845234978]
Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS). Despite this rapid progress, the field still relies heavily on empirical trial-and-error. This bottleneck stems from the ambiguity of attribution. We propose a factor attribution paradigm to systematically identify collaboration-driving factors.
arXiv Detail & Related papers (2026-02-05T04:19:52Z) - Structural shifts in institutional participation and collaboration within the AI arXiv preprint research ecosystem [2.5782420501870296]
This paper examines structural changes in the AI research landscape using a dataset of arXiv preprints (cs.AI) from 2021 through 2025. Our results reveal an unprecedented surge in publication output following the introduction of ChatGPT. However, academic--industry collaboration is still suppressed, as measured by a Normalized Collaboration Index (NCI) that remains significantly below the random-mixing baseline.
arXiv Detail & Related papers (2026-02-03T19:35:16Z) - LLAMA LIMA: A Living Meta-Analysis on the Effects of Generative AI on Learning Mathematics [0.0]
We present a Living Meta-Analysis (LIMA) on the effects of generative AI-based interventions for learning mathematics. We continuously update the literature base, apply a Bayesian multilevel meta-regression model to account for cumulative data, and publish updated versions on a preprint server at regular intervals.
arXiv Detail & Related papers (2026-01-26T17:00:52Z) - Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper [64.50822834679101]
SciIG is a task that evaluates LLMs' ability to produce coherent introductions from titles, abstracts, and related works. We assess five state-of-the-art models, including open-source (DeepSeek-v3, Gemma-3-12B, LLaMA 4-Maverick, MistralAI Small 3.1) and closed-source GPT-4o systems. Results demonstrate LLaMA-4 Maverick's superior performance on most metrics, particularly in semantic similarity and faithfulness.
arXiv Detail & Related papers (2025-08-19T21:11:11Z) - Computational Approaches to Understanding Large Language Model Impact on Writing and Information Ecosystems [10.503784446147122]
Large language models (LLMs) have shown significant potential to change how we write, communicate, and create. This dissertation examines how individuals and institutions are adapting to and engaging with this emerging technology.
arXiv Detail & Related papers (2025-06-20T20:15:09Z) - XtraGPT: Context-Aware and Controllable Academic Paper Revision [43.263488839387584]
We propose a human-AI collaboration framework for academic paper revision centered on criteria-guided intent alignment and context-aware modeling. We instantiate the framework in XtraGPT, the first suite of open-source LLMs for context-aware, instruction-guided writing assistance.
arXiv Detail & Related papers (2025-05-16T15:02:19Z) - Divergent LLM Adoption and Heterogeneous Convergence Paths in Research Writing [0.8046044493355781]
Large Language Models (LLMs) are reshaping content creation and academic writing. This study investigates the impact of AI-assisted generative revisions on research manuscripts.
arXiv Detail & Related papers (2025-04-18T11:09:16Z) - A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; and Integration and Adaptation.
arXiv Detail & Related papers (2025-03-08T05:41:42Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them. We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z) - Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, author prestige, and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.