Scientific production in the era of Large Language Models
- URL: http://arxiv.org/abs/2601.13187v1
- Date: Mon, 19 Jan 2026 16:10:22 GMT
- Title: Scientific production in the era of Large Language Models
- Authors: Keigo Kusumegi, Xinyu Yang, Paul Ginsparg, Mathijs de Vaan, Toby Stuart, Yian Yin,
- Abstract summary: Large Language Models (LLMs) are rapidly reshaping scientific research.<n>We analyze these changes in multiple, large-scale datasets with 2.1M preprints, 28K peer review reports, and 246M online accesses to scientific documents.
- Score: 8.9515784506288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are rapidly reshaping scientific research. We analyze these changes in multiple, large-scale datasets with 2.1M preprints, 28K peer review reports, and 246M online accesses to scientific documents. We find: 1) scientists adopting LLMs to draft manuscripts demonstrate a large increase in paper production, ranging from 23.7-89.3% depending on scientific field and author background, 2) LLM use has reversed the relationship between writing complexity and paper quality, leading to an influx of manuscripts that are linguistically complex but substantively underwhelming, and 3) LLM adopters access and cite more diverse prior work, including books and younger, less-cited documents. These findings highlight a stunning shift in scientific production that will likely require a change in how journals, funding agencies, and tenure committees evaluate scientific works.
Related papers
- Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints [15.070885964897734]
Generative large language models (LLMs) introduce a further potential disruption by altering how manuscripts are written.<n>This paper addresses the gap through a large-scale analysis of more than 2.1 million preprints spanning 2016--2025 (115 months) across four major repositories.<n>Our findings reveal that LLMs have accelerated submission and revision cycles, modestly increased linguistic complexity, and disproportionately expanded AI-related topics.
arXiv Detail & Related papers (2025-10-18T01:37:40Z) - LLM-REVal: Can We Trust LLM Reviewers Yet? [70.58742663985652]
Large language models (LLMs) have inspired researchers to integrate them extensively into the academic workflow.<n>This study focuses on how the deep integration of LLMs into both peer-review and research processes may influence scholarly fairness.
arXiv Detail & Related papers (2025-10-14T10:30:20Z) - Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper [64.50822834679101]
SciIG is a task that evaluates LLMs' ability to produce coherent introductions from titles, abstracts, and related works.<n>We assess five state-of-the-art models, including open-source (DeepSeek-v3, Gemma-3-12B, LLaMA 4-Maverick, MistralAI Small 3.1) and closed-source GPT-4o systems.<n>Results demonstrate LLaMA-4 Maverick's superior performance on most metrics, particularly in semantic similarity and faithfulness.
arXiv Detail & Related papers (2025-08-19T21:11:11Z) - How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices? [1.130790932059036]
We show that large language models (LLMs) reinforce the Matthew effect in citations by consistently favoring highly cited papers.<n>We analyze 274,951 references generated by GPT-4o for 10,000 papers.
arXiv Detail & Related papers (2025-04-03T17:04:56Z) - On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts [5.831737970661138]
We evaluate Large Language Models (LLMs) in their ability to sort scientific publications into hierarchical classifications systems.<n>Using the FORC dataset as ground truth data, we have found that recent LLMs are able to reach an accuracy of up to 0.82, which is up to 0.08 better than traditional BERT models.
arXiv Detail & Related papers (2025-02-08T20:37:21Z) - Towards Efficient Large Language Models for Scientific Text: A Review [4.376712802685017]
Large language models (LLMs) have ushered in a new era for processing complex information in various fields, including science.
Due to the power of LLMs, they require extremely expensive computational resources, intense amounts of data, and training time.
In recent years, researchers have proposed various methodologies to make scientific LLMs more affordable.
arXiv Detail & Related papers (2024-08-20T10:57:34Z) - Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z) - A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.<n>ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.<n>We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.