Code Contribution and Credit in Science
- URL: http://arxiv.org/abs/2510.16242v1
- Date: Fri, 17 Oct 2025 22:17:38 GMT
- Title: Code Contribution and Credit in Science
- Authors: Eva Maxfield Brown, Isaac Slaughter, Nicholas Weber
- Abstract summary: We investigate how software development activities influence credit allocation in collaborative scientific settings. Nearly 30% of articles include non-author code contributors. Authors who contribute code more frequently exhibit progressively lower h-indices than non-coding colleagues.
- Score: 1.5484595752241122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software development has become essential to scientific research, but its relationship to traditional metrics of scholarly credit remains poorly understood. We develop a dataset of approximately 140,000 paired research articles and code repositories, as well as a predictive model that matches research article authors with software repository developer accounts. We use this data to investigate how software development activities influence credit allocation in collaborative scientific settings. Our findings reveal significant patterns distinguishing software contributions from traditional authorship credit. We find that nearly 30% of articles include non-author code contributors: individuals who participated in software development but received no formal authorship recognition. While code-contributing authors show a modest $\sim$4.2% increase in article citations, this effect becomes non-significant when controlling for domain, article type, and open access status. First authors are significantly more likely to be code contributors than other author positions. Notably, we identify a negative relationship between coding frequency and scholarly impact metrics. Authors who contribute code more frequently exhibit progressively lower h-indices than non-coding colleagues, even when controlling for publication count, author position, domain, and article type. These results suggest a disconnect between software contributions and credit, highlighting important implications for institutional reward structures and science policy.
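The predictive model that pairs authors with developer accounts is not described in this summary; purely to illustrate the matching task, the sketch below pairs names greedily by string similarity. The account names, threshold, and greedy pairing rule are all assumptions, not the paper's method.

```python
import re
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase and keep letters only, so 'Eva M. Brown' ~ 'eva-m-brown'."""
    return re.sub(r"[^a-z]", "", name.lower())

def match_authors_to_devs(authors, devs, threshold=0.8):
    """Greedy one-to-one pairing by name similarity (a toy stand-in for the
    paper's predictive model)."""
    matches, taken = {}, set()
    for author in authors:
        best_dev, best_score = None, threshold
        for dev in devs:
            if dev in taken:
                continue
            score = SequenceMatcher(None, normalize(author), normalize(dev)).ratio()
            if score >= best_score:
                best_dev, best_score = dev, score
        if best_dev is not None:
            matches[author] = best_dev
            taken.add(best_dev)
    return matches

# Hypothetical accounts; the unmatched one plays the role of a
# "non-author code contributor" from the abstract.
authors = ["Eva M. Brown", "Isaac Slaughter"]
devs = ["eva-m-brown", "islaughter", "anon-dev-42"]
matched = match_authors_to_devs(authors, devs)
print(matched)                            # both authors matched to accounts
print(set(devs) - set(matched.values()))  # {'anon-dev-42'}
```

Repository accounts left unmatched after such a pass are exactly the candidates for the "non-author code contributors" that the abstract counts.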
Related papers
- SciCoQA: Quality Assurance for Scientific Paper--Code Alignment [53.70401063640645]
We present SciCoQA, a dataset for detecting discrepancies between scientific publications and their code. Our dataset consists of 611 paper-code discrepancies (81 real, 530 synthetic), spanning diverse computational science disciplines. The best-performing model in our evaluation, GPT-5, can only detect 45.7% of real-world paper-code discrepancies.
arXiv Detail & Related papers (2026-01-19T10:04:33Z) - CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector based on a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z) - Is Compression Really Linear with Code Intelligence? [60.123628177110206]
Format Annealing is a lightweight, transparent training methodology designed to assess the intrinsic capabilities of pre-trained models equitably. Our empirical results reveal a fundamental logarithmic relationship between measured code intelligence and bits-per-character (BPC). Our work provides a more nuanced understanding of compression's role in developing code intelligence and contributes a robust evaluation framework in the code domain.
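Read literally, that logarithmic relationship suggests a functional form along the lines of $S \approx a \cdot \ln(\mathrm{BPC}) + b$ with $a < 0$ (lower bits-per-character, i.e. better compression, accompanying a higher measured code-intelligence score $S$); the sign convention and coefficients here are an illustrative guess, not values reported by the paper.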
arXiv Detail & Related papers (2025-05-16T16:59:14Z) - Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [70.04746094652653]
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages, beginning with planning: it designs the system architecture with diagrams, identifies file dependencies, and generates configuration files. We then evaluate PaperCoder on generating code implementations from machine learning papers using both model-based and human evaluations.
arXiv Detail & Related papers (2025-04-24T01:57:01Z) - Hidden Division of Labor in Scientific Teams Revealed Through 1.6 Million LaTeX Files [37.77089168249056]
We analyze author-specific macros in LaTeX files from 1.6 million papers (1991-2023) by 2 million scientists. Using explicit section information, we reveal a hidden division of labor within scientific teams.
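The summary does not say how macros are detected; as a rough sketch of the general idea, one can pull \newcommand-style definitions out of a source string. The regex and the sample fragment below are illustrative only.

```python
import re

# Matches \newcommand{\name}, \renewcommand{\name}, and \def\name definitions.
MACRO_DEF = re.compile(r"\\(?:newcommand|renewcommand|def)\s*\{?\\([A-Za-z@]+)\}?")

def author_macros(tex_source: str) -> set[str]:
    """Names of macros defined in a LaTeX source string."""
    return set(MACRO_DEF.findall(tex_source))

# Hypothetical fragment: idiosyncratic macros like these can act as a
# fingerprint for which author wrote which section of a multi-author paper.
sample = r"\newcommand{\evec}[1]{\mathbf{#1}} \def\R{\mathbb{R}}"
print(author_macros(sample))  # {'evec', 'R'} (set order may vary)
```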
arXiv Detail & Related papers (2025-02-11T05:07:36Z) - Exploring Code Comprehension in Scientific Programming: Preliminary Insights from Research Scientists [6.2329239454115415]
This study surveys 57 research scientists from various disciplines to explore their programming backgrounds, practices, and the challenges they face regarding code readability. Scientists mainly use Python and R, relying on documentation for readability. Our findings show low adoption of code quality tools and a trend toward using large language models to improve code quality.
arXiv Detail & Related papers (2025-01-17T08:47:29Z) - Code Ownership: The Principles, Differences, and Their Associations with Software Quality [6.123324869194196]
We investigate the differences in the commonly used ownership approximations in terms of the set of developers, the approximated code ownership values, and the expertise level.
We find that commit-based and line-based ownership approximations produce different sets of developers, different code ownership values, and different sets of major developers.
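To make that contrast concrete, here is a minimal sketch over a toy per-file history of (developer, lines changed) records; the data shape and numbers are assumptions, not the paper's setup.

```python
from collections import Counter

# Hypothetical commit history for one file: (developer, lines changed).
commits = [("alice", 200), ("bob", 10), ("bob", 10), ("bob", 10), ("carol", 5)]

def commit_based_ownership(history):
    """Each developer's share of commits, ignoring commit size."""
    counts = Counter(dev for dev, _ in history)
    total = sum(counts.values())
    return {dev: n / total for dev, n in counts.items()}

def line_based_ownership(history):
    """Each developer's share of changed lines."""
    lines = Counter()
    for dev, n in history:
        lines[dev] += n
    total = sum(lines.values())
    return {dev: n / total for dev, n in lines.items()}

print(commit_based_ownership(commits))  # bob leads with 3/5 of the commits
print(line_based_ownership(commits))    # alice leads with 200/235 of the lines
```

Under the commit-based view bob is the major developer, while the line-based view points to alice, which is the kind of disagreement between approximations that the paper reports.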
arXiv Detail & Related papers (2024-08-23T03:01:59Z) - Examining Ownership Models in Software Teams: A Systematic Literature Review and a Replication Study [2.0891120283967264]
We identify 79 relevant papers published between 2005 and 2022.
We develop a taxonomy of ownership artifacts based on type, owners, and degree of ownership.
arXiv Detail & Related papers (2024-05-24T16:03:22Z) - Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z) - How do Software Engineering Researchers Use GitHub? An Empirical Study of Artifacts & Impact [0.2209921757303168]
We ask whether and how authors engage in social coding related to their research.
We analyze ten thousand papers in top SE research venues, hand-annotate their GitHub links, and study 309 paper-related repositories.
We find a wide distribution in popularity and impact, some strongly correlated with publication venue.
arXiv Detail & Related papers (2023-10-02T18:56:33Z) - Collaborative, Code-Proximal Dynamic Software Visualization within Code Editors [55.57032418885258]
This paper introduces the design and proof-of-concept implementation for a software visualization approach that can be embedded into code editors.
Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior.
Our visualization approach enhances common remote pair programming tools and is collaboratively usable by employing shared code cities.
arXiv Detail & Related papers (2023-08-30T06:35:40Z) - Characterizing the effect of retractions on publishing careers [0.7988085110283119]
Retracting academic papers may have far-reaching consequences for retracted authors and their careers. Our findings suggest that retractions may impose a disproportionate impact on early-career authors.
arXiv Detail & Related papers (2023-06-11T15:52:39Z) - What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.