Related papers: More code, less validation: Risk factors for over-reliance on AI coding tools among scientists

More code, less validation: Risk factors for over-reliance on AI coding tools among scientists

URL: http://arxiv.org/abs/2512.19644v1
Date: Mon, 22 Dec 2025 18:17:54 GMT
Title: More code, less validation: Risk factors for over-reliance on AI coding tools among scientists
Authors: Gabrielle O'Brien, Alexis Parker, Nasir Eisty, Jeffrey Carver,
Abstract summary: Generative AI tools capable of code generation may support scientific programmers, but user studies indicate risks of over-reliance.<n>We surveyed 868 scientists who program, examining adoption patterns, tool preferences, and factors associated with perceived productivity.
Score: 3.5398689122254763
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Programming is essential to modern scientific research, yet most scientists report inadequate training for the software development their work demands. Generative AI tools capable of code generation may support scientific programmers, but user studies indicate risks of over-reliance, particularly among inexperienced users. We surveyed 868 scientists who program, examining adoption patterns, tool preferences, and factors associated with perceived productivity. Adoption is highest among students and less experienced programmers, with variation across fields. Scientific programmers overwhelmingly prefer general-purpose conversational interfaces like ChatGPT over developer-specific tools. Both inexperience and limited use of development practices (like testing, code review, and version control) are associated with greater perceived productivity-but these factors interact, suggesting formal practices may partially compensate for inexperience. The strongest predictor of perceived productivity is the number of lines of generated code typically accepted at once. These findings suggest scientific programmers using generative AI may gauge productivity by code generation rather than validation, raising concerns about research code integrity.

Related papers

Prompting in Practice: Investigating Software Developers' Use of Generative AI Tools [17.926187565860232]
The integration of generative artificial intelligence (GenAI) tools has fundamentally transformed software development.<n>This study presents a systematic investigation of how software engineers integrate GenAI tools into their professional practice.<n>We surveyed 91 software engineers, including 72 active GenAI users, to understand AI usage patterns throughout the development process.
arXiv Detail & Related papers (2025-10-07T15:02:22Z)
Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents.<n>We evaluate two leading copilot and agentic coding assistants.<n>Our results show agents can assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z)
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking [48.90371827091671]
AutoExperiment is a benchmark that evaluates AI agents' ability to implement and run machine learning experiments.<n>We evaluate state-of-the-art agents and find that performance degrades rapidly as $n$ increases.<n>Our findings highlight critical challenges in long-horizon code generation, context retrieval, and autonomous experiment execution.
arXiv Detail & Related papers (2025-06-24T15:39:20Z)
How Scientists Use Large Language Models to Program [0.0]
We investigate the characteristics of scientists who are early-adopters of code generating models.<n>We see that scientists often use code generating models as an information retrieval tool for navigating unfamiliar programming languages and libraries.
arXiv Detail & Related papers (2025-02-24T17:23:12Z)
Exploring Code Comprehension in Scientific Programming: Preliminary Insights from Research Scientists [6.2329239454115415]
This study surveys 57 research scientists from various disciplines to explore their programming backgrounds, practices, and challenges they face regarding code readability.<n>Scientists mainly use Python and R, relying on documentation for readability.<n>Our findings show low adoption of code quality tools and a trend towards utilizing large language models to improve code quality.
arXiv Detail & Related papers (2025-01-17T08:47:29Z)
Dear Diary: A randomized controlled trial of Generative AI coding tools in the workplace [2.5280615594444567]
Generative AI coding tools are relatively new, and their impact on developers extends beyond traditional coding metrics. This study aims to illuminate developers' preexisting beliefs about generative AI tools, their self perceptions, and how regular use of these tools may alter these beliefs. Our findings reveal that the introduction and sustained use of generative AI coding tools significantly increases developers' perceptions of these tools as both useful and enjoyable.
arXiv Detail & Related papers (2024-10-24T00:07:27Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks. We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
Comparing Software Developers with ChatGPT: An Empirical Investigation [0.0]
This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics. The paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration.
arXiv Detail & Related papers (2023-05-19T17:25:54Z)
Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code. We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it. Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.