Code for Machines, Not Just Humans: Quantifying AI-Friendliness with Code Health Metrics
- URL: http://arxiv.org/abs/2601.02200v1
- Date: Mon, 05 Jan 2026 15:23:55 GMT
- Title: Code for Machines, Not Just Humans: Quantifying AI-Friendliness with Code Health Metrics
- Authors: Markus Borg, Nadim Hagatulah, Adam Tornhill, Emma Söderberg
- Abstract summary: We investigate the concept of ``AI-friendly code'' via a dataset of 5,000 Python files from competitive programming. Our findings confirm that human-friendly code is also more compatible with AI tooling. These results suggest that organizations can use CodeHealth to guide where AI interventions are lower risk and where additional human oversight is warranted.
- Score: 6.108440460022983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We are entering a hybrid era in which human developers and AI coding agents work in the same codebases. While industry practice has long optimized code for human comprehension, it is increasingly important to ensure that LLMs with different capabilities can edit code reliably. In this study, we investigate the concept of ``AI-friendly code'' via LLM-based refactoring on a dataset of 5,000 Python files from competitive programming. We find a meaningful association between CodeHealth, a quality metric calibrated for human comprehension, and semantic preservation after AI refactoring. Our findings confirm that human-friendly code is also more compatible with AI tooling. These results suggest that organizations can use CodeHealth to guide where AI interventions are lower risk and where additional human oversight is warranted. Investing in maintainability not only helps humans; it also prepares for large-scale AI adoption.
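Semantic preservation after refactoring, the outcome the study relates to CodeHealth, can be probed with a simple differential test: run the pre- and post-refactoring versions of a function on the same inputs and check that the outputs agree. The sketch below is illustrative only; the function pair and the test inputs are invented, not taken from the paper's dataset or method.

```python
def original(xs):
    # Version before refactoring: accumulate the running maximum in a loop.
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

def refactored(xs):
    # Version after an (assumed) LLM refactoring: same behaviour via a builtin.
    return max(xs)

def semantically_preserved(f, g, inputs):
    """Return True if f and g agree on every test input."""
    return all(f(i) == g(i) for i in inputs)

test_inputs = [[3, 1, 2], [-5, -2, -9], [7], [0, 0, 0]]
print(semantically_preserved(original, refactored, test_inputs))  # prints True
```

A refactoring that flips `best = x` to run on `>=` instead of `>` would still pass this check, which is why behaviour-based equivalence testing over many inputs is only an approximation of true semantic preservation.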
Related papers
- Why Human Guidance Matters in Collaborative Vibe Coding [24.04414458645034]
We study the impact of vibe coding on productivity and collaboration. We show that people provide uniquely effective high-level instructions for vibe coding. We also demonstrate that hybrid systems perform best when humans retain directional control.
arXiv Detail & Related papers (2026-02-11T03:24:57Z) - AI Code in the Wild: Measuring Security Risks and Ecosystem Shifts of AI-Generated Code in Modern Software [12.708926174194199]
We present the first large-scale empirical study of AI-generated code (AIGCode) in the wild. We build a high-precision detection pipeline and a benchmark to distinguish AIGCode from human-written code. This lets us label commits, files, and functions along a human/AI axis and trace how AIGCode moves through projects and vulnerability life cycles.
arXiv Detail & Related papers (2025-12-21T02:26:29Z) - Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity [4.478789600295493]
We present a large-scale comparison of code authored by human developers and three state-of-the-art LLMs, i.e., ChatGPT, DeepSeek-Coder, and Qwen-Coder. Our evaluation spans over 500k code samples in two widely used languages, Python and Java, classifying defects via Orthogonal Defect Classification and security vulnerabilities using the Common Weakness Enumeration (CWE). We find that AI-generated code is generally simpler and more prone to unused constructs and hardcoded values, while human-written code exhibits greater structural complexity and a higher concentration of maintainability issues.
arXiv Detail & Related papers (2025-08-29T13:51:28Z) - Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents. We evaluate two leading copilot-style and agentic coding assistants. Our results show agents can assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z) - Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition? [53.863591321231276]
Humanity's Last Code Exam (HLCE) comprises the 235 most challenging problems from the International Collegiate Programming Contest (ICPC World Finals) and the International Olympiad in Informatics (IOI). As part of HLCE, we design a harmonized online-offline sandbox that guarantees fully reproducible evaluation. We observe that even the strongest reasoning LLMs, o4-mini (high) and Gemini-2.5 Pro, achieve pass@1 rates of only 15.9% and 11.4%, respectively.
arXiv Detail & Related papers (2025-06-15T04:03:31Z) - Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios. Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z) - Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights [0.0]
This paper introduces a novel scoring mechanism called the SBC score. It is based on a reverse generation technique that leverages the natural language generation capabilities of Large Language Models. Unlike direct code analysis, our approach reconstructs system requirements from AI-generated code and compares them with the original specifications.
arXiv Detail & Related papers (2025-02-11T01:12:11Z) - AIGCodeSet: A New Annotated Dataset for AI Generated Code Detection [0.0]
We present AIGCodeSet, which consists of 2,828 AI-generated and 4,755 human-written Python code samples. Our experiments show that a Bayesian classifier outperforms the other models.
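The Bayesian classifier baseline mentioned above can be sketched as a token-level naive Bayes model with Laplace smoothing. Everything below is a toy stand-in: the two training snippets, their labels, and the deliberately crude tokenizer are invented for illustration and are not AIGCodeSet data or the paper's actual feature set.

```python
import math
from collections import Counter

def tokenize(code):
    # Crude whitespace tokenizer; real detectors use far richer features.
    return code.replace("(", " ( ").replace(")", " ) ").split()

def train(samples):
    """samples: list of (code, label) pairs; returns per-label token counts."""
    counts = {"ai": Counter(), "human": Counter()}
    for code, label in samples:
        counts[label].update(tokenize(code))
    return counts

def classify(code, counts):
    """Naive Bayes with Laplace smoothing and uniform class priors."""
    vocab = set(counts["ai"]) | set(counts["human"])
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores[label] = sum(
            math.log((c[t] + 1) / (total + len(vocab))) for t in tokenize(code)
        )
    return max(scores, key=scores.get)

# Toy labelled examples standing in for a real training corpus.
samples = [
    ("result = [x * 2 for x in data]", "ai"),
    ("out = []\nfor x in data:\n    out.append(x * 2)", "human"),
]
counts = train(samples)
print(classify("vals = [y * 3 for y in items]", counts))
```

With two training snippets the decision is driven almost entirely by smoothing, which is exactly why benchmark-scale datasets like AIGCodeSet are needed to evaluate such classifiers meaningfully.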
arXiv Detail & Related papers (2024-12-21T11:53:49Z) - CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation. CodeIP is a novel multi-bit watermarking technique that inserts additional information to preserve provenance details. Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z) - Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z) - When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z) - COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
arXiv Detail & Related papers (2020-10-19T13:53:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.