Related papers: Automated Code Review Assignments: An Alternative Perspective of Code Ownership on GitHub

Automated Code Review Assignments: An Alternative Perspective of Code Ownership on GitHub

URL: http://arxiv.org/abs/2512.05551v1
Date: Fri, 05 Dec 2025 09:14:22 GMT
Title: Automated Code Review Assignments: An Alternative Perspective of Code Ownership on GitHub
Authors: Jai Lal Lulla, Raula Gaikovina Kula, Christoph Treude,
Abstract summary: GitHub introduced the CODEOWNERS feature, which automatically designates reviewers for specific files.<n>We present the first large-scale empirical study of CODEOWNERS usage across over 844,000 pull requests with 1.9 million comments and over 2 million reviews.<n>Results indicate that codeowners tend to adhere the rules specified in the CODEOWNERS file, exhibit similar collaborative behaviours to traditional metrics of ownership, but tend to contribute to a smoother and faster PR workflow over time.
Score: 9.824540566919184
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Code ownership is central to ensuring accountability and maintaining quality in large-scale software development. Yet, as external threats such as software supply chain attacks on project health and quality assurance increase, mechanisms for assigning and enforcing responsibility have become increasingly critical. In 2017, GitHub introduced the CODEOWNERS feature, which automatically designates reviewers for specific files to strengthen accountability and protect critical parts of the codebase. Despite its potential, little is known about how CODEOWNERS is actually adopted and practiced. We present the first large-scale empirical study of CODEOWNERS usage across over 844,000 pull requests with 1.9 million comments and over 2 million reviews. We identify 10,287 code owners to track their review activities. Results indicate that codeowners tend to adhere the rules specified in the CODEOWNERS file, exhibit similar collaborative behaviours to traditional metrics of ownership, but tend to contribute to a smoother and faster PR workflow over time. Finally, using regression discontinuity design (RDD) analysis, we find that repositories adopting CODEOWNERS experience shifts in review dynamics, as ownership redistributes review responsibilities away from core developers. Our results position CODEOWNERS as a promising yet underutilized mechanism for improving software governance and resilience. We discuss how projects can leverage this alternative ownership method as a perspective to enhance security, accountability, and workflow efficiency in open-source development.

Related papers

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development [42.26354337364403]
FeatureBench is a benchmark designed to evaluate agentic coding performance in end-to-end, feature-oriented software development.<n>It incorporates an execution-based evaluation protocol and a scalable test-driven method that automatically derives tasks from code repositories with minimal human effort.<n> Empirical evaluation reveals that the state-of-the-art agentic model, such as Claude 4.5 Opus, achieves a 74.4% resolved rate on SWE-bench.
arXiv Detail & Related papers (2026-02-11T16:06:32Z)
Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model [60.60587869092729]
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment.<n>We propose SecCoderX, an online reinforcement learning framework for functionality-preserving secure code generation.
arXiv Detail & Related papers (2026-02-07T07:42:07Z)
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development [72.4729759618632]
We introduce ABC-Bench, a benchmark to evaluate agentic backend coding within a realistic, executable workflow.<n>We curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories.<n>Our evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks.
arXiv Detail & Related papers (2026-01-16T08:23:52Z)
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents [79.29376673236142]
Existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems.<n>We present NL2Repo Bench, a benchmark explicitly designed to evaluate the long-horizon repository generation ability of coding agents.
arXiv Detail & Related papers (2025-12-14T15:12:13Z)
GitHub's Copilot Code Review: Can AI Spot Security Flaws Before You Commit? [0.0]
This study evaluates the effectiveness of GitHub Copilot's recently introduced code review feature in detecting security vulnerabilities.<n>Contrary to expectations, our results reveal that Copilot's code review frequently fails to detect critical vulnerabilities.<n>Our results highlight the continued necessity of dedicated security tools and manual code audits to ensure robust software security.
arXiv Detail & Related papers (2025-09-17T02:56:21Z)
Wolves in the Repository: A Software Engineering Analysis of the XZ Utils Supply Chain Attack [0.8517406772939294]
The digital economy runs on Open Source Software (OSS), with an estimated 90% of modern applications containing open-source components.<n>This paper examines a sophisticated attack on the XZUtils project (-2024-3094), where attackers exploited not just code, but the entire open-source development process.<n>Our analysis reveals a new breed of supply chain attack that manipulates software engineering practices themselves.
arXiv Detail & Related papers (2025-04-24T12:06:11Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests [9.636894100495505]
Bugdar is an AI-augmented code review system that integrates seamlessly into GitHub pull requests.<n>It provides near real-time, context-aware vulnerability analysis.<n>Bugdar processes an average of 56.4 seconds per pull request or 30 lines of code per second.
arXiv Detail & Related papers (2025-03-21T16:52:03Z)
RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation. RedCode-Exec provides challenging prompts that could lead to risky code execution. RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs) The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution.<n>Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy.<n>In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward [3.729516018513228]
We introduce a multipurpose code vulnerability analysis system textttSecRepair, powered by a large language model, CodeGen2. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub.
arXiv Detail & Related papers (2024-01-07T02:46:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.