An Empirical Study on the Amount of Changes Required for Merge Request Acceptance
- URL: http://arxiv.org/abs/2507.23640v1
- Date: Thu, 31 Jul 2025 15:18:46 GMT
- Title: An Empirical Study on the Amount of Changes Required for Merge Request Acceptance
- Authors: Samah Kansab, Mohammed Sayagh, Francis Bordeleau, Ali Tizghadam
- Abstract summary: Up to 71% of GitLab Merge Requests require adjustments after submission, and 28% involve changes to more than 200 lines of code. We train an interpretable machine learning model using metrics across multiple dimensions: text features, code complexity, developer experience, review history, and branching. Our model achieves strong performance (AUC 0.84-0.88) and reveals that complexity, experience, and text features are key predictors.
- Score: 2.5999037208435705
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Code review (CR) is essential to software development, helping ensure that new code is properly integrated. However, the CR process often involves significant effort, including code adjustments, responses to reviewers, and continued implementation. While past studies have examined CR delays and iteration counts, few have investigated the effort based on the volume of code changes required, especially in the context of GitLab Merge Requests (MRs), which remains underexplored. In this paper, we define and measure CR effort as the amount of code modified after submission, using a dataset of over 23,600 MRs from four GitLab projects. We find that up to 71% of MRs require adjustments after submission, and 28% of these involve changes to more than 200 lines of code. Surprisingly, this effort is not correlated with review time or the number of participants. To better understand and predict CR effort, we train an interpretable machine learning model using metrics across multiple dimensions: text features, code complexity, developer experience, review history, and branching. Our model achieves strong performance (AUC 0.84-0.88) and reveals that complexity, experience, and text features are key predictors. Historical project characteristics also influence current review effort. Our findings highlight the feasibility of using machine learning to explain and anticipate the effort needed to integrate code changes during review.
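As a concrete illustration of the modeling setup the abstract describes, the sketch below trains an interpretable classifier to flag MRs likely to need heavy post-submission changes and scores it with AUC. The feature names, the CSV input, the 200-line label threshold, and the choice of gradient-boosted trees are illustrative assumptions, not the authors' exact pipeline.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

mrs = pd.read_csv("merge_requests.csv")  # hypothetical export of per-MR metrics

# One illustrative metric per dimension named in the abstract.
features = [
    "description_length",        # text features
    "cyclomatic_complexity",     # code complexity
    "author_prior_mrs",          # developer experience
    "past_rework_ratio",         # review history
    "is_feature_branch",         # branching
]
# Label an MR "high effort" if more than 200 lines were changed after
# submission, echoing the threshold highlighted in the abstract.
y = (mrs["lines_changed_after_submission"] > 200).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    mrs[features], y, test_size=0.2, random_state=0, stratify=y
)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"held-out AUC: {auc:.2f}")  # the paper reports 0.84-0.88 across projects
```

Tree ensembles pair naturally with feature-importance or SHAP analysis, which is one plausible way the reported predictor rankings (complexity, experience, text features) could be derived.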
Related papers
- CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.
It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.
We introduce the VerifierBench benchmark, comprising model outputs collected from multiple data sources and augmented through manual analysis of meta-error patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z) - Rethinking Code Review Workflows with LLM Assistance: An Empirical Study [2.9593087583214173]
This paper combines an exploratory field study of current code review practices with a field experiment involving two variations of an LLM-assisted code review tool.
The study identifies key challenges in traditional code reviews, including frequent context switching and insufficient contextual information.
In the field experiment, we developed two prototype variations: one offering LLM-generated reviews upfront and the other enabling on-demand interaction.
arXiv Detail & Related papers (2025-05-22T07:54:07Z) - Is Compression Really Linear with Code Intelligence? [60.123628177110206]
Format Annealing is a lightweight, transparent training methodology designed to assess the intrinsic capabilities of pre-trained models equitably.
Our empirical results reveal a fundamental logarithmic relationship between measured code intelligence and bits-per-character (BPC).
Our work provides a more nuanced understanding of compression's role in developing code intelligence and contributes a robust evaluation framework in the code domain.
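Written as a formula (an interpretive sketch, assuming the usual convention that lower BPC means better compression and hence higher capability), the reported relationship has the form

$$ S \approx \alpha - \beta \log(\mathrm{BPC}), \qquad \beta > 0, $$

i.e. code intelligence S is linear in log(BPC) rather than in BPC itself.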
arXiv Detail & Related papers (2025-05-16T16:59:14Z) - Analyzing DevOps Practices Through Merge Request Data: A Case Study in Networking Software Company [2.5999037208435705]
GitLab's Merge Request (MR) mechanism streamlines code submission and review.
MR data reflects broader aspects, including collaboration patterns, productivity, and process optimization.
This study examines 26.7k MRs from four teams across 116 projects of a networking software company.
arXiv Detail & Related papers (2025-03-18T19:33:34Z) - Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z) - Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
We collect a dataset with 57 pairs consisting of commit messages generated by GPT-4 and their counterparts edited by human experts.
Our results indicate that edit distance exhibits the highest correlation with the online metric, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
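A minimal sketch of the metric-selection idea: compute the offline edit distance between each generated message and its expert-edited counterpart, then check how it tracks the online edit count. The Levenshtein implementation is standard; the sample triples are invented stand-ins for the paper's 57 GPT-4/human pairs.

```python
from scipy.stats import spearmanr

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance over characters.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

# (generated message, human-edited message, online edit count) -- invented examples.
data = [
    ("Fix NPE in parser", "Fix null pointer exception in JSON parser", 6),
    ("Update deps", "Update dependencies to patch CVE-2024-0001", 9),
    ("Refactor login flow", "Refactor login flow", 0),
]
offline = [levenshtein(gen, edited) for gen, edited, _ in data]
online = [edits for _, _, edits in data]
rho, _ = spearmanr(offline, online)
print(f"Spearman rho = {rho:.2f}")  # a high rho favors edit distance as the offline metric
```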
arXiv Detail & Related papers (2024-10-15T20:32:07Z) - Let the Code LLM Edit Itself When You Edit the Code [50.46536185784169]
We introduce Positional Integrity Encoding (PIE).
Results demonstrate that PIE reduces computational overhead by over 85% compared to the standard full recomputation approach.
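The abstract does not spell out the mechanism, but assuming rotary (RoPE) position encodings in the KV cache, the sketch below shows why in-place positional correction can replace full recomputation after an edit: RoPE rotations compose additively, so re-encoding a cached key at a new position costs a single extra rotation. This is a property demonstration, not the paper's implementation.

```python
import numpy as np

def rope(x: np.ndarray, pos: float, base: float = 10000.0) -> np.ndarray:
    # Rotary position encoding: rotate each (x1[i], x2[i]) pair by pos * freq[i].
    half = x.shape[-1] // 2
    ang = pos * base ** (-np.arange(half) / half)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * np.cos(ang) - x2 * np.sin(ang),
                           x1 * np.sin(ang) + x2 * np.cos(ang)], axis=-1)

key = np.random.randn(64)              # one cached key vector
cached = rope(key, pos=42)             # as stored in the KV cache
moved = rope(cached, pos=37 - 42)      # cheap correction: rotate by the position delta
recomputed = rope(key, pos=37)         # what full recomputation would produce
assert np.allclose(moved, recomputed)  # rotations compose: R(a)R(b) = R(a+b)
```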
arXiv Detail & Related papers (2024-07-03T14:34:03Z) - CoIR: A Comprehensive Benchmark for Code Information Retrieval Models [52.61625841028781]
COIR (Code Information Retrieval Benchmark) is a robust and comprehensive benchmark designed to assess code retrieval capabilities.
COIR comprises ten meticulously curated code datasets, spanning eight distinctive retrieval tasks across seven diverse domains.
We evaluate nine widely used retrieval models using COIR, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems.
arXiv Detail & Related papers (2024-07-03T07:58:20Z) - VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM).
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z) - Code Reviewer Recommendation Based on a Hypergraph with Multiplex Relationships [30.74556500021384]
We present MIRRec, a novel code reviewer recommendation method that leverages a hypergraph with multiplex relationships.
MIRRec encodes high-order correlations that go beyond traditional pairwise connections using degree-free hyperedges among pull requests and developers.
To validate the effectiveness of MIRRec, we conducted experiments using a dataset comprising 48,374 pull requests from ten popular open-source software projects hosted on GitHub.
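To make the hypergraph idea concrete, here is a minimal sketch in which each hyperedge ties one pull request to every developer who interacted with it, so multi-party relations survive instead of being flattened into pairwise edges. The incidence structure, node naming, and reviewer lookup are illustrative assumptions, not MIRRec's actual encoding.

```python
from collections import defaultdict

class Hypergraph:
    """Nodes are PRs and developers; a hyperedge links a PR to all developers
    who interacted with it, preserving higher-order (non-pairwise) relations."""

    def __init__(self):
        self.edges = []                    # each hyperedge is a set of node ids
        self.incidence = defaultdict(set)  # node id -> indices of its hyperedges

    def add_hyperedge(self, nodes):
        idx = len(self.edges)
        self.edges.append(set(nodes))
        for n in nodes:
            self.incidence[n].add(idx)

    def candidate_reviewers(self, pr):
        # Developers that share at least one hyperedge with this PR.
        found = set()
        for idx in self.incidence[pr]:
            found |= {n for n in self.edges[idx] if n.startswith("dev:")}
        return found

hg = Hypergraph()
hg.add_hyperedge({"pr:101", "dev:alice", "dev:bob"})   # e.g. a review interaction
hg.add_hyperedge({"pr:102", "dev:bob", "dev:carol"})   # e.g. a comment interaction
print(hg.candidate_reviewers("pr:101"))                # {'dev:alice', 'dev:bob'}
```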
arXiv Detail & Related papers (2024-01-19T15:25:14Z) - What Makes a Code Review Useful to OpenDev Developers? An Empirical Investigation [4.061135251278187]
Even a minor improvement in the effectiveness of Code Reviews can incur significant savings for a software development organization.
This study aims to develop a finer grain understanding of what makes a code review comment useful to OSS developers.
arXiv Detail & Related papers (2023-02-22T22:48:27Z) - Predicting Code Review Completion Time in Modern Code Review [12.696276129130332]
Modern Code Review (MCR) is being adopted in both open source and commercial projects as a common practice.
Code reviews can experience significant completion delays due to various socio-technical factors.
There is a lack of tool support to help developers estimate the time required to complete a code review.
arXiv Detail & Related papers (2021-09-30T14:00:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.