Analyzing DevOps Practices Through Merge Request Data: A Case Study in Networking Software Company
- URL: http://arxiv.org/abs/2503.14677v1
- Date: Tue, 18 Mar 2025 19:33:34 GMT
- Title: Analyzing DevOps Practices Through Merge Request Data: A Case Study in Networking Software Company
- Authors: Samah Kansab, Matthieu Hanania, Francis Bordeleau, Ali Tizghadam,
- Abstract summary: GitLab's Merge Request (MR) mechanism streamlines code submission and review. MR data also reflects broader aspects, including collaboration patterns, productivity, and process optimization. This study examines 26.7k MRs from four teams across 116 projects of a networking software company.
- Score: 2.5999037208435705
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: DevOps integrates collaboration, automation, and continuous improvement, enhancing agility, reducing time to market, and ensuring consistent software releases. A key component of this process is GitLab's Merge Request (MR) mechanism, which streamlines code submission and review. Studies have extensively analyzed MR data and similar mechanisms like GitHub pull requests and Gerrit Code Review, focusing on metrics such as review completion time and time to first comment. However, MR data also reflects broader aspects, including collaboration patterns, productivity, and process optimization. This study examines 26.7k MRs from four teams across 116 projects of a networking software company to analyze DevOps processes. We first assess the impact of external factors like COVID-19 and internal changes such as migration to OpenShift. Findings show increased effort and longer MR review times during the pandemic, with stable productivity and a lasting shift to out-of-hours work, reaching 70% of weekly activities. The transition to OpenShift was successful, with stabilized metrics over time. Additionally, we identify prioritization patterns in branch management, particularly in stable branches for new releases, underscoring the importance of workflow efficiency. In code review, while bots accelerate review initiation, human reviewers remain crucial in reducing review completion time. Other factors, such as commit count and reviewer experience, also influence review efficiency. This research provides actionable insights for practitioners, demonstrating how MR data can enhance productivity, effort analysis, and overall efficiency in DevOps.
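As a concrete illustration, two of the headline metrics above, review completion time and out-of-hours activity, could be computed from exported MR records along these lines. This is a minimal sketch, not the authors' pipeline; the file name and the column names (`created_at`, `merged_at`) are assumptions modeled on GitLab's API fields.

```python
# Minimal sketch (not the authors' pipeline) of two MR metrics with pandas.
# Assumes a JSON export of MRs with GitLab-style timestamp fields.
import pandas as pd

mrs = pd.read_json("merge_requests.json")  # hypothetical export
mrs["created_at"] = pd.to_datetime(mrs["created_at"])
mrs["merged_at"] = pd.to_datetime(mrs["merged_at"])

# Review completion time: MR creation to merge, in hours.
mrs["review_hours"] = (mrs["merged_at"] - mrs["created_at"]).dt.total_seconds() / 3600

# Out-of-hours share: activity outside Mon-Fri, 9:00-17:00 local time.
hour, weekday = mrs["created_at"].dt.hour, mrs["created_at"].dt.weekday
out_of_hours = (weekday >= 5) | (hour < 9) | (hour >= 17)

print(f"median review time: {mrs['review_hours'].median():.1f} h")
print(f"out-of-hours share: {out_of_hours.mean():.0%}")
```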
Related papers
- Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks [229.73714829399802]
This survey probes the core challenges that the rise of Large Language Models poses for evaluation.
We identify and analyze two pivotal transitions, including (i) from task-specific to capability-based evaluation, which reorganizes benchmarks around core competencies such as knowledge, reasoning, instruction following, multi-modal understanding, and safety.
We dissect the core challenges of these transitions from the perspectives of methods, datasets, evaluators, and metrics.
arXiv Detail & Related papers (2025-04-26T07:48:52Z) - LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews [74.87393214734114]
This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories.
Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting.
However, instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 points.
arXiv Detail & Related papers (2025-04-15T10:07:33Z) - Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance.
We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
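For intuition, the Best-of-N baseline that IAD refines can be sketched in a few lines. Here `generate` and `verifier_score` are hypothetical placeholders for a model call and a learned verifier, not APIs from the paper.

```python
# Illustrative sketch of Best-of-N (BON) sampling guided by a verifier.
import random

def generate(prompt: str) -> str:
    # Placeholder for an LLM/agent call; returns one candidate output.
    return f"candidate-{random.randint(0, 9999)} for {prompt!r}"

def verifier_score(candidate: str) -> float:
    # Placeholder for a verifier that rates candidate quality.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Sample N candidates and keep the one the verifier scores highest.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=verifier_score)

print(best_of_n("fix the failing unit test"))
```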
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword? [14.970843824847956]
We run a controlled experiment with 29 experts who reviewed different programs with/without the support of an automatically generated code review. We show that reviewers consider most of the issues automatically identified by the LLM to be valid, and that the availability of an automated review as a starting point strongly influences their behavior. Reviewers who started from an automated review identified a higher number of low-severity issues, but did not identify more high-severity issues than in a completely manual process.
arXiv Detail & Related papers (2024-11-18T09:24:01Z) - Prompting and Fine-tuning Large Language Models for Automated Code Review Comment Generation [5.6001617185032595]
Large language models pretrained on both programming and natural language data tend to perform well in code-oriented tasks.
We fine-tune open-source large language models (LLMs) in a parameter-efficient, quantized low-rank fashion on consumer-grade hardware to improve review comment generation.
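A hedged sketch of what such parameter-efficient, quantized low-rank (QLoRA-style) fine-tuning typically looks like with the Hugging Face transformers, peft, and bitsandbytes stack; the base model name and hyperparameters are illustrative choices, not the paper's configuration.

```python
# Illustrative QLoRA-style setup, not the paper's exact configuration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # illustrative base model
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,  # low-rank adapter settings
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()          # only adapter weights train
```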
arXiv Detail & Related papers (2024-11-15T12:01:38Z) - Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
We collect a dataset with 57 pairs consisting of commit messages generated by GPT-4 and their counterparts edited by human experts.
Our results indicate that edit distance exhibits the highest correlation with the online metric, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
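To make the offline metric concrete, a classic Levenshtein edit-distance implementation can be correlated against an online signal roughly as follows; the sample strings and edit counts below are invented for illustration, not drawn from the paper's 57-pair dataset.

```python
# Levenshtein edit distance (dynamic programming) plus a toy correlation
# against a hypothetical "edits before commit" online signal.
from scipy.stats import spearmanr

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

generated = ["fix typo in parser", "add tests", "update docs"]
edited    = ["fix typo in lexer", "add unit tests", "update docs"]
online_edits = [4, 6, 0]  # invented "edits before commit" counts

offline = [levenshtein(g, e) for g, e in zip(generated, edited)]
print(spearmanr(offline, online_edits))
```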
arXiv Detail & Related papers (2024-10-15T20:32:07Z) - Impact of AI-tooling on the Engineering Workspace [0.0]
Significant changes were observed in coding time fractions among Copilot users.
Some companies experienced a decrease in PR pickup times by up to 33%.
One company experienced a shift of up to 17% of effort from maintenance and support work towards product growth initiatives.
arXiv Detail & Related papers (2024-06-11T20:04:09Z) - What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception [52.41695608928129]
Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources.
This paper investigates intermediate collaboration for MAP with a specific focus on exploring "good" properties of collaborative views.
We propose a novel framework named CMiMC for intermediate collaboration.
arXiv Detail & Related papers (2024-03-15T07:18:55Z) - Team-related Features in Code Review Prediction Models [10.576931077314887]
We evaluate the prediction power of features related to code ownership, workload, and team relationship.
Our results show that, individually, features related to code ownership have the best prediction power.
We conclude that all proposed features, together with lines of code, yield the best predictions for both reviewer participation and amount of feedback.
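A toy sketch of this kind of prediction model follows, using hypothetical encodings for code ownership, workload, team relationship, and lines of code; the data and feature names are invented, not the paper's dataset.

```python
# Toy reviewer-participation model over invented team-related features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.random(n),            # code ownership of the candidate reviewer
    rng.integers(0, 20, n),   # current review workload
    rng.random(n),            # past collaboration strength (team relationship)
    rng.integers(1, 500, n),  # lines of code changed in the MR
])
y = rng.integers(0, 2, n)     # did the reviewer participate?

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(dict(zip(["ownership", "workload", "relationship", "loc"],
               clf.feature_importances_.round(3))))
```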
arXiv Detail & Related papers (2023-12-11T09:30:09Z) - Does Code Review Speed Matter for Practitioners? [0.0]
Increasing code velocity is a common goal for a variety of software projects.
We conducted a survey to study the code velocity-related beliefs and practices in place.
arXiv Detail & Related papers (2023-11-04T19:22:23Z) - Benchopt: Reproducible, efficient and collaborative optimization benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z) - QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions.
Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments.
We propose a substantially improved version, coined QTRAN++.
arXiv Detail & Related papers (2020-06-22T05:08:36Z) - A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, despite its simplicity, the approach outperforms the state-of-the-art techniques by a significant margin.
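The pairwise relationships mentioned here are what scaled dot-product self-attention computes; below is a minimal single-head sketch over code-token embeddings, illustrative only and not the paper's full architecture.

```python
# Single-head scaled dot-product self-attention over token embeddings.
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    # X: (tokens, dim) embeddings of code tokens.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax per token
    return weights @ X                              # context-mixed representations

tokens = np.random.default_rng(0).normal(size=(5, 8))  # 5 code tokens
print(self_attention(tokens).shape)  # (5, 8)
```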
arXiv Detail & Related papers (2020-05-01T23:29:36Z) - How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)