Code Ownership: The Principles, Differences, and Their Associations with Software Quality
- URL: http://arxiv.org/abs/2408.12807v1
- Date: Fri, 23 Aug 2024 03:01:59 GMT
- Title: Code Ownership: The Principles, Differences, and Their Associations with Software Quality
- Authors: Patanamon Thongtanunam, Chakkrit Tantithamthavorn
- Abstract summary: We investigate the differences in the commonly used ownership approximations in terms of the set of developers, the approximated code ownership values, and the expertise level.
We find that commit-based and line-based ownership approximations produce different sets of developers, different code ownership values, and different sets of major developers.
- Score: 6.123324869194196
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Code ownership -- an approximation of the degree of ownership of a software component -- is one of the important software measures used in quality improvement plans. However, prior studies proposed different variants of code ownership approximations. Yet, little is known about the difference in code ownership approximations and their association with software quality. In this paper, we investigate the differences in the commonly used ownership approximations (i.e., commit-based and line-based) in terms of the set of developers, the approximated code ownership values, and the expertise level. Then, we analyze the association of each code ownership approximation with the defect-proneness. Through an empirical study of 25 releases that span real-world open-source software systems, we find that commit-based and line-based ownership approximations produce different sets of developers, different code ownership values, and different sets of major developers. In addition, we find that the commit-based approximation has a stronger association with software quality than the line-based approximation. Based on our analysis, we recommend line-based code ownership be used for accountability purposes (e.g., authorship attribution, intellectual property), while commit-based code ownership should be used for rapid bug-fixing and charting quality improvement plans.
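The two approximations contrasted in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: commit-based ownership is a developer's share of the commits that touched a file, and line-based ownership is a developer's share of the lines in the current version (for example, as reported by `git blame`). The 5% cut-off for "major" developers, the helper names, and the sample data are assumptions made for illustration; the abstract does not state them.

```python
from collections import Counter

# Assumed cut-off for "major" developers; not stated in the abstract.
MAJOR_THRESHOLD = 0.05


def ownership(contributions):
    """Map each developer to their share of the given contributions."""
    counts = Counter(contributions)
    total = sum(counts.values())
    return {dev: n / total for dev, n in counts.items()}


def major_developers(shares, threshold=MAJOR_THRESHOLD):
    """Return the developers whose share exceeds the threshold."""
    return {dev for dev, share in shares.items() if share > threshold}


# Hypothetical data for a single file:
# one entry per commit that touched the file (the commit's author) ...
commit_authors = ["alice"] * 6 + ["bob"] * 3 + ["carol"]
# ... and one entry per line in the current version (the line's last author).
line_authors = ["alice"] * 20 + ["bob"] * 79 + ["dave"]

commit_based = ownership(commit_authors)
line_based = ownership(line_authors)

print("commit-based:", commit_based)
print("line-based:  ", line_based)
print("major (commit-based):", sorted(major_developers(commit_based)))
print("major (line-based):  ", sorted(major_developers(line_based)))
```

In this made-up example the two approximations disagree on all three points named in the abstract: the developer sets differ (carol appears only in the commit history, dave only in the surviving lines), the ownership values differ, and the sets of major developers differ.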
Related papers
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- Broken Windows: Exploring the Applicability of a Controversial Theory on Code Quality [13.36825494924134]
We examine whether code history does indeed affect the evolution of code quality.
We check whether developers tailor the quality of their commits based on the quality of the file they commit to.
Our results have implications for both software practice and research.
arXiv Detail & Related papers (2024-10-17T12:16:35Z)
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency.
CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases.
arXiv Detail & Related papers (2024-10-08T01:36:15Z)
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
- Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
- AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data [64.69872638349922]
We present AlchemistCoder, a series of Code LLMs with enhanced code generation and generalization capabilities fine-tuned on multi-source data.
We propose incorporating the data construction process into the fine-tuning data as code comprehension tasks, including instruction evolution, data filtering, and code review.
arXiv Detail & Related papers (2024-05-29T16:57:33Z)
- Examining Ownership Models in Software Teams: A Systematic Literature Review and a Replication Study [2.0891120283967264]
We identify 79 relevant papers published between 2005 and 2022.
We develop a taxonomy of ownership artifacts based on type, owners, and degree of ownership.
arXiv Detail & Related papers (2024-05-24T16:03:22Z)
- Towards Understanding the Impact of Code Modifications on Software Quality Metrics [1.2277343096128712]
This study aims to assess and interpret the impact of code modifications on software quality metrics.
The underlying hypothesis posits that code modifications inducing similar changes in software quality metrics can be grouped into distinct clusters.
The results reveal distinct clusters of code modifications, each accompanied by a concise description, revealing their collective impact on software quality metrics.
arXiv Detail & Related papers (2024-04-05T08:41:18Z)
- Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase [10.961209762486684]
Code revert prediction aims to forecast or predict the likelihood of code changes being reverted or rolled back in software development.
Previous methods for code defect detection relied on independent features but ignored relationships between code scripts.
This paper presents a systematic empirical study for code revert prediction that integrates the code import graph with code features.
arXiv Detail & Related papers (2024-03-14T15:54:29Z)
- Organizational Artifacts of Code Development [10.863006516392831]
We study social effects of country by measuring differences in software repositories associated with different countries.
We propose a novel approach of modeling repositories based on their sequence of development activities as a sequence embedding task.
We conduct a case study on repos from well-known corporations and find that country can describe the differences in development better than the company affiliation itself.
arXiv Detail & Related papers (2021-05-30T22:04:09Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.