Software Code Quality Measurement: Implications from Metric
Distributions
- URL: http://arxiv.org/abs/2307.12082v4
- Date: Tue, 16 Jan 2024 11:32:21 GMT
- Title: Software Code Quality Measurement: Implications from Metric
Distributions
- Authors: Siyuan Jin, Mianmian Zhang, Yekai Guo, Yuejiang He, Ziyuan Li, Bichao
Chen, Bing Zhu, and Yong Xia
- Abstract summary: We categorized distinct metrics into two types: 1) monotonic metrics that consistently influence code quality; and 2) non-monotonic metrics that lack a consistent relationship with code quality.
Our work contributes to the multi-dimensional construct of code quality and its metric measurements, which provides practical implications for consistent measurements on both monotonic and non-monotonic metrics.
- Score: 6.110201315596897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software code quality is a construct with three dimensions: maintainability,
reliability, and functionality. Although many firms have incorporated code
quality metrics in their operations, evaluating these metrics still lacks
consistent standards. We categorized distinct metrics into two types: 1)
monotonic metrics that consistently influence code quality; and 2)
non-monotonic metrics that lack a consistent relationship with code quality. To
consistently evaluate them, we proposed a distribution-based method to derive
metric scores. Our empirical analysis includes 36,460 high-quality open-source
software (OSS) repositories and their raw metrics from SonarQube and CK. The
evaluated scores demonstrate strong explanatory power for software adoption. Our
work contributes to the multi-dimensional construct of code quality and its
metric measurements, which provides practical implications for consistent
measurements on both monotonic and non-monotonic metrics.
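The abstract does not spell out the scoring formula, so the following is only a minimal sketch of what a distribution-based scoring of the two metric types could look like: a monotonic metric (assumed here to be "lower is better") is scored by its percentile rank in the corpus distribution, while a non-monotonic metric is scored by its closeness to the corpus mode. The function names, the synthetic corpus, and the percentile-rank and mode-distance schemes are illustrative assumptions, not the authors' method.

```python
import numpy as np

def monotonic_score(value, corpus_values, lower_is_better=True):
    """Percentile-rank score in [0, 1]: the fraction of corpus repositories
    that this repository matches or outperforms on a monotonic metric."""
    corpus = np.asarray(corpus_values, dtype=float)
    if lower_is_better:
        return float(np.mean(corpus >= value))
    return float(np.mean(corpus <= value))

def non_monotonic_score(value, corpus_values, bins=50):
    """Score a non-monotonic metric by closeness to the corpus mode:
    near 1.0 at the most common (presumably healthy) value, falling toward 0
    in the distribution's tails. A hypothetical scheme, not the paper's formula."""
    corpus = np.asarray(corpus_values, dtype=float)
    counts, edges = np.histogram(corpus, bins=bins)
    mode = 0.5 * (edges[np.argmax(counts)] + edges[np.argmax(counts) + 1])
    spread = float(corpus.max() - corpus.min()) or 1.0
    return float(max(0.0, 1.0 - abs(value - mode) / spread))

# Example: score one repository against a synthetic corpus of 36,460 values.
rng = np.random.default_rng(0)
complexity_corpus = rng.lognormal(mean=3.0, sigma=0.8, size=36_460)   # monotonic: lower is better
comment_density_corpus = rng.beta(2, 5, size=36_460) * 100            # non-monotonic: extremes are suspect

print(monotonic_score(15.0, complexity_corpus, lower_is_better=True))
print(non_monotonic_score(30.0, comment_density_corpus))
```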
Related papers
- Evaluating Source Code Quality with Large Language Models: a comparative study [2.3204178451683264]
This paper describes and analyzes the results obtained when using a Large Language Model (LLM) as a static analysis tool.
A total of 1,641 classes were analyzed, comparing the results of two model versions: GPT 3.5 Turbo and GPT 4o.
The GPT 4o version did not produce the same results, diverging from both the previous model and Sonar by assigning high classifications to code that had been assessed as lower quality.
arXiv Detail & Related papers (2024-08-07T18:44:46Z) - CodeScore-R: An Automated Robustness Metric for Assessing the FunctionalCorrectness of Code Synthesis [17.747095451792084]
We propose an automated robust metric, called CodeScore-R, for evaluating the functionality of code synthesis.
In the tasks of code generation and migration in Java and Python, CodeScore-R outperforms other metrics.
arXiv Detail & Related papers (2024-06-11T02:51:17Z) - Towards Understanding the Impact of Code Modifications on Software Quality Metrics [1.2277343096128712]
This study aims to assess and interpret the impact of code modifications on software quality metrics.
The underlying hypothesis posits that code modifications inducing similar changes in software quality metrics can be grouped into distinct clusters.
The results reveal distinct clusters of code modifications, each accompanied by a concise description of its collective impact on software quality metrics; a minimal sketch of this clustering idea follows at the end of this list.
arXiv Detail & Related papers (2024-04-05T08:41:18Z) - Is Reference Necessary in the Evaluation of NLG Systems? When and Where? [58.52957222172377]
We show that reference-free metrics exhibit a higher correlation with human judgment and greater sensitivity to deficiencies in language quality.
Our study can provide insight into the appropriate application of automatic metrics and the impact of metric choice on evaluation performance.
arXiv Detail & Related papers (2024-03-21T10:31:11Z) - Free Open Source Communities Sustainability: Does It Make a Difference
in Software Quality? [2.981092370528753]
This study aims to empirically explore how the different aspects of sustainability impact software quality.
16 sustainability metrics across four categories were sampled and applied to a set of 217 OSS projects.
arXiv Detail & Related papers (2024-02-10T09:37:44Z) - On the Limitations of Reference-Free Evaluations of Generated Text [64.81682222169113]
We show that reference-free metrics are inherently biased and limited in their ability to evaluate generated text.
We argue that they should not be used to measure progress on tasks like machine translation or summarization.
arXiv Detail & Related papers (2022-10-22T22:12:06Z) - The Glass Ceiling of Automatic Evaluation in Natural Language Generation [60.59732704936083]
We take a step back and analyze recent progress by comparing the body of existing automatic metrics and human metrics.
Our extensive statistical analysis reveals surprising findings: automatic metrics -- old and new -- are much more similar to each other than to humans.
arXiv Detail & Related papers (2022-08-31T01:13:46Z) - QAFactEval: Improved QA-Based Factual Consistency Evaluation for
Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z) - Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep
Learning [66.59455427102152]
We introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks.
Each baseline is a self-contained experiment pipeline with easily reusable and extendable components.
We provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results.
arXiv Detail & Related papers (2021-06-07T23:57:32Z) - GO FIGURE: A Meta Evaluation of Factuality in Summarization [131.1087461486504]
We introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics.
Our benchmark analysis on ten factuality metrics reveals that our framework provides a robust and efficient evaluation.
It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
arXiv Detail & Related papers (2020-10-24T08:30:20Z)