Evaluation and Measurement of Software Process Improvement -- A
Systematic Literature Review
- URL: http://arxiv.org/abs/2307.13143v1
- Date: Mon, 24 Jul 2023 21:51:15 GMT
- Title: Evaluation and Measurement of Software Process Improvement -- A
Systematic Literature Review
- Authors: Michael Unterkalmsteiner, Tony Gorschek, A. K. M. Moinul Islam, Chow
Kian Cheng, Rahadian Bayu Permadi, Robert Feldt
- Abstract summary: Software Process Improvement (SPI) is a systematic approach to increase the efficiency and effectiveness of a software development organization.
This paper aims to identify and characterize evaluation strategies and measurements used to assess the impact of different SPI initiatives.
- Score: 6.973622134568803
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: BACKGROUND: Software Process Improvement (SPI) is a systematic approach to
increase the efficiency and effectiveness of a software development
organization and to enhance software products. OBJECTIVE: This paper aims to
identify and characterize evaluation strategies and measurements used to assess
the impact of different SPI initiatives. METHOD: The systematic literature
review includes 148 papers published between 1991 and 2008. The selected papers
were classified according to SPI initiative, applied evaluation strategies, and
measurement perspectives. Potential confounding factors interfering with the
evaluation of the improvement effort were assessed. RESULTS: Seven distinct
evaluation strategies were identified, wherein the most common one, "Pre-Post
Comparison" was applied in 49 percent of the inspected papers. Quality was the
most measured attribute (62 percent), followed by Cost (41 percent), and
Schedule (18 percent). Looking at measurement perspectives, "Project"
represents the majority with 66 percent. CONCLUSION: The evaluation validity of
SPI initiatives is challenged by the scarce consideration of potential
confounding factors, particularly given that "Pre-Post Comparison" was
identified as the most common evaluation strategy, and by the inaccurate
descriptions of the evaluation context. Measurements that assess the short- and
mid-term impact of SPI initiatives prevail, whereas long-term measurements of
customer satisfaction and return on investment tend to be used less often.
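As an illustration of how the reported breakdown can arise, the percentages above amount to simple frequency counts over the coded papers. The following is a minimal sketch with a few hypothetical paper records (not the review's actual data, coding scheme, or tooling):

```python
from collections import Counter

# Hypothetical coding of reviewed papers: each paper is labelled with the
# evaluation strategy applied and the attributes it measures. The real review
# coded 148 papers; these three records are purely illustrative.
papers = [
    {"strategy": "Pre-Post Comparison", "attributes": ["Quality", "Cost"]},
    {"strategy": "Statistical Analysis", "attributes": ["Quality"]},
    {"strategy": "Pre-Post Comparison", "attributes": ["Schedule"]},
]

# Frequency of each evaluation strategy, as a share of all papers.
strategy_counts = Counter(p["strategy"] for p in papers)
for strategy, count in strategy_counts.items():
    print(f"{strategy}: {100 * count / len(papers):.0f}%")

# Share of papers measuring each attribute. A paper can measure several
# attributes, so these shares may sum to more than 100 percent.
attribute_counts = Counter(a for p in papers for a in p["attributes"])
for attribute, count in attribute_counts.items():
    print(f"{attribute}: {100 * count / len(papers):.0f}%")
```

Because a single paper can measure several attributes, the attribute shares can exceed 100 percent in total, which is why the Quality, Cost, and Schedule figures above do not add up to 100.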
Related papers
- Identifying Aspects in Peer Reviews [61.374437855024844]
We develop a data-driven schema for deriving fine-grained aspects from a corpus of peer reviews.
We introduce a dataset of peer reviews augmented with aspects and show how it can be used for community-level review analysis.
arXiv Detail & Related papers (2025-04-09T14:14:42Z)
- HPSS: Heuristic Prompting Strategy Search for LLM Evaluators [81.09765876000208]
We propose a novel automatic prompting strategy optimization method called Heuristic Prompting Strategy Search (HPSS).
Inspired by the genetic algorithm, HPSS conducts an iterative search to find well-behaved prompting strategies for evaluators.
Extensive experiments across four evaluation tasks demonstrate the effectiveness of HPSS.
arXiv Detail & Related papers (2025-02-18T16:46:47Z)
- Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems.
We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z)
- A Critical Look at Meta-evaluating Summarisation Evaluation Metrics [11.541368732416506]
We argue that the time is ripe to build more diverse benchmarks that enable the development of more robust evaluation metrics.
We call for research focusing on user-centric quality dimensions that consider the generated summary's communicative goal.
arXiv Detail & Related papers (2024-09-29T01:30:13Z)
- Learning Outcomes, Assessment, and Evaluation in Educational Recommender Systems: A Systematic Review [0.0]
We analyse how learning is measured and optimized in Educational Recommender Systems (ERS).
Rating-based relevance is the most popular target metric, while fewer than half of the papers optimize learning-based metrics.
Only a third of the papers used outcome-based assessment to measure the pedagogical effect of recommendations.
arXiv Detail & Related papers (2024-06-12T21:53:46Z)
- Evaluation in Neural Style Transfer: A Review [0.7614628596146599]
We provide an in-depth analysis of existing evaluation techniques, identify the inconsistencies and limitations of current evaluation methods, and give recommendations for standardized evaluation practices.
We believe that the development of a robust evaluation framework will not only enable more meaningful and fairer comparisons but will also enhance the comprehension and interpretation of research findings in the field.
arXiv Detail & Related papers (2024-01-30T15:45:30Z)
- F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [102.98899881389211]
We propose F-Eval, a bilingual evaluation benchmark to evaluate fundamental abilities, including expression, commonsense, and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z)
- A conceptual framework for SPI evaluation [6.973622134568803]
SPI-MEF guides the practitioner in scoping the evaluation, determining measures, and performing the assessment.
SPI-MEF does not assume a specific approach to process improvement and can be integrated in existing measurement programs.
arXiv Detail & Related papers (2023-07-24T19:22:58Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation [136.16507050034755]
Existing human evaluation studies for summarization either exhibit a low inter-annotator agreement or have insufficient scale.
We propose a modified summarization salience protocol, Atomic Content Units (ACUs), which is based on fine-grained semantic units.
We curate the Robust Summarization Evaluation (RoSE) benchmark, a large human evaluation dataset consisting of 22,000 summary-level annotations over 28 top-performing systems.
arXiv Detail & Related papers (2022-12-15T17:26:05Z)
- Evaluating the Predictive Performance of Positive-Unlabelled Classifiers: a brief critical review and practical recommendations for improvement [77.34726150561087]
Positive-Unlabelled (PU) learning is a growing area of machine learning.
This paper critically reviews the main PU learning evaluation approaches and the choice of predictive accuracy measures in 51 articles proposing PU classifiers.
arXiv Detail & Related papers (2022-06-06T08:31:49Z)
- Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast it as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
- Impacts Towards a comprehensive assessment of the book impact by integrating multiple evaluation sources [6.568523667580746]
This paper measures book impact based on an evaluation system constructed by integrating multiple evaluation sources.
Various technologies (e.g. topic extraction, sentiment analysis, text classification) were used to extract corresponding evaluation metrics.
The reliability of the evaluation system was verified by comparing with the results of expert evaluation.
arXiv Detail & Related papers (2021-07-22T03:11:10Z)