A conceptual framework for SPI evaluation
- URL: http://arxiv.org/abs/2307.13089v1
- Date: Mon, 24 Jul 2023 19:22:58 GMT
- Title: A conceptual framework for SPI evaluation
- Authors: Michael Unterkalmsteiner, Tony Gorschek, A. K. M. Moinul Islam, Chow Kian Cheng, Rahadian Bayu Permadi, Robert Feldt
- Abstract summary: SPI-MEF guides the practitioner in scoping the evaluation, determining measures, and performing the assessment.
SPI-MEF does not assume a specific approach to process improvement and can be integrated in existing measurement programs.
- Score: 6.973622134568803
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Software Process Improvement (SPI) encompasses the analysis and modification
of the processes within software development, aimed at improving key areas that
contribute to the organizations' goals. The task of evaluating whether the
selected improvement path meets these goals is challenging. On the basis of the
results of a systematic literature review on SPI measurement and evaluation
practices, we developed a framework (SPI Measurement and Evaluation Framework
(SPI-MEF)) that supports the planning and implementation of SPI evaluations.
SPI-MEF guides the practitioner in scoping the evaluation, determining
measures, and performing the assessment. SPI-MEF does not assume a specific
approach to process improvement and can be integrated in existing measurement
programs, refocusing the assessment on evaluating the improvement initiative's
outcome. Sixteen industry and academic experts evaluated the framework's
usability and capability to support practitioners, providing additional
insights that were integrated in the application guidelines of the framework.
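The paper presents SPI-MEF as a conceptual framework rather than code. Purely as an illustration of the three activities named in the abstract (scoping the evaluation, determining measures, performing the assessment), the following minimal Python sketch records an evaluation scope with its measures and compares baseline against post-initiative values. All class names, fields, and numbers are hypothetical assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical sketch in the spirit of SPI-MEF: scope the evaluation,
# determine measures, then assess the initiative's outcome against a baseline.
# Names and values are illustrative only; the paper defines no code.

@dataclass
class Measure:
    name: str            # e.g. "defect density"
    unit: str            # e.g. "defects/KLOC"
    baseline: float      # value before the improvement initiative
    current: float       # value observed after the initiative

    def relative_change(self) -> float:
        """Relative change against the baseline (negative = reduction)."""
        return (self.current - self.baseline) / self.baseline

@dataclass
class EvaluationScope:
    initiative: str                  # improvement initiative under evaluation
    processes: list[str]             # affected development processes
    measures: list[Measure] = field(default_factory=list)

    def assess(self) -> dict[str, float]:
        """Summarize the initiative's outcome per measure."""
        return {m.name: m.relative_change() for m in self.measures}

# Usage example
scope = EvaluationScope(
    initiative="Introduce code reviews",
    processes=["implementation", "verification"],
    measures=[Measure("defect density", "defects/KLOC", baseline=4.2, current=3.1)],
)
print(scope.assess())   # {'defect density': -0.2619...}
```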
Related papers
- Med-CoDE: Medical Critique based Disagreement Evaluation Framework [72.42301910238861]
The reliability and accuracy of large language models (LLMs) in medical contexts remain critical concerns.
Current evaluation methods often lack robustness and fail to provide a comprehensive assessment of LLM performance.
We propose Med-CoDE, a specifically designed evaluation framework for medical LLMs to address these challenges.
arXiv Detail & Related papers (2025-04-21T16:51:11Z)
- Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework [61.38174427966444]
Large Language Models (LLMs) are being used more and more extensively for automated evaluation in various scenarios.
Previous studies have attempted to fine-tune open-source LLMs to replicate the evaluation explanations and judgments of powerful proprietary models.
We propose a novel evaluation framework, ARJudge, that adaptively formulates evaluation criteria and synthesizes both text-based and code-driven analyses.
arXiv Detail & Related papers (2025-02-26T06:31:45Z)
- A Brief Discussion on KPI Development in Public Administration [0.0]
This paper presents an innovative framework for KPI construction within performance evaluation systems, leveraging Random Forest algorithms and variable importance analysis (a brief illustrative sketch appears after this list).
The proposed approach identifies key variables that significantly influence PA performance, offering valuable insights into the critical factors driving organizational success.
This study aims to enhance PA performance through the application of machine learning techniques, fostering a more agile and results-driven approach to public administration.
arXiv Detail & Related papers (2024-12-12T10:27:55Z)
- Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs [64.9693406713216]
Internal mechanisms that contribute to the effectiveness of RAG systems remain underexplored.
Our experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors.
We propose several strategies to enhance RAG's efficiency and effectiveness through expert activation.
arXiv Detail & Related papers (2024-10-20T16:08:54Z)
- Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework [2.4861619769660637]
We propose an estimands framework adapted from international clinical trials guidelines.
This framework provides a systematic structure for inference and reporting in evaluations.
We demonstrate how the framework can help uncover underlying issues, their causes, and potential solutions.
arXiv Detail & Related papers (2024-06-14T18:47:37Z)
- Holistic Safety and Responsibility Evaluations of Advanced AI Models [18.34510620901674]
Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice.
In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation.
arXiv Detail & Related papers (2024-04-22T10:26:49Z)
- LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with Large Language Models.
It underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Literature Review of Current Sustainability Assessment Frameworks and Approaches for Organizations [10.045497511868172]
This systematic literature review explores sustainability assessment frameworks (SAFs) across diverse industries.
The review focuses on SAF design approaches including the methods used for Sustainability Indicator (SI) selection, relative importance assessment, and interdependency analysis.
arXiv Detail & Related papers (2024-03-07T18:14:52Z)
- AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [76.95062553043607]
Evaluating large language models (LLMs) is essential for understanding their capabilities and facilitating their integration into practical applications.
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanied open-source evaluation framework tailored to analytical evaluation of LLM agents.
arXiv Detail & Related papers (2024-01-24T01:51:00Z)
- Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models.
Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems.
To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z)
- Evaluation and Measurement of Software Process Improvement -- A Systematic Literature Review [6.973622134568803]
Software Process Improvement (SPI) is a systematic approach to increase the efficiency and effectiveness of a software development organization.
This paper aims to identify and characterize evaluation strategies and measurements used to assess the impact of different SPI initiatives.
arXiv Detail & Related papers (2023-07-24T21:51:15Z)
- Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation [53.83642844626703]
We provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation.
Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates.
arXiv Detail & Related papers (2021-06-24T15:58:01Z)
- Evaluating Interactive Summarization: an Expansion-Based Framework [97.0077722128397]
We develop an end-to-end evaluation framework for interactive summarization.
Our framework includes a procedure for collecting real user sessions and evaluation measures relying on standards.
All of our solutions are intended to be released publicly as a benchmark.
arXiv Detail & Related papers (2020-09-17T15:48:13Z)
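The KPI-development entry above mentions Random Forest models and variable importance analysis for identifying the indicators that most influence performance. The following is a minimal, purely illustrative sketch of that general technique on synthetic data; the variable names, data, and model settings are assumptions and do not reproduce that paper's study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative Random Forest + variable-importance sketch (synthetic stand-in data).
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))                      # candidate explanatory variables
columns = ["staffing", "budget", "digitalization", "training"]
# Synthetic performance outcome driven mainly by two of the variables.
y = 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Variable importances suggest which candidate indicators matter most for the outcome.
for name, importance in sorted(zip(columns, model.feature_importances_),
                               key=lambda t: t[1], reverse=True):
    print(f"{name:15s} {importance:.3f}")
```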