Quality Issues in Machine Learning Software Systems
- URL: http://arxiv.org/abs/2306.15007v1
- Date: Mon, 26 Jun 2023 18:46:46 GMT
- Title: Quality Issues in Machine Learning Software Systems
- Authors: Pierre-Olivier Côté, Amin Nikanjam, Rached Bouchoucha, Ilan Basta,
Mouna Abidi, Foutse Khomh
- Abstract summary: There is a strong need for ensuring the serving quality of Machine Learning Software Systems.
This paper aims to investigate the characteristics of real quality issues in MLSSs from the viewpoint of practitioners.
We identify 18 recurring quality issues and 24 strategies to mitigate them.
- Score: 10.797981721308226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: An increasing demand is observed in various domains to employ
Machine Learning (ML) for solving complex problems. ML models are implemented
as software components and deployed in Machine Learning Software Systems
(MLSSs). Problem: There is a strong need for ensuring the serving quality of
MLSSs. False or poor decisions of such systems can lead to malfunction of other
systems, significant financial losses, or even threats to human life. The
quality assurance of MLSSs is considered a challenging task and currently is a
hot research topic. Objective: This paper aims to investigate the
characteristics of real quality issues in MLSSs from the viewpoint of
practitioners. This empirical study aims to identify a catalog of quality
issues in MLSSs. Method: We conduct a set of interviews with
practitioners/experts, to gather insights about their experience and practices
when dealing with quality issues. We validate the identified quality issues via
a survey with ML practitioners. Results: Based on the content of 37 interviews,
we identified 18 recurring quality issues and 24 strategies to mitigate them.
For each identified issue, we describe the causes and consequences according to
the practitioners' experience. Conclusion: We believe the catalog of issues
developed in this study will allow the community to develop efficient quality
assurance tools for ML models and MLSSs. A replication package of our study is
available on our public GitHub repository.
Related papers
- Automate Knowledge Concept Tagging on Math Questions with LLMs [48.5585921817745]
Knowledge concept tagging for questions plays a crucial role in contemporary intelligent educational applications.
Traditionally, these annotations have been conducted manually with help from pedagogical experts.
In this paper, we explore automating the tagging task using Large Language Models (LLMs).
arXiv Detail & Related papers (2024-03-26T00:09:38Z) - An Empirical Study of Challenges in Machine Learning Asset Management [15.07444988262748]
Despite existing research, a significant knowledge gap remains in operational challenges like model versioning, data traceability, and collaboration.
Our study aims to address this gap by analyzing 15,065 posts from developer forums and platforms.
We uncover 133 topics related to asset management challenges, grouped into 16 macro-topics, with software dependency, model deployment, and model training being the most discussed.
arXiv Detail & Related papers (2024-02-25T05:05:52Z) - Competition-Level Problems are Effective LLM Evaluators [121.15880285283116]
This paper aims to evaluate the reasoning capacities of large language models (LLMs) in solving recent programming problems in Codeforces.
We first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered.
Surprisingly, the perceived performance of GPT-4 has experienced a cliff-like decline in problems after September 2021, consistently across all difficulties and types of problems.
arXiv Detail & Related papers (2023-12-04T18:58:57Z) - RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge [69.79676144482792]
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge.
Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
arXiv Detail & Related papers (2023-11-14T13:24:19Z) - Status Quo and Problems of Requirements Engineering for Machine Learning: Results from an International Survey [7.164324501049983]
Requirements Engineering (RE) can help address many problems when engineering Machine Learning-enabled systems.
We conducted a survey to gather practitioner insights into the status quo and problems of RE in ML-enabled systems.
We found significant differences in RE practices within ML projects.
arXiv Detail & Related papers (2023-10-10T15:53:50Z) - SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models [70.5763210869525]
We introduce SciBench, an expansive benchmark suite for Large Language Models (LLMs).
SciBench contains a dataset featuring a range of collegiate-level scientific problems from mathematics, chemistry, and physics domains.
The results reveal that the current LLMs fall short of delivering satisfactory performance, with the best overall score of merely 43.22%.
arXiv Detail & Related papers (2023-07-20T07:01:57Z) - A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z) - Understanding the Issues, Their Causes and Solutions in Microservices Systems: An Empirical Study [11.536360998310576]
Technical Debt, Continuous Integration, Exception Handling, Service Execution and Communication are the most dominant issues in microservices systems.
We found 177 types of solutions that can be applied to fix the identified issues.
arXiv Detail & Related papers (2023-02-03T18:08:03Z) - Quality issues in Machine Learning Software Systems [12.655311590103238]
This paper aims to investigate the characteristics of real quality issues in MLSSs from the viewpoint of practitioners.
We expect that the catalog of issues developed at this step will also help us later to identify the severity, root causes, and possible remedy for quality issues of MLSSs.
arXiv Detail & Related papers (2022-08-18T17:55:18Z) - Quality Assurance Challenges for Machine Learning Software Applications During Software Development Life Cycle Phases [1.4213973379473654]
The paper conducts an in-depth review of literature on the quality assurance of Machine Learning (ML) models.
We develop a taxonomy of MLSA quality assurance issues by mapping the various ML adoption challenges across different phases of software development life cycles (SDLC).
This mapping can help prioritize quality assurance efforts of MLSAs where the adoption of ML models can be considered crucial.
arXiv Detail & Related papers (2021-05-03T22:29:23Z) - Understanding the Usability Challenges of Machine Learning In High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.