Quality issues in Machine Learning Software Systems
- URL: http://arxiv.org/abs/2208.08982v2
- Date: Mon, 22 Aug 2022 17:43:10 GMT
- Title: Quality issues in Machine Learning Software Systems
- Authors: Pierre-Olivier Côté, Amin Nikanjam, Rached Bouchoucha, Foutse Khomh
- Abstract summary: This paper aims to investigate the characteristics of real quality issues in MLSSs from the viewpoint of practitioners.
We expect that the catalog of issues developed at this step will also help us later to identify the severity, root causes, and possible remedy for quality issues of MLSSs.
- Score: 12.655311590103238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: An increasing demand is observed in various domains to employ
Machine Learning (ML) for solving complex problems. ML models are implemented
as software components and deployed in Machine Learning Software Systems
(MLSSs). Problem: There is a strong need to ensure the serving quality of
MLSSs. False or poor decisions of such systems can lead to the malfunction of
other systems, significant financial losses, or even threats to human life.
The quality assurance of MLSSs is considered a challenging task and is
currently a hot research topic. Moreover, it is important to cover all the
various aspects of quality in MLSSs. Objective: This paper aims to
investigate the characteristics of real quality issues in MLSSs from the
viewpoint of practitioners. This empirical study aims to identify a catalog
of bad practices related to poor quality in MLSSs. Method: We plan to conduct
a set of interviews with practitioners/experts, believing that interviews are
the best method to retrieve their experience and practices when dealing with
quality issues. We expect that the catalog of issues developed at this step
will also help us later to identify the severity, root causes, and possible
remedies for quality issues of MLSSs, allowing us to develop efficient
quality assurance tools for ML models and MLSSs.
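The abstract mentions quality assurance tools for ML models and MLSSs only as a downstream goal; as a purely illustrative sketch of one serving-quality check such tooling might automate, the Python snippet below flags drift in a model's output-score distribution. Everything here (the PredictionDriftCheck class, the Population Stability Index metric, the 0.2 threshold) is a hypothetical example, not something proposed in the paper.

```python
import numpy as np


class PredictionDriftCheck:
    """Flags a potential serving-quality issue when the distribution of a
    model's output scores drifts away from a reference window."""

    def __init__(self, reference_scores, n_bins=10, threshold=0.2):
        # Bin edges are fixed from the reference window so live and
        # reference histograms stay comparable.
        self.bins = np.histogram_bin_edges(reference_scores, bins=n_bins)
        ref_counts, _ = np.histogram(reference_scores, bins=self.bins)
        # Laplace smoothing avoids division by zero for empty bins.
        self.ref_dist = (ref_counts + 1) / (ref_counts.sum() + n_bins)
        self.threshold = threshold

    def psi(self, live_scores):
        """Population Stability Index between live and reference scores."""
        live_counts, _ = np.histogram(live_scores, bins=self.bins)
        live_dist = (live_counts + 1) / (live_counts.sum() + len(live_counts))
        return float(np.sum((live_dist - self.ref_dist)
                            * np.log(live_dist / self.ref_dist)))

    def check(self, live_scores):
        """Return (ok, psi); ok is False when drift exceeds the threshold."""
        value = self.psi(live_scores)
        return value < self.threshold, value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    check = PredictionDriftCheck(reference_scores=rng.beta(2, 5, size=5000))
    ok, psi = check.check(rng.beta(5, 2, size=1000))  # deliberately shifted scores
    print(f"serving quality ok: {ok}, PSI = {psi:.3f}")
```

A PSI above roughly 0.2 is a common rule of thumb for a significant shift, but neither the metric nor the threshold comes from this paper; they are stated only to make the example concrete.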
Related papers
- AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs [53.6200736559742]
AGENT-CQ consists of two stages: a generation stage and an evaluation stage.
CrowdLLM simulates human crowdsourcing judgments to assess generated questions and answers.
Experiments on the ClariQ dataset demonstrate CrowdLLM's effectiveness in evaluating question and answer quality.
arXiv Detail & Related papers (2024-10-25T17:06:27Z)
- Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation.
Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs.
Key findings reveal that some benchmarks allow high performance even without visual inputs, and up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z)
- Maintainability Challenges in ML: A Systematic Literature Review [5.669063174637433]
This study aims to identify and synthesise the maintainability challenges in different stages of the Machine Learning workflow.
We screened more than 13000 papers, then selected and qualitatively analysed 56 of them.
arXiv Detail & Related papers (2024-08-17T13:24:15Z)
- MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
The Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z)
- Competition-Level Problems are Effective LLM Evaluators [121.15880285283116]
This paper aims to evaluate the reasoning capacities of large language models (LLMs) in solving recent programming problems in Codeforces.
We first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered.
Surprisingly, the perceived performance of GPT-4 has experienced a cliff-like decline in problems after September 2021, consistently across all difficulties and types of problems.
arXiv Detail & Related papers (2023-12-04T18:58:57Z)
- Towards Self-Adaptive Machine Learning-Enabled Systems Through QoS-Aware Model Switching [1.2277343096128712]
We propose the concept of a Machine Learning Model Balancer, focusing on managing uncertainties related to ML models by using multiple models.
AdaMLS is a novel self-adaptation approach that leverages this concept and extends the traditional MAPE-K loop for continuous MLS adaptation.
Preliminary results suggest AdaMLS surpasses naive and single state-of-the-art models in QoS guarantees.
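As a rough illustration of the model-switching idea described above (not the actual AdaMLS implementation), the hypothetical Python sketch below runs a MAPE-K-style loop: it monitors request latency per model variant, analyzes it against a budget, plans a switch to the variant with the best observed latency, and executes requests with the active variant. All names and the latency budget are assumptions made for the example.

```python
import random
import time
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared knowledge: recent latency observations per model variant."""
    latencies: dict = field(default_factory=dict)  # variant name -> list of seconds


class ModelSwitcher:
    """Serves requests with one of several model variants and switches to a
    faster variant when the active one violates a latency budget."""

    def __init__(self, models, latency_budget_s=0.05):
        self.models = models              # variant name -> callable(x) -> prediction
        self.active = next(iter(models))  # start with the first variant
        self.budget = latency_budget_s
        self.knowledge = Knowledge(latencies={name: [] for name in models})

    def serve(self, x):
        """Monitor + Execute: serve a request and record the observed latency."""
        start = time.perf_counter()
        y = self.models[self.active](x)
        self.knowledge.latencies[self.active].append(time.perf_counter() - start)
        return y

    def adapt(self):
        """Analyze + Plan: if the active variant breaks the latency budget,
        switch to the variant with the lowest observed average latency
        (unobserved variants are tried optimistically)."""
        recent = self.knowledge.latencies[self.active][-20:]
        if recent and sum(recent) / len(recent) > self.budget:
            def avg_latency(name):
                obs = self.knowledge.latencies[name][-20:]
                return sum(obs) / len(obs) if obs else 0.0
            self.active = min(self.models, key=avg_latency)


if __name__ == "__main__":
    # Two stand-in "models": a slow accurate variant and a fast approximate one.
    models = {
        "large": lambda x: (time.sleep(0.08), round(x * 2, 3))[1],
        "small": lambda x: round(x * 2, 1),
    }
    switcher = ModelSwitcher(models, latency_budget_s=0.05)
    for _ in range(5):
        switcher.serve(random.random())
        switcher.adapt()
    print("active variant after adaptation:", switcher.active)
```

In practice the Analyze/Plan steps would weigh several QoS dimensions (for example accuracy estimates alongside latency) rather than latency alone; this sketch keeps a single dimension to stay short.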
arXiv Detail & Related papers (2023-08-19T09:33:51Z)
- A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)
- Quality Issues in Machine Learning Software Systems [10.103134260637402]
There is a strong need for ensuring the serving quality of Machine Learning Software Systems.
This paper aims to investigate the characteristics of real quality issues in MLSSs from the viewpoint of practitioners.
We identify 18 recurring quality issues and 21 strategies to mitigate them.
arXiv Detail & Related papers (2023-06-26T18:46:46Z)
- How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLMs) have shown impressive general intelligence and human-like capabilities.
We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z)
- Quality Assurance Challenges for Machine Learning Software Applications During Software Development Life Cycle Phases [1.4213973379473654]
The paper conducts an in-depth review of literature on the quality assurance of Machine Learning (ML) models.
We develop a taxonomy of MLSA quality assurance issues by mapping the various ML adoption challenges across different phases of the software development life cycle (SDLC).
This mapping can help prioritize quality assurance efforts of MLSAs where the adoption of ML models can be considered crucial.
arXiv Detail & Related papers (2021-05-03T22:29:23Z)
- Towards Guidelines for Assessing Qualities of Machine Learning Systems [1.715032913622871]
This article presents the construction of a quality model for an ML system based on an industrial use case.
In the future, we want to learn how the term quality differs between different types of ML systems.
arXiv Detail & Related papers (2020-08-25T13:45:54Z)