Characterizing and Detecting Mismatch in Machine-Learning-Enabled
Systems
- URL: http://arxiv.org/abs/2103.14101v1
- Date: Thu, 25 Mar 2021 19:40:29 GMT
- Title: Characterizing and Detecting Mismatch in Machine-Learning-Enabled
Systems
- Authors: Grace A. Lewis, Stephany Bellomo, Ipek Ozkaya
- Abstract summary: Development and deployment of machine learning systems remain a challenge.
In this paper, we report our findings and their implications for improving end-to-end ML-enabled system development.
- Score: 1.4695979686066065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Increasing availability of machine learning (ML) frameworks and tools, as
well as their promise to improve solutions to data-driven decision problems,
has resulted in the popularity of using ML techniques in software systems.
However, end-to-end development of ML-enabled systems, as well as their
seamless deployment and operations, remains a challenge. One reason is that development
and deployment of ML-enabled systems involves three distinct workflows,
perspectives, and roles, which include data science, software engineering, and
operations. These three distinct perspectives, when misaligned due to incorrect
assumptions, cause ML mismatches which can result in failed systems. We
conducted an interview and survey study where we collected and validated common
types of mismatches that occur in end-to-end development of ML-enabled systems.
Our analysis shows that how much importance each role assigns to the relevant
mismatches varies, potentially contributing to these mismatched assumptions. In
addition, the mismatch categories we identified can be specified as
machine-readable descriptors, contributing to improved ML-enabled system development. In
this paper, we report our findings and their implications for improving
end-to-end ML-enabled system development.
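The abstract's point that mismatch categories can be captured as machine-readable descriptors can be illustrated with a minimal sketch. The schema below is a hypothetical Python example; the category name, attributes, and validation rule are assumptions for illustration, not the descriptors actually defined in the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # The three roles named in the abstract.
    DATA_SCIENCE = "data science"
    SOFTWARE_ENGINEERING = "software engineering"
    OPERATIONS = "operations"


@dataclass
class MismatchDescriptor:
    """Illustrative machine-readable record of an assumption that, if not
    shared across roles, can lead to an ML mismatch (hypothetical schema)."""
    category: str    # hypothetical category label, e.g. "trained-model"
    attribute: str   # which assumption is being documented
    owner: Role      # role responsible for providing the value
    consumers: list  # roles that rely on the value
    value: str = ""  # the documented assumption itself

    def is_documented(self) -> bool:
        """Check that the assumption has a value and at least one consumer."""
        return bool(self.value) and len(self.consumers) > 0


# Example usage: data science documents training-data statistics so that
# operations can monitor production data for drift against them.
descriptor = MismatchDescriptor(
    category="trained-model",
    attribute="training data statistics",
    owner=Role.DATA_SCIENCE,
    consumers=[Role.OPERATIONS],
    value="feature means and variances computed on the training set",
)
print(descriptor.is_documented())  # True
```

In this spirit, descriptors could be exchanged between the data science, software engineering, and operations roles so that shared assumptions are explicit and checkable rather than implicit.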
Related papers
- Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach [0.0]
In recent years, AI researchers and practitioners have introduced principles and guidelines to build systems that make reliable and trustworthy decisions.
In practice, a fundamental challenge arises when the system needs to be operationalized and deployed to evolve and operate in real-life environments continuously.
To address this challenge, Machine Learning Operations (MLOps) have emerged as a potential recipe for standardizing ML solutions in deployment.
arXiv Detail & Related papers (2024-10-28T09:34:08Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - Test & Evaluation Best Practices for Machine Learning-Enabled Systems [7.148282824413932]
Machine learning (ML) based software systems are rapidly gaining adoption across various domains.
This report presents best practices for the Test and Evaluation (T&E) of ML-enabled software systems across their lifecycle.
arXiv Detail & Related papers (2023-10-10T17:11:14Z) - Real-world Machine Learning Systems: A survey from a Data-Oriented
Architecture Perspective [7.574538335342942]
Data-oriented Architecture (DOA) is an emerging concept that equips systems better for integrating ML models.
DOA extends current architectures to create data-driven, loosely coupled, decentralised, open systems.
This paper answers these questions by surveying real-world deployments of ML-based systems.
arXiv Detail & Related papers (2023-02-09T17:57:02Z) - Understanding the Complexity and Its Impact on Testing in ML-Enabled
Systems [8.630445165405606]
We study Rasa 3.0, an industrial dialogue system that has been widely adopted by various companies around the world.
Our goal is to characterize the complexity of such a large-scale ML-enabled system and to understand the impact of the complexity on testing.
Our study reveals practical implications for software engineering for ML-enabled systems.
arXiv Detail & Related papers (2023-01-10T08:13:24Z) - Towards Perspective-Based Specification of Machine Learning-Enabled
Systems [1.3406258114080236]
This paper describes our work towards a perspective-based approach for specifying ML-enabled systems.
The approach involves analyzing a set of 45 ML concerns grouped into five perspectives: objectives, user experience, infrastructure, model, and data.
The main contribution of this paper is to provide two new artifacts that can be used to help specify ML-enabled systems.
arXiv Detail & Related papers (2022-06-20T13:09:23Z) - Practical Machine Learning Safety: A Survey and Primer [81.73857913779534]
Open-world deployment of Machine Learning algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of ML vulnerabilities.
New models and training techniques have been proposed to reduce generalization error, achieve domain adaptation, and detect outlier examples and adversarial attacks.
Our organization maps state-of-the-art ML techniques to safety strategies in order to enhance the dependability of the ML algorithm from different aspects.
arXiv Detail & Related papers (2021-06-09T05:56:42Z) - Understanding the Usability Challenges of Machine Learning In
High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z) - Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z) - Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)