Test & Evaluation Best Practices for Machine Learning-Enabled Systems
- URL: http://arxiv.org/abs/2310.06800v1
- Date: Tue, 10 Oct 2023 17:11:14 GMT
- Title: Test & Evaluation Best Practices for Machine Learning-Enabled Systems
- Authors: Jaganmohan Chandrasekaran, Tyler Cody, Nicola McCarthy, Erin Lanus,
Laura Freeman
- Abstract summary: Machine learning (ML)-based software systems are rapidly gaining adoption across various domains.
This report presents best practices for the Test and Evaluation (T&E) of ML-enabled software systems across their lifecycle.
- Score: 7.148282824413932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML)-based software systems are rapidly gaining adoption
across various domains, making it increasingly essential to ensure they perform
as intended. This report presents best practices for the Test and Evaluation
(T&E) of ML-enabled software systems across their lifecycle. We categorize the
lifecycle of ML-enabled software systems into three stages: component,
integration and deployment, and post-deployment. At the component level, the
primary objective is to test and evaluate the ML model as a standalone
component. Next, in the integration and deployment stage, the goal is to
evaluate an integrated ML-enabled system consisting of both ML and non-ML
components. Finally, once the ML-enabled software system is deployed and
operationalized, the T&E objective is to ensure the system performs as
intended. Maintenance activities span the entire lifecycle and involve
maintaining the various assets of an ML-enabled software system.
Given their unique characteristics, the T&E of ML-enabled software systems is
challenging. While significant research has been reported on T&E at the
component level, limited work has been reported on T&E in the remaining two stages.
Furthermore, in many cases, there is a lack of systematic T&E strategies
throughout the ML-enabled system's lifecycle. This leads practitioners to
resort to ad-hoc T&E practices, which can undermine user confidence in the
reliability of ML-enabled software systems. New systematic testing approaches,
adequacy measurements, and metrics are required to address the T&E challenges
across all stages of the ML-enabled system lifecycle.
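As an illustration of the component-level stage described in the abstract, the sketch below shows one way a standalone ML model might be evaluated against a held-out dataset before integration. This is a minimal Python example under assumed conditions, not a method taken from the report; the scikit-learn classifier, the `evaluate_component` helper, and the 0.90 accuracy threshold are illustrative assumptions.

```python
# Minimal, illustrative component-level T&E check for a standalone ML model.
# The model, dataset, helper name, and acceptance threshold are hypothetical
# examples, not prescriptions from the report.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate_component(model, X_eval, y_eval, threshold=0.90):
    """Score the model on a held-out set and check it against a threshold."""
    accuracy = accuracy_score(y_eval, model.predict(X_eval))
    return accuracy, accuracy >= threshold


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    X_train, X_eval, y_train, y_eval = train_test_split(
        X, y, test_size=0.3, random_state=0
    )
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    accuracy, passed = evaluate_component(model, X_eval, y_eval)
    print(f"Component-level accuracy: {accuracy:.3f} (passed: {passed})")
```

At the integration and deployment stage, analogous checks would exercise the full pipeline (pre-processing, the model, and downstream non-ML components) rather than the model in isolation.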
Related papers
- A Large-Scale Study of Model Integration in ML-Enabled Software Systems [4.776073133338119]
Machine learning (ML) and its embedding in systems have drastically changed the engineering of software-intensive systems.
Traditionally, software engineering focuses on manually created artifacts such as source code and the process of creating them.
We present the first large-scale study of real ML-enabled software systems, covering 2,928 open source systems on GitHub.
arXiv Detail & Related papers (2024-08-12T15:28:40Z)
- SEED-Bench-2: Benchmarking Multimodal Large Language Models [67.28089415198338]
Multimodal large language models (MLLMs) have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs.
SEED-Bench-2 comprises 24K multiple-choice questions with accurate human annotations, which span 27 dimensions.
We evaluate the performance of 23 prominent open-source MLLMs and summarize valuable observations.
arXiv Detail & Related papers (2023-11-28T05:53:55Z)
- ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models [49.48109472893714]
Multimodal Large Language Models (MLLMs) have shown impressive abilities in interacting with visual content with myriad potential downstream tasks.
We present the first Comprehensive Evaluation Framework (ChEF) that can holistically profile each MLLM and fairly compare different MLLMs.
We will publicly release all the detailed implementations for further analysis, as well as an easy-to-use modular toolkit for the integration of new recipes and models.
arXiv Detail & Related papers (2023-11-05T16:01:40Z)
- Vulnerability of Machine Learning Approaches Applied in IoT-based Smart Grid: A Review [51.31851488650698]
Machine learning (ML) is increasingly used in internet-of-things (IoT)-based smart grids.
Adversarial distortion injected into the power signal will greatly affect the system's normal control and operation.
It is imperative to conduct vulnerability assessment for MLsgAPPs applied in the context of safety-critical power systems.
arXiv Detail & Related papers (2023-08-30T03:29:26Z)
- Understanding the Complexity and Its Impact on Testing in ML-Enabled Systems [8.630445165405606]
We study Rasa 3.0, an industrial dialogue system that has been widely adopted by various companies around the world.
Our goal is to characterize the complexity of such a large-scale ML-enabled system and to understand the impact of the complexity on testing.
Our study reveals practical implications for software engineering for ML-enabled systems.
arXiv Detail & Related papers (2023-01-10T08:13:24Z)
- SoK: Machine Learning Governance [16.36671448193025]
We develop the concept of ML governance to balance such benefits and risks.
We use identities to hold principals accountable for failures of ML systems.
We highlight the need for techniques that allow a model owner to manage the life cycle of their system.
arXiv Detail & Related papers (2021-09-20T17:56:22Z)
- Declarative Machine Learning Systems [7.5717114708721045]
Machine learning (ML) has moved from an academic endeavor to a pervasive technology adopted in almost every aspect of computing.
Recent successes in applying ML in natural sciences revealed that ML can be used to tackle some of the hardest real-world problems humanity faces today.
We believe the next wave of ML systems will allow a larger number of people, potentially without coding skills, to perform the same tasks.
arXiv Detail & Related papers (2021-07-16T23:57:57Z)
- Characterizing and Detecting Mismatch in Machine-Learning-Enabled Systems [1.4695979686066065]
Development and deployment of machine learning systems remains a challenge.
In this paper, we report our findings and their implications for improving end-to-end ML-enabled system development.
arXiv Detail & Related papers (2021-03-25T19:40:29Z)
- White Paper Machine Learning in Certified Systems [70.24215483154184]
The DEEL Project set up the ML Certification 3 Workgroup (WG) at the Institut de Recherche Technologique Saint Exupéry de Toulouse (IRT).
arXiv Detail & Related papers (2021-03-18T21:14:30Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
- Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)