Using Quality Attribute Scenarios for ML Model Test Case Generation
- URL: http://arxiv.org/abs/2406.08575v1
- Date: Wed, 12 Jun 2024 18:26:42 GMT
- Title: Using Quality Attribute Scenarios for ML Model Test Case Generation
- Authors: Rachel Brower-Sinning, Grace A. Lewis, Sebastián Echeverría, Ipek Ozkaya
- Abstract summary: Current practice for machine learning (ML) model testing prioritizes testing for model performance.
This paper presents an approach based on quality attribute (QA) scenarios to elicit and define system- and model-relevant test cases.
The QA-based approach has been integrated into MLTE, a process and tool to support ML model test and evaluation.
- Score: 3.9111051646728527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Testing of machine learning (ML) models is a known challenge identified by researchers and practitioners alike. Unfortunately, current practice for ML model testing prioritizes testing for model performance, while often neglecting the requirements and constraints of the ML-enabled system that integrates the model. This limited view of testing leads to failures during integration, deployment, and operations, contributing to the difficulties of moving models from development to production. This paper presents an approach based on quality attribute (QA) scenarios to elicit and define system- and model-relevant test cases for ML models. The QA-based approach described in this paper has been integrated into MLTE, a process and tool to support ML model test and evaluation. Feedback from users of MLTE highlights its effectiveness in testing beyond model performance and identifying failures early in the development process.
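To make the idea concrete, here is a minimal, hypothetical sketch (not MLTE's actual API) of how a quality attribute scenario can be turned into an executable test case. The scenario wording, model stub, and thresholds below are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch, not MLTE's actual API: a QA scenario made executable.
# Illustrative scenario: "Under normal operation (environment), a single
# inference request (stimulus) completes within 200 ms at the 95th
# percentile (response measure)."
import statistics
import time

def predict(features):
    """Stand-in for the trained model; replace with the real inference call."""
    time.sleep(0.01)  # simulate inference work
    return sum(features)

def test_inference_latency_scenario():
    latencies = []
    for _ in range(100):
        start = time.perf_counter()
        predict([0.1, 0.2, 0.3])
        latencies.append(time.perf_counter() - start)
    p95 = statistics.quantiles(latencies, n=20)[18]  # 95th percentile
    assert p95 < 0.200, f"p95 latency {p95:.3f}s exceeds the 200 ms response measure"
```

The same pattern extends to other quality attributes, such as robustness or resource consumption, by swapping the stimulus and the response measure.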
Related papers
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [85.51252685938564]
Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML).
As with other ML models, large language models (LLMs) are prone to making incorrect predictions, "hallucinating" by fabricating claims, or simply generating low-quality output for a given input.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines, and provides an environment for controllable and consistent evaluation of novel techniques.
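As a hedged illustration of the kind of baseline such a benchmark collects (not LM-Polygraph's actual API), one common information-based UQ score is the mean token negative log-likelihood of a generation:

```python
# Illustrative UQ baseline (not LM-Polygraph's API): score a generated
# sequence by its mean token negative log-likelihood. Higher scores mean
# the model was less confident in its own output.
import math

def mean_token_nll(token_probs):
    """token_probs: probability the model assigned to each generated token."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# A confident generation vs. a hesitant one:
print(mean_token_nll([0.9, 0.8, 0.95]))  # low uncertainty
print(mean_token_nll([0.3, 0.2, 0.4]))   # high uncertainty
```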
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- Outline of an Independent Systematic Blackbox Test for ML-based Systems [0.0]
This article proposes a test procedure that can be used to test ML models and ML-based systems independently of the actual training process.
In this way, the typical quality statements such as accuracy and precision of these models and systems can be verified independently.
arXiv Detail & Related papers (2024-01-30T14:41:28Z)
- Test Generation Strategies for Building Failure Models and Explaining Spurious Failures [4.995172162560306]
Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic.
We propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures.
We show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%.
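A minimal sketch of the core idea, assuming scikit-learn and hypothetical input features (the paper's surrogate-assisted pipeline is more involved): fit a shallow decision tree on failing inputs to recover interpretable rules for spurious failures.

```python
# Illustrative sketch, not the paper's pipeline: learn interpretable rules
# that separate spurious failures from genuine ones by fitting a shallow
# decision tree on features of failing test inputs.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features of failing inputs: [image_brightness, object_size]
X = [[0.05, 30], [0.9, 25], [0.08, 28], [0.7, 40], [0.06, 35], [0.8, 33]]
# 1 = spurious failure (e.g., unrealistically dark image), 0 = genuine fault
y = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["brightness", "object_size"]))
# Expected rule: inputs with very low brightness fail spuriously.
```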
arXiv Detail & Related papers (2023-12-09T18:36:15Z)
- Continuous Management of Machine Learning-Based Application Behavior [3.316045828362788]
Non-functional properties of Machine Learning models must be monitored, verified, and maintained.
We propose a multi-model approach that aims to guarantee a stable non-functional behavior of ML-based applications.
We experimentally evaluate our solution in a real-world scenario focusing on non-functional property fairness.
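As a hedged sketch of the idea (not the paper's actual framework), one can monitor a non-functional property such as demographic parity and fall back to an alternate model when the deployed one drifts out of bounds. All names and thresholds below are assumptions:

```python
# Illustrative sketch: monitor a fairness property and swap models when
# the deployed one violates the bound (not the paper's framework).
def demographic_parity_gap(preds, groups):
    """Absolute gap in positive-prediction rate between two groups."""
    rate = lambda g: sum(p for p, grp in zip(preds, groups) if grp == g) / groups.count(g)
    return abs(rate("A") - rate("B"))

def select_model(primary, fallback, batch, groups, max_gap=0.1):
    preds = [primary(x) for x in batch]
    if demographic_parity_gap(preds, groups) <= max_gap:
        return primary
    return fallback  # the alternate model that currently satisfies the property

# Toy usage with binary predictions and two protected groups:
preds = [1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "B", "B", "B"]
print(demographic_parity_gap(preds, groups))  # ~0.33, would trigger a fallback
```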
arXiv Detail & Related papers (2023-11-21T15:47:06Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves, for example, the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Learning continuous models for continuous physics [94.42705784823997]
We develop a test based on numerical analysis theory to validate machine learning models for science and engineering applications.
Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
arXiv Detail & Related papers (2022-02-17T07:56:46Z)
- Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation [59.7305309038676]
We propose Active Surrogate Estimators (ASEs) for model evaluation.
We find that ASEs offer greater label-efficiency than the current state-of-the-art.
arXiv Detail & Related papers (2022-02-14T17:15:18Z)
- Mutation Testing framework for Machine Learning [0.0]
Failures of machine learning models can lead to severe consequences, including loss of life or property.
Developers, scientists, and the ML community around the world must build highly reliable test architectures for critical ML applications.
This article surveys testing of Machine Learning Systems (MLS): its evolution, current paradigms, and future directions.
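A minimal, illustrative sketch of the mutation-testing idea applied to ML (all names and the toy accuracy proxy are hypothetical): apply a mutation operator to the model and check that the test suite's quality gate "kills" the mutant.

```python
# Illustrative mutation-testing sketch for ML; a surviving mutant
# (quality gate still passes) signals a weak test suite.
import copy
import random

def evaluate_accuracy(weights):
    """Stand-in for running the real evaluation set; replace in practice."""
    # Toy proxy: "accuracy" degrades as weights drift from the original.
    return max(0.0, 1.0 - 0.1 * sum(abs(w - 1.0) for w in weights))

def mutate(weights, scale=5.0):
    """Mutation operator: scale one randomly chosen weight."""
    mutant = copy.deepcopy(weights)
    mutant[random.randrange(len(mutant))] *= scale
    return mutant

original = [1.0, 1.0, 1.0]
mutant = mutate(original)
# The mutant is "killed" if the quality gate fails for it but not the original.
killed = evaluate_accuracy(mutant) < 0.9 <= evaluate_accuracy(original)
print("mutant killed:", killed)
```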
arXiv Detail & Related papers (2021-02-19T18:02:31Z)
- DirectDebug: Automated Testing and Debugging of Feature Models [55.41644538483948]
Variability models (e.g., feature models) are a common way to represent the variability and commonality of software artifacts.
Complex and often large-scale feature models can become faulty, i.e., fail to represent the expected variability properties of the underlying software artifact.
arXiv Detail & Related papers (2021-02-11T11:22:20Z)
- Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
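The key ingredient that makes black-box reprogramming possible is zeroth-order optimization. Below is a hedged sketch of a standard two-point random-direction gradient estimator (not BAR's exact procedure; the toy loss is an assumption):

```python
# Sketch of zeroth-order gradient estimation: approximate the loss gradient
# from input-output queries only, without access to model internals.
import numpy as np

def zeroth_order_grad(loss_fn, x, num_dirs=20, mu=1e-3):
    """Two-point estimator: average (L(x+mu*u) - L(x-mu*u)) / (2*mu) * u."""
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = np.random.randn(*x.shape)
        grad += (loss_fn(x + mu * u) - loss_fn(x - mu * u)) / (2 * mu) * u
    return grad / num_dirs

# Toy black-box "loss" we can only query, not differentiate:
loss = lambda x: float(np.sum((x - 3.0) ** 2))
x = np.zeros(4)
for _ in range(200):
    x -= 0.05 * zeroth_order_grad(loss, x)
print(x)  # approaches [3, 3, 3, 3]
```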
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
- Testing Monotonicity of Machine Learning Models [0.5330240017302619]
We propose verification-based testing of monotonicity, i.e., the formal computation of test inputs on a white-box model via verification technology.
On the white-box model, the space of test inputs can be systematically explored by a directed computation of test cases.
The empirical evaluation on 90 black-box models shows verification-based testing can outperform adaptive random testing as well as property-based techniques with respect to effectiveness and efficiency.
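The paper computes such counterexamples formally via verification; as a hedged contrast, the property-based style of check it is evaluated against can be sketched as random sampling of ordered input pairs (the model and feature names below are hypothetical):

```python
# Illustrative property-based monotonicity check: sample ordered pairs in
# the supposedly monotone feature and look for order violations.
import random

def predict_price(sqft, rooms):
    """Hypothetical model under test; expected monotone in sqft."""
    return 50.0 * sqft + 10.0 * rooms - 0.01 * sqft ** 2  # violates for large sqft

def find_monotonicity_violation(trials=1000):
    for _ in range(trials):
        rooms = random.randint(1, 8)
        a = random.uniform(100, 5000)
        b = a + random.uniform(1, 500)  # b > a in the monotone feature
        if predict_price(b, rooms) < predict_price(a, rooms):
            return (a, b, rooms)
    return None

print(find_monotonicity_violation())  # a counterexample, if one is found
```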
arXiv Detail & Related papers (2020-02-27T17:38:06Z)