Testing Monotonicity of Machine Learning Models
- URL: http://arxiv.org/abs/2002.12278v1
- Date: Thu, 27 Feb 2020 17:38:06 GMT
- Title: Testing Monotonicity of Machine Learning Models
- Authors: Arnab Sharma and Heike Wehrheim
- Abstract summary: We propose verification-based testing of monotonicity, i.e., the formal computation of test inputs on a white-box model via verification technology.
On the white-box model, the space of test inputs can be systematically explored by a directed computation of test cases.
The empirical evaluation on 90 black-box models shows verification-based testing can outperform adaptive random testing as well as property-based techniques with respect to effectiveness and efficiency.
- Score: 0.5330240017302619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today, machine learning (ML) models are increasingly applied in decision
making. This induces an urgent need for quality assurance of ML models with
respect to (often domain-dependent) requirements. Monotonicity is one such
requirement. It demands that the software 'learned' by an ML algorithm produce
increasing predictions as the values of certain attributes increase. While there
exist multiple ML algorithms for ensuring monotonicity of the generated model,
approaches for checking monotonicity, in particular of black-box models, are
largely lacking. In this work, we propose verification-based testing of
monotonicity, i.e., the formal computation of test inputs on a white-box model
via verification technology, and the automatic inference of this approximating
white-box model from the black-box model under test. On the white-box model,
the space of test inputs can be systematically explored by a directed
computation of test cases. The empirical evaluation on 90 black-box models
shows verification-based testing can outperform adaptive random testing as well
as property-based techniques with respect to effectiveness and efficiency.
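To make the property concrete: monotonicity in an attribute i means that for any two inputs that agree on all other attributes, increasing attribute i must not decrease the prediction. The snippet below is a minimal sketch of a naive random-pair check of this property against a black-box predictor; the `predict` callable, the feature bounds, and the `monotone_feature` index are hypothetical placeholders, not part of the paper's tooling. The paper's verification-based approach instead derives such test inputs systematically from an inferred white-box approximation rather than by random sampling.

```python
import numpy as np

def check_monotonicity(predict, lower, upper, monotone_feature,
                       n_samples=1000, delta=0.1, rng_seed=0):
    """Probe a black-box `predict` function for monotonicity violations
    in one designated feature using random input pairs.

    predict          : callable mapping an array of shape (n, d) to predictions (n,)
    lower, upper     : per-feature bounds, arrays of shape (d,)
    monotone_feature : index of the attribute that should be non-decreasing
    """
    rng = np.random.default_rng(rng_seed)
    violations = []
    for _ in range(n_samples):
        x = rng.uniform(lower, upper)                # random base input
        x_up = x.copy()                              # perturbed copy
        x_up[monotone_feature] = min(x[monotone_feature] + delta,
                                     upper[monotone_feature])
        y, y_up = predict(np.vstack([x, x_up]))      # two predictions
        if y_up < y:                                 # prediction dropped: violation
            violations.append((x, x_up, y, y_up))
    return violations
```

This kind of random probing roughly corresponds to the adaptive random and property-based baselines that the verification-based technique is compared against in the evaluation on 90 black-box models.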
Related papers
- Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models [49.06068319380296]
We introduce context-aware testing (CAT) which uses context as an inductive bias to guide the search for meaningful model failures.
We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
arXiv Detail & Related papers (2024-10-31T15:06:16Z) - Using Quality Attribute Scenarios for ML Model Test Case Generation [3.9111051646728527]
Current practice for machine learning (ML) model testing prioritizes testing for model performance.
This paper presents an approach based on quality attribute (QA) scenarios to elicit and define system- and model-relevant test cases.
The QA-based approach has been integrated into MLTE, a process and tool to support ML model test and evaluation.
arXiv Detail & Related papers (2024-06-12T18:26:42Z) - DREAM: Domain-free Reverse Engineering Attributes of Black-box Model [51.37041886352823]
We propose a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target model.
We learn a domain-agnostic model to infer the attributes of a target black-box model with unknown training data.
arXiv Detail & Related papers (2023-07-20T16:25:58Z) - Zero-shot Model Diagnosis [80.36063332820568]
A common approach to evaluating deep learning models is to build a labeled test set with attributes of interest and assess how well the model performs on it.
This paper argues that Zero-shot Model Diagnosis (ZOOM) is possible without the need for a test set or labeling.
arXiv Detail & Related papers (2023-03-27T17:59:33Z) - Learning to Increase the Power of Conditional Randomization Tests [8.883733362171032]
The model-X conditional randomization test is a generic framework for conditional independence testing.
We introduce novel model-fitting schemes that are designed to explicitly improve the power of model-X tests (a generic sketch of the model-X CRT appears after this list).
arXiv Detail & Related papers (2022-07-03T12:29:25Z) - White-box Testing of NLP models with Mask Neuron Coverage [30.508750085817717]
We propose a set of white-box testing methods that are customized for transformer-based NLP models.
MNCOVER measures how thoroughly the attention layers in models are exercised during testing.
We show how MNCOVER can be used to guide CheckList input generation, evaluate alternative NLP testing methods, and drive data augmentation to improve accuracy.
arXiv Detail & Related papers (2022-05-10T17:07:23Z) - Learning continuous models for continuous physics [94.42705784823997]
We develop a test based on numerical analysis theory to validate machine learning models for science and engineering applications.
Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
arXiv Detail & Related papers (2022-02-17T07:56:46Z) - Data Synthesis for Testing Black-Box Machine Learning Models [2.3800397174740984]
The increasing usage of machine learning models raises the question of the reliability of these models.
In this paper, we provide a framework for automated test data synthesis to test black-box ML/DL models.
arXiv Detail & Related papers (2021-11-03T12:00:30Z) - ML4ML: Automated Invariance Testing for Machine Learning Models [7.017320068977301]
We propose an automatic testing framework that is applicable to a variety of invariance qualities.
We employ machine learning techniques for analysing such 'imagery' testing data automatically, hence facilitating ML4ML.
Our testing results show that the trained ML4ML assessors can perform such analytical tasks with sufficient accuracy.
arXiv Detail & Related papers (2021-09-27T10:23:44Z) - Hessian-based toolbox for reliable and interpretable machine learning in physics [58.720142291102135]
We present a toolbox for interpretability and reliability, agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an extrapolation score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z) - Design of Dynamic Experiments for Black-Box Model Discrimination [72.2414939419588]
Consider a dynamic model discrimination setting where we wish to choose: (i) the best mechanistic, time-varying model and (ii) the best model parameter estimates.
For rival mechanistic models where we have access to gradient information, we extend existing methods to incorporate a wider range of problem uncertainty.
We replace these black-box models with Gaussian process surrogate models and thereby extend the model discrimination setting to additionally incorporate rival black-box models.
arXiv Detail & Related papers (2021-02-07T11:34:39Z)
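As a point of reference for the conditional-randomization-test entry above, the generic model-X CRT framework can be sketched as follows. This is only a hedged illustration: the `sample_x_given_z` and `statistic` callables are assumptions introduced here, and the cited paper's actual contribution concerns how the models inside such statistics are fitted to increase power.

```python
import numpy as np

def model_x_crt(x, y, z, sample_x_given_z, statistic,
                n_resamples=500, rng_seed=0):
    """Generic model-X conditional randomization test of X independent of Y given Z.

    sample_x_given_z : draws a synthetic copy of X from the (assumed known)
                       conditional distribution of X given Z
    statistic        : T(x, y, z) measuring conditional association;
                       larger values indicate stronger dependence
    Returns a finite-sample-valid p-value (small values are evidence
    against conditional independence).
    """
    rng = np.random.default_rng(rng_seed)
    t_obs = statistic(x, y, z)
    t_null = np.array([statistic(sample_x_given_z(z, rng), y, z)
                       for _ in range(n_resamples)])
    return (1.0 + np.sum(t_null >= t_obs)) / (n_resamples + 1.0)
```

For example, `statistic` might be the absolute correlation between X and the residuals of a regression of Y on Z.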
This list is automatically generated from the titles and abstracts of the papers on this site.