Using Metamorphic Relations to Verify and Enhance Artcode Classification
- URL: http://arxiv.org/abs/2108.02694v1
- Date: Thu, 5 Aug 2021 15:54:56 GMT
- Title: Using Metamorphic Relations to Verify and Enhance Artcode Classification
- Authors: Liming Xu, Dave Towey, Andrew French, Steve Benford, Zhi Quan Zhou and
Tsong Yueh Chen
- Abstract summary: An example of an area facing the oracle problem is automatic image classification, using machine learning to classify an input image as one of a set of predefined classes.
An approach to software testing that alleviates the oracle problem is metamorphic testing (MT).
This paper examines the problem of classifying images containing visually hidden markers called Artcodes, and applies MT to verify and enhance the trained classifiers.
- Score: 39.36253474867746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software testing is often hindered where it is impossible or impractical to
determine the correctness of the behaviour or output of the software under test
(SUT), a situation known as the oracle problem. An example of an area facing
the oracle problem is automatic image classification, using machine learning to
classify an input image as one of a set of predefined classes. An approach to
software testing that alleviates the oracle problem is metamorphic testing
(MT). While traditional software testing examines the correctness of individual
test cases, MT instead examines the relations amongst multiple executions of
test cases and their outputs. These relations are called metamorphic relations
(MRs): if an MR is found to be violated, then a fault must exist in the SUT.
This paper examines the problem of classifying images containing visually
hidden markers called Artcodes, and applies MT to verify and enhance the
trained classifiers. This paper further examines two MRs, Separation and
Occlusion, and reports on their capability in verifying the image
classification using one-way analysis of variance (ANOVA) in conjunction with
three other statistical analysis methods: t-test (for unequal variances),
Kruskal-Wallis test, and Dunnett's test. In addition to our previously-studied
classifier, that used Random Forests, we introduce a new classifier that uses a
support vector machine, and present its MR-augmented version. Experimental
evaluations across a number of performance metrics show that the augmented
classifiers can achieve better performance than non-augmented classifiers. This
paper also analyses how the enhanced performance is obtained.
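To make the metamorphic-testing idea concrete, below is a minimal sketch (not the authors' implementation) of how an MR such as Occlusion could be checked against a trained Artcode classifier, comparing source and follow-up outputs with one of the statistical tests named in the abstract. The functions `classify` and `occlude` are hypothetical placeholders; the actual MRs, transformations, and feature pipeline are defined in the paper.

```python
# Minimal sketch, under assumed interfaces, of checking a metamorphic relation
# (here: Occlusion) against a trained Artcode classifier.
import numpy as np
from scipy import stats


def check_mr(classify, occlude, images):
    """Return source scores, follow-up scores, and a Kruskal-Wallis p-value.

    classify(image) -> float in [0, 1]  (assumed score that the image contains an Artcode)
    occlude(image)  -> transformed image (the follow-up test case for the MR)
    """
    source = np.array([classify(img) for img in images])
    followup = np.array([classify(occlude(img)) for img in images])

    # Non-parametric comparison of the two score samples; the paper also reports
    # one-way ANOVA, the t-test for unequal variances, and Dunnett's test.
    _, p_value = stats.kruskal(source, followup)
    return source, followup, p_value
```

In this sketch, a large p-value is consistent with the MR holding, while a significant shift between source and follow-up scores points to a violation and hence a fault or weakness in the classifier; the same source/follow-up outputs could, in principle, also be used to build an MR-augmented classifier as the paper describes.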
Related papers
- Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z) - Zero-shot Model Diagnosis [80.36063332820568]
A common approach to evaluating deep learning models is to build a labeled test set with attributes of interest and assess how well the model performs on it.
This paper argues that Zero-shot Model Diagnosis (ZOOM) is possible without the need for a test set or labeling.
arXiv Detail & Related papers (2023-03-27T17:59:33Z) - Active Sequential Two-Sample Testing [18.99517340397671]
We consider the two-sample testing problem in a new scenario where sample measurements are inexpensive to access.
We devise the first active two-sample testing framework that not only sequentially but also actively queries sample measurements.
In practice, we introduce an instantiation of our framework and evaluate it using several experiments.
arXiv Detail & Related papers (2023-01-30T02:23:49Z) - Parametric Classification for Generalized Category Discovery: A Baseline
Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z) - Visualizing Classifier Adjacency Relations: A Case Study in Speaker
Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive a 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z) - Evaluating and Mitigating Bias in Image Classifiers: A Causal
Perspective Using Counterfactuals [27.539001365348906]
We present a method for generating counterfactuals by incorporating a structural causal model (SCM) in an improved variant of Adversarially Learned Inference (ALI).
We show how to explain a pre-trained machine learning classifier, evaluate its bias, and mitigate the bias using a counterfactual regularizer.
arXiv Detail & Related papers (2020-09-17T13:19:31Z) - I Am Going MAD: Maximum Discrepancy Competition for Comparing
Classifiers Adaptively [135.7695909882746]
We introduce the MAximum Discrepancy (MAD) competition for comparing classifiers adaptively.
We adaptively sample a small test set from an arbitrarily large corpus of unlabeled images.
Human labeling on the resulting model-dependent image sets reveals the relative performance of the competing classifiers.
arXiv Detail & Related papers (2020-02-25T03:32:29Z) - Object-based Metamorphic Testing through Image Structuring [0.6445605125467573]
Testing software is often costly due to the need to mass-produce test cases and provide a test oracle for them.
One method that has been proposed in order to alleviate the oracle problem is metamorphic testing.
arXiv Detail & Related papers (2020-02-12T10:32:18Z)