Related papers: Testing Deep Learning Models: A First Comparative Study of Multiple Testing Techniques

URL: http://arxiv.org/abs/2202.12139v1
Date: Thu, 24 Feb 2022 15:05:19 GMT
Title: Testing Deep Learning Models: A First Comparative Study of Multiple Testing Techniques
Authors: Mohit Kumar Ahuja, Arnaud Gotlieb, Helge Spieker
Abstract summary: Deep Learning (DL) has revolutionized the capabilities of vision-based systems (VBS) in critical applications such as autonomous driving, robotic surgery, critical infrastructure surveillance, and air and maritime traffic control.
Score: 15.695048480513536
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Learning (DL) has revolutionized the capabilities of vision-based systems (VBS) in critical applications such as autonomous driving, robotic surgery, critical infrastructure surveillance, air and maritime traffic control, etc. By analyzing images, voice, videos, or any type of complex signals, DL has considerably increased the situation awareness of these systems. At the same time, as VBS rely more and more on trained DL models, their reliability and robustness have been challenged, and it has become crucial to thoroughly test these models to assess their capabilities and potential errors. To discover faults in DL models, existing software testing methods have been adapted and refined accordingly. In this article, we provide an overview of these software testing methods, namely differential, metamorphic, mutation, and combinatorial testing, as well as adversarial perturbation testing, and review some challenges in their deployment for boosting perception systems used in VBS. We also provide a first experimental comparative study on a classical benchmark used in VBS and discuss its results.
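
Illustration (not taken from the paper): of the techniques the abstract names, metamorphic testing is perhaps the simplest to sketch. The idea is to apply a transformation that should not change the expected label and flag every input whose prediction changes anyway. The classifier, the transformation, and all function names below are hypothetical stand-ins chosen for this sketch; they are not the models, benchmark, or tooling used in the study.

```python
from typing import Callable, List, Sequence

# Grayscale image as rows of pixel intensities in [0, 1] (assumed representation).
Image = List[List[float]]


def toy_classifier(image: Image) -> int:
    """Hypothetical stand-in for a DL model: 1 if the image is bright, else 0."""
    pixels = [p for row in image for p in row]
    return 1 if sum(pixels) / len(pixels) > 0.5 else 0


def horizontal_flip(image: Image) -> Image:
    """Transformation assumed to be label-preserving: mirror each row."""
    return [list(reversed(row)) for row in image]


def metamorphic_violations(
    model: Callable[[Image], int],
    transform: Callable[[Image], Image],
    inputs: Sequence[Image],
) -> List[int]:
    """Return indices of inputs whose prediction changes under the transform."""
    return [i for i, x in enumerate(inputs) if model(x) != model(transform(x))]


if __name__ == "__main__":
    images = [
        [[0.9, 0.8], [0.7, 0.6]],  # bright image
        [[0.1, 0.2], [0.0, 0.3]],  # dark image
    ]
    # The brightness-based toy model ignores pixel positions, so no violations.
    print(metamorphic_violations(toy_classifier, horizontal_flip, images))
```

No ground-truth labels are needed here: the check only compares the model's output before and after the transformation, which is what makes the approach attractive when labeled test data is scarce.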

Related papers

A New Perspective on Time Series Anomaly Detection: Faster Patch-based Broad Learning System [59.38402187365612]
Time series anomaly detection (TSAD) has been a research hotspot in both academia and industry in recent years. Deep learning is not required for TSAD, given limitations such as its slow speed. We propose the Contrastive Patch-based Broad Learning System (CBLS).
arXiv Detail & Related papers (2024-12-07T01:58:18Z)
Underwater Object Detection in the Era of Artificial Intelligence: Current, Challenge, and Future [119.88454942558485]
Underwater object detection (UOD) aims to identify and localise objects in underwater images or videos. In recent years, artificial intelligence (AI) based methods, especially deep learning methods, have shown promising performance in UOD.
arXiv Detail & Related papers (2024-10-08T00:25:33Z)
Towards Testing and Evaluating Vision-Language-Action Models for Robotic Manipulation: An Empirical Study [7.8735930411335895]
Vision-language-action (VLA) models have attracted much attention regarding their potential to advance robotic manipulation. Despite the end-to-end perception-control loop offered by the VLA models, there is a lack of comprehensive understanding of the capabilities of such models. We present VLATest, a testing framework that automatically generates diverse robotic manipulation scenes to assess the performance of VLA models.
arXiv Detail & Related papers (2024-09-19T16:33:00Z)
Complementary Learning for Real-World Model Failure Detection [15.779651238128562]
We introduce complementary learning, where we use learned characteristics from different training paradigms to detect model errors. We demonstrate our approach by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner. We perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds.
arXiv Detail & Related papers (2024-07-19T13:36:35Z)
Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA). Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
MultiTest: Physical-Aware Object Insertion for Testing Multi-sensor Fusion Perception Systems [23.460181958075566]
Multi-sensor fusion (MSF) is a key technique in addressing numerous safety-critical tasks and applications, e.g., self-driving cars and automated robotic arms. Existing testing methods primarily concentrate on single-sensor perception systems. We introduce MultiTest, a fitness-guided metamorphic testing method for complex MSF perception systems.
arXiv Detail & Related papers (2024-01-25T17:03:02Z)
A Reusable AI-Enabled Defect Detection System for Railway Using Ensembled CNN [5.381374943525773]
Defect detection is crucial for ensuring the trustworthiness of railway systems. Current approaches rely on single deep-learning models, like CNNs. We propose a reusable AI-enabled defect detection approach.
arXiv Detail & Related papers (2023-11-24T19:45:55Z)
Diffusion-based Visual Counterfactual Explanations -- Towards Systematic Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality. It is currently difficult to compare the performance of these VCE methods as the evaluation procedures largely vary and often boil down to visual inspection of individual examples and small scale user studies. We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
arXiv Detail & Related papers (2023-08-11T12:22:37Z)
Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study [71.84852429039881]
The investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise. We test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples.
arXiv Detail & Related papers (2023-06-13T12:43:59Z)
Learning continuous models for continuous physics [94.42705784823997]
We develop a test based on numerical analysis theory to validate machine learning models for science and engineering applications. Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
arXiv Detail & Related papers (2022-02-17T07:56:46Z)
Benchmarking Detection Transfer Learning with Vision Transformers [60.97703494764904]
The complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive. We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN. Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
arXiv Detail & Related papers (2021-11-22T18:59:15Z)
Using Neural Architecture Search for Improving Software Flaw Detection in Multimodal Deep Learning Models [2.5705339271809753]
In this work, we demonstrate that even better performance can be achieved using neural architecture search (NAS) combined with multimodal learning models. We adapt a NAS framework aimed at investigating image classification to the problem of software flaw detection and demonstrate improved results on the Juliet Test Suite.
arXiv Detail & Related papers (2020-09-22T15:59:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.