Importance-Driven Deep Learning System Testing
- URL: http://arxiv.org/abs/2002.03433v1
- Date: Sun, 9 Feb 2020 19:20:56 GMT
- Title: Importance-Driven Deep Learning System Testing
- Authors: Simos Gerasimou, Hasan Ferit Eniser, Alper Sen, Alper Cakan
- Abstract summary: Deep Learning (DL) systems are key enablers for engineering intelligent applications.
Using DL systems in safety- and security-critical applications requires providing testing evidence for their dependable operation.
DeepImportance is a systematic testing methodology accompanied by an Importance-Driven (IDC) test adequacy criterion.
- Score: 12.483260526189449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning (DL) systems are key enablers for engineering intelligent
applications due to their ability to solve complex tasks such as image
recognition and machine translation. Nevertheless, using DL systems in safety-
and security-critical applications requires providing testing evidence for
their dependable operation. Recent research in this direction focuses on
adapting testing criteria from traditional software engineering as a means of
increasing confidence in their correct behaviour. However, these criteria are
inadequate for capturing the intrinsic properties exhibited by these systems. We bridge
this gap by introducing DeepImportance, a systematic testing methodology
accompanied by an Importance-Driven (IDC) test adequacy criterion for DL
systems. Applying IDC enables engineers to establish a layer-wise functional
understanding of the importance of DL system components and to use this
information to assess the semantic diversity of a test set. Our empirical
evaluation on several DL systems, across multiple DL datasets and with
state-of-the-art adversarial generation techniques demonstrates the usefulness
and effectiveness of DeepImportance and its ability to support the engineering
of more robust DL systems.
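The abstract describes IDC only at a high level. As a rough illustration of the idea, the Python sketch below measures an IDC-style coverage for one layer: select the most important neurons, cluster their training-time activations, and report the fraction of cluster combinations the test set reaches. The mean-|activation| importance score, the k-means clustering, and all names here are illustrative assumptions; the paper itself derives neuron importance from a relevance-propagation analysis of the trained network.

```python
# Minimal sketch of an Importance-Driven Coverage (IDC)-style measurement.
# The mean-|activation| importance score is an illustrative stand-in, not
# DeepImportance's actual relevance-propagation analysis.
import numpy as np
from sklearn.cluster import KMeans

def important_neurons(train_acts: np.ndarray, m: int) -> np.ndarray:
    """Return indices of the m neurons with the largest mean |activation|."""
    return np.argsort(np.abs(train_acts).mean(axis=0))[-m:]

def fit_clusters(train_acts: np.ndarray, neurons, k: int = 3) -> dict:
    """Cluster each important neuron's training activations into k groups."""
    return {n: KMeans(n_clusters=k, n_init=10).fit(train_acts[:, [n]])
            for n in neurons}

def idc(test_acts: np.ndarray, neurons, clusters: dict) -> float:
    """Fraction of important-neuron cluster combinations hit by the tests."""
    k = next(iter(clusters.values())).n_clusters
    labels = np.stack([clusters[n].predict(test_acts[:, [n]])
                       for n in neurons], axis=1)
    covered = {tuple(row) for row in labels}
    return len(covered) / k ** len(neurons)

# Example with random stand-in activations for one 100-neuron layer:
rng = np.random.default_rng(0)
train_acts = rng.normal(size=(1000, 100))
test_acts = rng.normal(size=(200, 100))
neurons = important_neurons(train_acts, m=4)
print(f"IDC = {idc(test_acts, neurons, fit_clusters(train_acts, neurons)):.2f}")
```

A larger covered fraction indicates a test set that is more semantically diverse with respect to the behaviours the important neurons encode.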
Related papers
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of adversarial inputs through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how such perturbations can affect the safety of a given DRL system.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
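The summary above does not define the Adversarial Rate precisely. One plausible reading, sketched below, is the fraction of inputs whose prediction flips under a bounded perturbation; the single-step FGSM attack used here is purely an illustrative stand-in, since the paper characterizes adversarial inputs via formal verification rather than gradient attacks.

```python
# Rough sketch of an "adversarial rate"-style measurement: the fraction of
# inputs whose predicted label flips under an eps-bounded perturbation.
import torch
import torch.nn.functional as F

def adversarial_rate(model, x, y, eps=0.05):
    """Fraction of samples misclassified after an eps-bounded FGSM step."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    # One signed-gradient step, clipped back to the valid input range.
    x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
    with torch.no_grad():
        flipped = model(x_adv).argmax(dim=1) != y
    return flipped.float().mean().item()
```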
- Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model for Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experiment results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z)
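The FLEKD summary above contrasts ensemble knowledge distillation with conventional model fusion but gives no details. A minimal sketch of one such server-side aggregation step follows, assuming a public transfer set; the temperature, optimiser, and function names are illustrative assumptions, not the paper's exact design.

```python
# Sketch of ensemble-distillation aggregation: instead of averaging client
# weights, the server distils the clients' averaged soft predictions into
# the shared model on a public transfer set.
import torch
import torch.nn.functional as F

def distill_aggregate(global_model, client_models, public_loader,
                      epochs=1, T=2.0, lr=1e-3):
    """One distillation-based aggregation round on the server."""
    opt = torch.optim.Adam(global_model.parameters(), lr=lr)
    for m in client_models:
        m.eval()
    for _ in range(epochs):
        for x, _ in public_loader:  # labels of the public set are unused
            with torch.no_grad():
                # Teacher signal: average of the clients' softened predictions.
                teacher = torch.stack(
                    [F.softmax(m(x) / T, dim=1) for m in client_models]
                ).mean(dim=0)
            student = F.log_softmax(global_model(x) / T, dim=1)
            loss = F.kl_div(student, teacher, reduction="batchmean") * T * T
            opt.zero_grad()
            loss.backward()
            opt.step()
    return global_model
```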
- DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z)
- Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach [32.15663640443728]
The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits.
Existing verification and validation techniques are often inadequate for these new paradigms.
We propose a roadmap to transition from foundational probabilistic testing to a more rigorous approach capable of delivering formal assurance.
arXiv Detail & Related papers (2023-11-13T14:56:14Z)
- A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems [68.76102493999134]
We take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue systems.
We demonstrate the existence of adaptation and intrinsic biases in current ToD systems.
Our analyses offer practical tips on how to approach ToD data collection and system development for new languages.
arXiv Detail & Related papers (2023-10-19T16:41:44Z)
- Enabling Resource-efficient AIoT System with Cross-level Optimization: A Survey [20.360136850102833]
This survey aims to provide a broader optimization space for freer resource-performance tradeoffs.
By consolidating problems and techniques scattered over diverse levels, we aim to help readers understand their connections and stimulate further discussions.
arXiv Detail & Related papers (2023-09-27T08:04:24Z)
- Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study [71.84852429039881]
The investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise.
We test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples.
arXiv Detail & Related papers (2023-06-13T12:43:59Z)
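The robustness study above evaluates models under sensor faults and noise. Below is a minimal sketch of that style of evaluation; the fault models (Gaussian noise and a stuck sensor channel) are illustrative assumptions rather than the paper's exact perturbations.

```python
# Sketch of a perturbation-robustness evaluation: inject sensor-style
# faults into test inputs and compare model accuracy against clean data.
import torch

def gaussian_noise(x, sigma=0.1):
    """Additive measurement noise on every sensor channel."""
    return x + sigma * torch.randn_like(x)

def stuck_sensor(x, channel=0, value=0.0):
    """Fault model: one sensor channel is stuck at a constant reading."""
    x = x.clone()
    x[:, channel] = value
    return x

def accuracy_under(model, x, y, perturb):
    """Accuracy of `model` on inputs transformed by `perturb`."""
    with torch.no_grad():
        preds = model(perturb(x)).argmax(dim=1)
    return (preds == y).float().mean().item()

# Example usage: compare clean vs. perturbed accuracy.
# clean = accuracy_under(model, x, y, lambda t: t)
# noisy = accuracy_under(model, x, y, gaussian_noise)
```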
- Truthful Meta-Explanations for Local Interpretability of Machine Learning Models [10.342433824178825]
We present a local meta-explanation technique that builds on the truthfulness metric, a faithfulness-based metric.
We demonstrate the effectiveness of both the technique and the metric by concretely defining all the concepts and through experimentation.
arXiv Detail & Related papers (2022-12-07T08:32:04Z)
- Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
- A Comparative Study of AI-based Intrusion Detection Techniques in Critical Infrastructures [4.8041243535151645]
We present a comparative study of Artificial Intelligence (AI)-driven intrusion detection systems for wirelessly connected sensors that track crucial applications.
Specifically, we present an in-depth analysis of the use of machine learning, deep learning and reinforcement learning solutions to recognize intrusive behavior in the collected traffic.
Results present the performance metrics for three different IDSs, namely the Adaptively Supervised and Clustered Hybrid IDS, the Boltzmann Machine-based Clustered IDS, and the Q-learning based IDS.
arXiv Detail & Related papers (2020-07-24T20:55:57Z)
- Manifold for Machine Learning Assurance [9.594432031144716]
We propose an analogous approach for machine-learning (ML) systems, using an ML technique that extracts a low-dimensional manifold from the high-dimensional training data implicitly describing the required system.
The manifold is then harnessed for a range of quality assurance tasks such as test adequacy measurement, test input generation, and runtime monitoring of the target ML system.
Preliminary experiments establish that the proposed manifold-based approach drives diversity in test data for test adequacy, yields fault-revealing yet realistic test cases for test generation, and provides an independent means to assess the trustability of the target system's output for runtime monitoring.
arXiv Detail & Related papers (2020-02-08T11:39:01Z)
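For the runtime-monitoring use mentioned above, a common manifold-based recipe is to flag inputs that an autoencoder trained on the training data reconstructs poorly. The sketch below assumes such an autoencoder and a calibrated threshold; both the architecture and the thresholding rule are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch of manifold-based runtime monitoring: inputs with high
# reconstruction error are treated as off-manifold, so the target
# model's output on them should not be trusted.
import torch

def off_manifold(autoencoder, x, threshold):
    """Flag inputs the autoencoder reconstructs poorly (per-sample MSE)."""
    with torch.no_grad():
        err = ((autoencoder(x) - x) ** 2).flatten(1).mean(dim=1)
    return err > threshold  # True = off-manifold: distrust the output
```

In deployment, the threshold would typically be calibrated on held-out training data, e.g., as a high percentile of its reconstruction errors.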
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.