Testing Framework for Black-box AI Models
- URL: http://arxiv.org/abs/2102.06166v1
- Date: Thu, 11 Feb 2021 18:15:23 GMT
- Title: Testing Framework for Black-box AI Models
- Authors: Aniya Aggarwal, Samiulla Shaikh, Sandeep Hans, Swastik Haldar, Rema Ananthanarayanan, Diptikalyan Saha
- Abstract summary: In this paper, we present an end-to-end generic framework for testing AI Models.
Our tool has been used to test industrial AI models and has been very effective at uncovering issues.
- Score: 1.916485402892365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With widespread adoption of AI models for important decision making, ensuring
reliability of such models remains an important challenge. In this paper, we
present an end-to-end generic framework for testing AI Models which performs
automated test generation for different modalities such as text, tabular, and
time-series data and across various properties such as accuracy, fairness, and
robustness. Our tool has been used to test industrial AI models and proved
very effective at uncovering issues present in those models. Demo video link:
https://youtu.be/984UCU17YZI
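The abstract does not include code, but the kind of black-box property test it describes (robustness and fairness checks against a model exposed only through a predict function) can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: `model_predict`, the feature names, and the thresholds are all assumptions made for the example.

```python
# Minimal sketch of black-box property testing for a tabular model.
# The model is treated as an opaque predict() function; the tester only
# generates inputs and inspects outputs.

import random

def model_predict(row):
    # Stand-in for a real black-box model: approves if income > 50.
    return 1 if row["income"] > 50 else 0

def robustness_test(predict, rows, feature, epsilon=0.01, trials=10):
    """Flag rows whose prediction flips under tiny perturbations."""
    failures = []
    for row in rows:
        base = predict(row)
        for _ in range(trials):
            perturbed = dict(row)
            perturbed[feature] *= 1 + random.uniform(-epsilon, epsilon)
            if predict(perturbed) != base:
                failures.append(row)
                break
    return failures

def fairness_test(predict, rows, group_key, threshold=0.1):
    """Demographic parity: positive-rate gap across groups must stay small."""
    counts = {}
    for row in rows:
        g = row[group_key]
        pos, n = counts.get(g, (0, 0))
        counts[g] = (pos + predict(row), n + 1)
    rates = {g: pos / n for g, (pos, n) in counts.items()}
    gap = max(rates.values()) - min(rates.values())
    return gap <= threshold, rates

rows = [{"income": 40 + i * 5, "group": "A" if i % 2 else "B"} for i in range(10)]
unstable = robustness_test(model_predict, rows, "income")
fair, rates = fairness_test(model_predict, rows, "group")
```

Here `unstable` collects inputs near the model's decision boundary, and `fair` is a pass/fail verdict on the positive-rate gap between groups; a production framework would generate the test inputs automatically per modality rather than take them as given.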
Related papers
- Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence [3.4049215220521933]
We introduce Dynamic Intelligence Assessment (DIA), a novel methodology for testing AI models.
The framework introduces four new metrics to assess a model's reliability and confidence across multiple attempts.
The accompanying dataset, DIA-Bench, contains a diverse collection of challenge templates with mutable parameters presented in various formats.
arXiv Detail & Related papers (2024-10-20T20:07:36Z)
- XAI-based Feature Ensemble for Enhanced Anomaly Detection in Autonomous Driving Systems [1.3022753212679383]
This paper proposes a novel feature ensemble framework that integrates multiple Explainable AI (XAI) methods.
By fusing top features identified by these XAI methods across six diverse AI models, the framework creates a robust and comprehensive set of features critical for detecting anomalies.
Our technique demonstrates improved accuracy, robustness, and transparency of AI models, contributing to safer and more trustworthy autonomous driving systems.
arXiv Detail & Related papers (2024-10-20T14:34:48Z)
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and automatically generates visual question-answer pairs from them.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- Enhancing the Fairness and Performance of Edge Cameras with Explainable AI [3.4719449211802456]
Our research presents a diagnostic method that uses Explainable AI (XAI) for model debugging.
We identified the training dataset as the main source of bias and suggested model augmentation as a solution.
arXiv Detail & Related papers (2024-01-18T10:08:24Z)
- Data Synthesis for Testing Black-Box Machine Learning Models [2.3800397174740984]
The increasing usage of machine learning models raises the question of the reliability of these models.
In this paper, we provide a framework for automated test data synthesis to test black-box ML/DL models.
arXiv Detail & Related papers (2021-11-03T12:00:30Z)
- Automated Testing of AI Models [3.0616624345970975]
We extend the capability of the AITEST tool to include the testing techniques for Image and Speech-to-text models.
These novel extensions make AITEST a comprehensive framework for testing AI models.
arXiv Detail & Related papers (2021-10-07T10:30:18Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity and overstability causing samples with high accuracies.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- DirectDebug: Automated Testing and Debugging of Feature Models [55.41644538483948]
Variability models (e.g., feature models) are a common way for the representation of variabilities and commonalities of software artifacts.
Complex and often large-scale feature models can become faulty, i.e., fail to represent the expected variability properties of the underlying software artifact.
arXiv Detail & Related papers (2021-02-11T11:22:20Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the scores produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.