A Holistic Assessment of the Reliability of Machine Learning Systems
- URL: http://arxiv.org/abs/2307.10586v2
- Date: Sat, 29 Jul 2023 22:55:10 GMT
- Title: A Holistic Assessment of the Reliability of Machine Learning Systems
- Authors: Anthony Corso, David Karamadian, Romeo Valentin, Mary Cooper, Mykel J. Kochenderfer
- Abstract summary: This paper proposes a holistic assessment methodology for the reliability of machine learning (ML) systems.
Our framework evaluates five key properties: in-distribution accuracy, distribution-shift robustness, adversarial robustness, calibration, and out-of-distribution detection.
To provide insights into the performance of different algorithmic approaches, we identify and categorize state-of-the-art techniques.
- Score: 30.638615396429536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As machine learning (ML) systems increasingly permeate high-stakes settings
such as healthcare, transportation, military, and national security, concerns
regarding their reliability have emerged. Despite notable progress, the
performance of these systems can significantly diminish due to adversarial
attacks or environmental changes, leading to overconfident predictions,
failures to detect input faults, and an inability to generalize in unexpected
scenarios. This paper proposes a holistic assessment methodology for the
reliability of ML systems. Our framework evaluates five key properties:
in-distribution accuracy, distribution-shift robustness, adversarial
robustness, calibration, and out-of-distribution detection. A reliability score
is also introduced and used to assess the overall system reliability. To
provide insights into the performance of different algorithmic approaches, we
identify and categorize state-of-the-art techniques, then evaluate a selection
on real-world tasks using our proposed reliability metrics and reliability
score. Our analysis of over 500 models reveals that designing for one metric
does not necessarily constrain others, but certain algorithmic techniques can
improve reliability across multiple metrics simultaneously. This study
contributes to a more comprehensive understanding of ML reliability and
provides a roadmap for future research and development.
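The abstract mentions a single reliability score derived from the five properties but does not specify how it is computed. A minimal sketch of one plausible aggregation, assuming each per-property metric is normalized to [0, 1] and combined by a weighted mean (the paper's actual definition may differ):

```python
def reliability_score(metrics, weights=None):
    """Aggregate per-property reliability metrics (each in [0, 1])
    into a single score via a weighted mean.

    `metrics` maps property name -> normalized metric value, e.g. the
    five properties from the paper: in-distribution accuracy,
    distribution-shift robustness, adversarial robustness,
    calibration, and out-of-distribution detection.
    This is an illustrative aggregation, not the paper's formula.
    """
    names = sorted(metrics)
    if weights is None:
        weights = {n: 1.0 for n in names}  # equal weighting by default
    total_w = sum(weights[n] for n in names)
    return sum(weights[n] * metrics[n] for n in names) / total_w

score = reliability_score({
    "id_accuracy": 0.92,
    "shift_robustness": 0.71,
    "adversarial_robustness": 0.40,
    "calibration": 0.85,
    "ood_detection": 0.78,
})
```

With equal weights this reduces to the arithmetic mean of the five metrics; a weighted variant lets an evaluator emphasize, say, adversarial robustness for a security-critical deployment.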
Related papers
- Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework [54.40508478482667]
We present a comprehensive framework to disentangle, quantify, and mitigate uncertainty in perception and plan generation.
We propose methods tailored to the unique properties of perception and decision-making.
We show that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines.
arXiv Detail & Related papers (2024-11-03T17:32:00Z) - VERA: Validation and Evaluation of Retrieval-Augmented Systems [5.709401805125129]
VERA is a framework designed to enhance the transparency and reliability of outputs from large language models (LLMs).
We show how VERA can strengthen decision-making processes and trust in AI applications.
arXiv Detail & Related papers (2024-08-16T21:59:59Z) - Semi-Supervised Multi-Task Learning Based Framework for Power System Security Assessment [0.0]
This paper develops a novel machine learning-based framework using Semi-Supervised Multi-Task Learning (SS-MTL) for power system dynamic security assessment.
The learning algorithm underlying the proposed framework integrates conditional masked encoders and employs multi-task learning for classification-aware feature representation.
Various experiments on the IEEE 68-bus system were conducted to validate the proposed method.
arXiv Detail & Related papers (2024-07-11T22:42:53Z) - A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z) - Trusted Multi-View Classification with Dynamic Evidential Fusion [73.35990456162745]
We propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC).
TMC provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.
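As a rough illustration of what "integrating different views at an evidence level" can mean, here is a simplified Dempster-style combination of two views' per-class belief masses and overall uncertainty mass. This is a sketch in the spirit of evidential fusion, not the TMC paper's exact rule:

```python
def fuse_two_views(b1, u1, b2, u2):
    """Fuse two views at the evidence level (reduced Dempster rule).

    b1, b2 are per-class belief masses and u1, u2 the uncertainty
    masses, with sum(b) + u == 1 for each view. Conflicting evidence
    (mass the views assign to different classes) is discounted.
    Illustrative sketch; see the TMC paper for its actual combination.
    """
    K = len(b1)
    # Conflict: cross-class products of the two views' beliefs.
    conflict = sum(b1[i] * b2[j] for i in range(K) for j in range(K) if i != j)
    scale = 1.0 - conflict
    b = [(b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) / scale for k in range(K)]
    u = (u1 * u2) / scale
    return b, u

# Two views that agree on class 0; the fused result is more confident
# (lower uncertainty mass) than either view alone.
b, u = fuse_two_views([0.6, 0.2], 0.2, [0.5, 0.1], 0.4)
```

A useful property of this rule is that the fused masses still sum to one, and agreement between views shrinks the residual uncertainty mass.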
arXiv Detail & Related papers (2022-04-25T03:48:49Z) - Statistical Perspectives on Reliability of Artificial Intelligence Systems [6.284088451820049]
We provide statistical perspectives on the reliability of AI systems.
We introduce a so-called SMART statistical framework for AI reliability research.
We discuss recent developments in modeling and analysis of AI reliability.
arXiv Detail & Related papers (2021-11-09T20:00:14Z) - Physics-Informed Deep Learning: A Promising Technique for System Reliability Assessment [1.847740135967371]
There has been limited study of the use of deep learning for system reliability assessment.
We present an approach to frame system reliability assessment in the context of physics-informed deep learning.
The proposed approach is demonstrated by three numerical examples involving a dual-processor computing system.
arXiv Detail & Related papers (2021-08-24T16:24:46Z) - Multi Agent System for Machine Learning Under Uncertainty in Cyber Physical Manufacturing System [78.60415450507706]
Recent advancements in predictive machine learning have led to its application in various use cases in manufacturing.
Most research has focused on maximising predictive accuracy without addressing the uncertainty associated with it.
In this paper, we determine the sources of uncertainty in machine learning and establish the success criteria of a machine learning system to function well under uncertainty.
arXiv Detail & Related papers (2021-07-28T10:28:05Z) - Uncertainty-Aware Boosted Ensembling in Multi-Modal Settings [33.25969141014772]
Uncertainty estimation is a widely researched method to highlight the confidence of machine learning systems in deployment.
Sequential and parallel ensemble techniques have shown improved performance of ML systems in multi-modal settings.
We propose an uncertainty-aware boosting technique for multi-modal ensembling in order to focus on the data points with higher associated uncertainty estimates.
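A toy sketch of the boosting idea just described, assuming ensemble disagreement (predictive variance across members) as the uncertainty estimate and a simple multiplicative reweighting of training samples for the next round. This is illustrative, not the authors' algorithm:

```python
import statistics

def reweight_by_uncertainty(sample_weights, ensemble_preds, gamma=1.0):
    """Upweight samples on which ensemble members disagree most.

    ensemble_preds[m][i] is member m's prediction for sample i; the
    per-sample variance across members serves as the uncertainty
    estimate. gamma controls how strongly uncertainty shifts weight.
    Hypothetical helper, not from the cited paper.
    """
    n = len(sample_weights)
    uncertainties = [
        statistics.pvariance([preds[i] for preds in ensemble_preds])
        for i in range(n)
    ]
    new_w = [w * (1.0 + gamma * u) for w, u in zip(sample_weights, uncertainties)]
    total = sum(new_w)
    return [w / total for w in new_w]  # renormalize to sum to 1

# Two ensemble members disagree only on sample 1, so its weight grows.
weights = reweight_by_uncertainty(
    [0.25, 0.25, 0.25, 0.25],
    ensemble_preds=[[0.9, 0.1, 0.5, 0.5], [0.9, 0.9, 0.5, 0.5]],
)
```

The next learner in the boosted sequence would then be trained with these weights, concentrating capacity on the high-uncertainty points.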
arXiv Detail & Related papers (2021-04-21T18:28:13Z) - Trusted Multi-View Classification [76.73585034192894]
We propose a novel multi-view classification method, termed trusted multi-view classification.
It provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
The proposed algorithm jointly utilizes multiple views to promote both classification reliability and robustness.
arXiv Detail & Related papers (2021-02-03T13:30:26Z) - Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the inability to explain decisions, and bias in training data are some of the most prominent limitations.
We propose a tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.