A Holistic Assessment of the Reliability of Machine Learning Systems
- URL: http://arxiv.org/abs/2307.10586v2
- Date: Sat, 29 Jul 2023 22:55:10 GMT
- Title: A Holistic Assessment of the Reliability of Machine Learning Systems
- Authors: Anthony Corso, David Karamadian, Romeo Valentin, Mary Cooper, Mykel J. Kochenderfer
- Abstract summary: This paper proposes a holistic assessment methodology for the reliability of machine learning (ML) systems.
Our framework evaluates five key properties: in-distribution accuracy, distribution-shift robustness, adversarial robustness, calibration, and out-of-distribution detection.
To provide insights into the performance of different algorithmic approaches, we identify and categorize state-of-the-art techniques.
- Score: 30.638615396429536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As machine learning (ML) systems increasingly permeate high-stakes settings
such as healthcare, transportation, military, and national security, concerns
regarding their reliability have emerged. Despite notable progress, the
performance of these systems can significantly diminish due to adversarial
attacks or environmental changes, leading to overconfident predictions,
failures to detect input faults, and an inability to generalize in unexpected
scenarios. This paper proposes a holistic assessment methodology for the
reliability of ML systems. Our framework evaluates five key properties:
in-distribution accuracy, distribution-shift robustness, adversarial
robustness, calibration, and out-of-distribution detection. A reliability score
is also introduced and used to assess the overall system reliability. To
provide insights into the performance of different algorithmic approaches, we
identify and categorize state-of-the-art techniques, then evaluate a selection
on real-world tasks using our proposed reliability metrics and reliability
score. Our analysis of over 500 models reveals that designing for one metric
does not necessarily constrain others, but certain algorithmic techniques can
improve reliability across multiple metrics simultaneously. This study
contributes to a more comprehensive understanding of ML reliability and
provides a roadmap for future research and development.
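The abstract mentions a single reliability score derived from the five properties but does not specify how it is computed. A minimal sketch of one plausible aggregation, assuming each per-property metric is normalized to [0, 1] and combined by a weighted mean (the paper's actual definition may differ):

```python
def reliability_score(metrics, weights=None):
    """Aggregate per-property reliability metrics (each in [0, 1])
    into a single score via a weighted mean.

    `metrics` maps property name -> normalized metric value, e.g. the
    five properties from the paper: in-distribution accuracy,
    distribution-shift robustness, adversarial robustness,
    calibration, and out-of-distribution detection.
    This is an illustrative aggregation, not the paper's formula.
    """
    names = sorted(metrics)
    if weights is None:
        weights = {n: 1.0 for n in names}  # equal weighting by default
    total_w = sum(weights[n] for n in names)
    return sum(weights[n] * metrics[n] for n in names) / total_w

score = reliability_score({
    "id_accuracy": 0.92,
    "shift_robustness": 0.71,
    "adversarial_robustness": 0.40,
    "calibration": 0.85,
    "ood_detection": 0.78,
})
```

With equal weights this reduces to the arithmetic mean of the five metrics; a weighted variant lets an evaluator emphasize, say, adversarial robustness for a security-critical deployment.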
Related papers
- Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework [54.40508478482667]
We present a comprehensive framework to disentangle, quantify, and mitigate uncertainty in perception and plan generation.
We propose methods tailored to the unique properties of perception and decision-making.
We show that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines.
arXiv Detail & Related papers (2024-11-03T17:32:00Z) - VERA: Validation and Evaluation of Retrieval-Augmented Systems [5.709401805125129]
VERA is a framework designed to enhance the transparency and reliability of outputs from large language models (LLMs).
We show how VERA can strengthen decision-making processes and trust in AI applications.
arXiv Detail & Related papers (2024-08-16T21:59:59Z) - Semi-Supervised Multi-Task Learning Based Framework for Power System Security Assessment [0.0]
This paper develops a novel machine learning-based framework using Semi-Supervised Multi-Task Learning (SS-MTL) for power system dynamic security assessment.
The learning algorithm underlying the proposed framework integrates conditional masked encoders and employs multi-task learning for classification-aware feature representation.
Various experiments on the IEEE 68-bus system were conducted to validate the proposed method.
arXiv Detail & Related papers (2024-07-11T22:42:53Z) - A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z) - Trusted Multi-View Classification with Dynamic Evidential Fusion [73.35990456162745]
We propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC).
TMC provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.
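As a rough illustration of what "integrating different views at an evidence level" can mean, here is a simplified Dempster-style combination of two views' per-class belief masses and overall uncertainty mass. This is a sketch in the spirit of evidential fusion, not the TMC paper's exact rule:

```python
def fuse_two_views(b1, u1, b2, u2):
    """Fuse two views at the evidence level (reduced Dempster rule).

    b1, b2 are per-class belief masses and u1, u2 the uncertainty
    masses, with sum(b) + u == 1 for each view. Conflicting evidence
    (mass the views assign to different classes) is discounted.
    Illustrative sketch; see the TMC paper for its actual combination.
    """
    K = len(b1)
    # Conflict: cross-class products of the two views' beliefs.
    conflict = sum(b1[i] * b2[j] for i in range(K) for j in range(K) if i != j)
    scale = 1.0 - conflict
    b = [(b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) / scale for k in range(K)]
    u = (u1 * u2) / scale
    return b, u

# Two views that agree on class 0; the fused result is more confident
# (lower uncertainty mass) than either view alone.
b, u = fuse_two_views([0.6, 0.2], 0.2, [0.5, 0.1], 0.4)
```

A useful property of this rule is that the fused masses still sum to one, and agreement between views shrinks the residual uncertainty mass.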
arXiv Detail & Related papers (2022-04-25T03:48:49Z) - Statistical Perspectives on Reliability of Artificial Intelligence Systems [6.284088451820049]
We provide statistical perspectives on the reliability of AI systems.
We introduce a so-called SMART statistical framework for AI reliability research.
We discuss recent developments in modeling and analysis of AI reliability.
arXiv Detail & Related papers (2021-11-09T20:00:14Z) - Physics-Informed Deep Learning: A Promising Technique for System Reliability Assessment [1.847740135967371]
There has been limited study of the use of deep learning for system reliability assessment.
We present an approach to frame system reliability assessment in the context of physics-informed deep learning.
The proposed approach is demonstrated by three numerical examples involving a dual-processor computing system.
arXiv Detail & Related papers (2021-08-24T16:24:46Z) - Multi Agent System for Machine Learning Under Uncertainty in Cyber Physical Manufacturing System [78.60415450507706]
Recent advancements in predictive machine learning have led to its application in various use cases in manufacturing.
Most research has focused on maximising predictive accuracy without addressing the uncertainty associated with it.
In this paper, we determine the sources of uncertainty in machine learning and establish the success criteria of a machine learning system to function well under uncertainty.
arXiv Detail & Related papers (2021-07-28T10:28:05Z) - Uncertainty-Aware Boosted Ensembling in Multi-Modal Settings [33.25969141014772]
Uncertainty estimation is a widely researched method to highlight the confidence of machine learning systems in deployment.
Sequential and parallel ensemble techniques have shown improved performance of ML systems in multi-modal settings.
We propose an uncertainty-aware boosting technique for multi-modal ensembling in order to focus on the data points with higher associated uncertainty estimates.
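A toy sketch of the boosting idea just described, assuming ensemble disagreement (predictive variance across members) as the uncertainty estimate and a simple multiplicative reweighting of training samples for the next round. This is illustrative, not the authors' algorithm:

```python
import statistics

def reweight_by_uncertainty(sample_weights, ensemble_preds, gamma=1.0):
    """Upweight samples on which ensemble members disagree most.

    ensemble_preds[m][i] is member m's prediction for sample i; the
    per-sample variance across members serves as the uncertainty
    estimate. gamma controls how strongly uncertainty shifts weight.
    Hypothetical helper, not from the cited paper.
    """
    n = len(sample_weights)
    uncertainties = [
        statistics.pvariance([preds[i] for preds in ensemble_preds])
        for i in range(n)
    ]
    new_w = [w * (1.0 + gamma * u) for w, u in zip(sample_weights, uncertainties)]
    total = sum(new_w)
    return [w / total for w in new_w]  # renormalize to sum to 1

# Two ensemble members disagree only on sample 1, so its weight grows.
weights = reweight_by_uncertainty(
    [0.25, 0.25, 0.25, 0.25],
    ensemble_preds=[[0.9, 0.1, 0.5, 0.5], [0.9, 0.9, 0.5, 0.5]],
)
```

The next learner in the boosted sequence would then be trained with these weights, concentrating capacity on the high-uncertainty points.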
arXiv Detail & Related papers (2021-04-21T18:28:13Z) - Trusted Multi-View Classification [76.73585034192894]
We propose a novel multi-view classification method, termed trusted multi-view classification.
It provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
The proposed algorithm jointly utilizes multiple views to promote both classification reliability and robustness.
arXiv Detail & Related papers (2021-02-03T13:30:26Z) - Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the inability to explain decisions, and bias in training data are some of the most prominent limitations.
We propose a tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.