Evaluation of Predictive Reliability to Foster Trust in Artificial
Intelligence. A case study in Multiple Sclerosis
- URL: http://arxiv.org/abs/2402.17554v1
- Date: Tue, 27 Feb 2024 14:48:07 GMT
- Title: Evaluation of Predictive Reliability to Foster Trust in Artificial
Intelligence. A case study in Multiple Sclerosis
- Authors: Lorenzo Peracchio, Giovanna Nicora, Enea Parimbelli, Tommaso Mario
Buonocore, Roberto Bergamaschi, Eleonora Tavazzi, Arianna Dagliati, Riccardo
Bellazzi
- Abstract summary: Spotting Machine Learning failures is of paramount importance when ML predictions are used to drive clinical decisions.
We propose a simple approach that can be used in the deployment phase of any ML model to suggest whether to trust predictions or not.
Our method holds the promise to provide effective support to clinicians by spotting potential ML failures during deployment.
- Score: 0.34473740271026115
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Applying Artificial Intelligence (AI) and Machine Learning (ML) in critical
contexts, such as medicine, requires the implementation of safety measures to
reduce risks of harm in case of prediction errors. Spotting ML failures is of
paramount importance when ML predictions are used to drive clinical decisions.
ML predictive reliability measures the degree of trust of an ML prediction on a
new instance, thus allowing decision-makers to accept or reject it based on its
reliability. To assess reliability, we propose a method that implements two
principles. First, our approach evaluates whether an instance to be classified
comes from the same distribution as the training set. To do this, we
leverage the ability of Autoencoders (AEs) to reconstruct the training set with low
error. An instance is considered Out-of-Distribution (OOD) if the AE
reconstructs it with a high error. Second, a proxy model is used to evaluate
whether the ML classifier performs well on samples similar to the newly
classified instance. We show that this approach is able to assess
reliability both in a simulated scenario and on a model trained to predict
disease progression of Multiple Sclerosis patients. We also developed a Python
package, named relAI, to embed reliability measures into ML pipelines. We
propose a simple approach that can be used in the deployment phase of any ML
model to suggest whether to trust predictions or not. Our method holds the
promise to provide effective support to clinicians by spotting potential ML
failures during deployment.
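The two principles above translate into a compact deployment-time check. The sketch below is a minimal illustration only, assuming NumPy arrays and a fitted scikit-learn-style classifier; it is not the relAI package's API. Principle one is approximated with an MLP autoencoder whose training-set reconstruction errors fix an OOD threshold (the 95th percentile is an arbitrary choice), and principle two with a nearest-neighbour proxy that estimates the classifier's accuracy on the training samples most similar to the new instance.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import NearestNeighbors

class ReliabilityChecker:
    """Illustrative two-principle reliability check (not the relAI API):
    (1) flag Out-of-Distribution inputs via autoencoder reconstruction error,
    (2) flag inputs whose nearest training neighbours the classifier gets wrong."""

    def __init__(self, ood_percentile=95, n_neighbors=10, min_local_accuracy=0.7):
        self.ood_percentile = ood_percentile           # threshold on training errors
        self.n_neighbors = n_neighbors                 # size of the local neighbourhood
        self.min_local_accuracy = min_local_accuracy   # minimum acceptable local accuracy

    def fit(self, X_train, y_train, classifier):
        # Principle 1: train an autoencoder to reconstruct the training set.
        self.ae_ = MLPRegressor(hidden_layer_sizes=(32, 8, 32), max_iter=2000)
        self.ae_.fit(X_train, X_train)
        train_err = np.mean((self.ae_.predict(X_train) - X_train) ** 2, axis=1)
        self.ood_threshold_ = np.percentile(train_err, self.ood_percentile)

        # Principle 2: proxy of local performance from nearest training neighbours.
        self.nn_ = NearestNeighbors(n_neighbors=self.n_neighbors).fit(X_train)
        self.train_correct_ = (classifier.predict(X_train) == y_train).astype(float)
        return self

    def is_reliable(self, X_new):
        # OOD check: high reconstruction error -> instance looks unlike the training data.
        rec_err = np.mean((self.ae_.predict(X_new) - X_new) ** 2, axis=1)
        in_distribution = rec_err <= self.ood_threshold_

        # Local-performance check: classifier accuracy on the most similar training samples.
        _, idx = self.nn_.kneighbors(X_new)
        local_accuracy = self.train_correct_[idx].mean(axis=1)
        performs_well = local_accuracy >= self.min_local_accuracy

        return in_distribution & performs_well

# Illustrative usage with a fitted classifier `clf` (assumed):
# checker = ReliabilityChecker().fit(X_train, y_train, clf)
# reliable_mask = checker.is_reliable(X_new)
```

At deployment, a prediction whose instance fails either check would be flagged for clinician review rather than acted on automatically.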
Related papers
- Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z)
- Automated Trustworthiness Testing for Machine Learning Classifiers [3.3423762257383207]
This paper proposes TOWER, the first technique to automatically create trustworthiness oracles that determine whether text classifier predictions are trustworthy.
Our hypothesis is that a prediction is trustworthy if the words in its explanation are semantically related to the predicted class.
The results show that TOWER can detect a decrease in trustworthiness as noise increases, but is not effective when evaluated against the human-labeled dataset.
arXiv Detail & Related papers (2024-06-07T20:25:05Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Dynamic Model Agnostic Reliability Evaluation of Machine-Learning Methods Integrated in Instrumentation & Control Systems [1.8978726202765634]
Trustworthiness of data-driven, neural network-based machine learning algorithms is not adequately assessed.
Recent reports by the National Institute of Standards and Technology identify trustworthiness in ML as a critical barrier to adoption.
We demonstrate a real-time model-agnostic method to evaluate the relative reliability of ML predictions by incorporating out-of-distribution detection on the training dataset.
arXiv Detail & Related papers (2023-08-08T18:25:42Z)
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
- Distillation to Enhance the Portability of Risk Models Across Institutions with Large Patient Claims Database [12.452703677540505]
We investigate the practicality of model portability through a cross-site evaluation of readmission prediction models.
We apply a recurrent neural network, augmented with self-attention and blended with expert features, to build readmission prediction models.
Our experiments show that ML models trained at one institution and tested at another perform worse than models trained and tested at the same institution.
arXiv Detail & Related papers (2022-07-06T05:26:32Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold; a minimal sketch appears after this list.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making [68.19284302320146]
We carry out user studies to assess how people with differing levels of expertise respond to different types of predictive uncertainty.
We found that showing posterior predictive distributions led to smaller disagreements with the ML model's predictions.
This suggests that posterior predictive distributions can serve as useful decision aids, though they should be used with caution, taking into account the type of distribution and the expertise of the human.
arXiv Detail & Related papers (2020-11-12T02:23:53Z)
- Prediction Confidence from Neighbors [0.0]
The inability of Machine Learning (ML) models to successfully extrapolate correct predictions from out-of-distribution (OoD) samples is a major hindrance to the application of ML in critical applications.
We show that feature space distance is a meaningful measure that can provide confidence in predictions.
This enables earlier and safer deployment of models in critical applications and is vital for deploying models under ever-changing conditions.
arXiv Detail & Related papers (2020-03-31T09:26:09Z)
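For the Average Thresholded Confidence (ATC) entry above, the following is a minimal sketch based only on the summary; the function names and the use of `predict_proba` scores as the confidence measure are illustrative assumptions, not the paper's code. The threshold is chosen on labeled source data so that the fraction of examples above it matches the observed source accuracy, and target accuracy is then estimated as the fraction of unlabeled target examples whose confidence clears that threshold.

```python
import numpy as np

def fit_atc_threshold(source_confidences, source_correct):
    """Pick a threshold so that the share of source examples above it
    equals the observed source accuracy."""
    accuracy = np.mean(source_correct)
    # The (1 - accuracy) quantile leaves roughly a fraction `accuracy`
    # of source confidences above the threshold.
    return np.quantile(source_confidences, 1.0 - accuracy)

def estimate_target_accuracy(target_confidences, threshold):
    """Predict target-domain accuracy as the fraction of unlabeled target
    examples whose confidence exceeds the learned threshold."""
    return float(np.mean(target_confidences > threshold))

# Illustrative usage with a probabilistic classifier `clf` (assumed):
# src_conf = clf.predict_proba(X_source).max(axis=1)
# src_correct = clf.predict(X_source) == y_source
# threshold = fit_atc_threshold(src_conf, src_correct)
# est_acc = estimate_target_accuracy(clf.predict_proba(X_target).max(axis=1), threshold)
```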
This list is automatically generated from the titles and abstracts of the papers on this site.