StaDRe and StaDRo: Reliability and Robustness Estimation of ML-based
Forecasting using Statistical Distance Measures
- URL: http://arxiv.org/abs/2206.11116v1
- Date: Fri, 17 Jun 2022 19:52:48 GMT
- Title: StaDRe and StaDRo: Reliability and Robustness Estimation of ML-based
Forecasting using Statistical Distance Measures
- Authors: Mohammed Naveed Akram, Akshatha Ambekar, Ioannis Sorokos, Koorosh
Aslansefat, Daniel Schneider
- Abstract summary: This work focuses on the use of SafeML for time series data, and on reliability and robustness estimation of ML-forecasting methods using statistical distance measures.
We propose SDD-based Reliability Estimate (StaDRe) and SDD-based Robustness (StaDRo) measures.
With the help of a clustering technique, the similarity between the statistical properties of data seen during training and the forecasts is identified.
- Score: 0.476203519165013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliability estimation of Machine Learning (ML) models is becoming a crucial
subject. This is particularly the case when such models are deployed in
safety-critical applications, as the decisions based on model predictions can
result in hazardous situations. In this regard, recent research has proposed
methods to achieve safe, dependable, and reliable ML systems. One such
method consists of detecting and analyzing distributional shift, and then
measuring how such systems respond to these shifts. This was proposed in
earlier work on SafeML. This work focuses on the use of SafeML for time series
data, and on reliability and robustness estimation of ML-forecasting methods
using statistical distance measures. To this end, distance measures based on
the Empirical Cumulative Distribution Function (ECDF) proposed in SafeML are
explored to measure Statistical-Distance Dissimilarity (SDD) across time
series. We then propose SDD-based Reliability Estimate (StaDRe) and SDD-based
Robustness (StaDRo) measures. With the help of a clustering technique, the
similarity between the statistical properties of data seen during training and
the forecasts is identified. The proposed method is capable of providing a link
between dataset SDD and Key Performance Indicators (KPIs) of the ML models.
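As a rough illustration of the approach described above, the sketch below computes an ECDF-based statistical distance (here the two-sample Kolmogorov-Smirnov statistic, one of the ECDF-based measures SafeML supports) between windows of training data and a forecast, clusters the training windows, and reports the KPI of the statistically closest cluster as a StaDRe-style reliability estimate. Function names such as `stadre_estimate`, the choice of KMeans, and the mean-absolute-error KPI are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): windowed ECDF-based
# statistical-distance dissimilarity (SDD) between training data and a
# forecast, plus a toy StaDRe-style link from SDD to a per-cluster KPI.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.cluster import KMeans


def sdd(window_a: np.ndarray, window_b: np.ndarray) -> float:
    """ECDF-based dissimilarity between two samples (here: KS statistic)."""
    return ks_2samp(window_a, window_b).statistic


def stadre_estimate(train_series: np.ndarray,
                    train_errors: np.ndarray,
                    forecast: np.ndarray,
                    window: int = 50,
                    n_clusters: int = 3) -> dict:
    """Cluster training windows, find the cluster statistically closest to
    the forecast, and report that cluster's KPI (here: mean absolute error)
    together with the SDD to it."""
    # Slice the training series and its per-step errors into windows.
    n_windows = len(train_series) // window
    windows = train_series[: n_windows * window].reshape(n_windows, window)
    errors = train_errors[: n_windows * window].reshape(n_windows, window)

    # Cluster windows by simple summary statistics of their distribution.
    feats = np.column_stack([windows.mean(axis=1), windows.std(axis=1)])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)

    # SDD from the forecast to each cluster (samples pooled per cluster).
    dists = {c: sdd(windows[labels == c].ravel(), forecast)
             for c in range(n_clusters)}
    closest = min(dists, key=dists.get)

    return {
        "closest_cluster": closest,
        "sdd_to_closest": dists[closest],
        # KPI of the statistically closest cluster serves as the estimate.
        "estimated_mae": float(np.abs(errors[labels == closest]).mean()),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=2000)
    errs = rng.normal(scale=0.1, size=2000)   # toy training residuals
    fcst = rng.normal(loc=0.2, size=50)       # slightly shifted forecast
    print(stadre_estimate(train, errs, fcst))
```

In this toy setup, a larger SDD to every cluster signals that the forecast's statistical properties were not seen during training, so the reported KPI should be trusted less.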
Related papers
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Evaluation of Predictive Reliability to Foster Trust in Artificial Intelligence. A case study in Multiple Sclerosis [0.34473740271026115]
Spotting Machine Learning failures is of paramount importance when ML predictions are used to drive clinical decisions.
We propose a simple approach that can be used in the deployment phase of any ML model to suggest whether to trust predictions or not.
Our method holds the promise to provide effective support to clinicians by spotting potential ML failures during deployment.
arXiv Detail & Related papers (2024-02-27T14:48:07Z)
- Scope Compliance Uncertainty Estimate [0.4262974002462632]
SafeML is a model-agnostic approach for monitoring whether operational data remain within the scope of a model's training data.
This work addresses the limitations of that binary in-scope/out-of-scope decision by replacing it with a continuous metric.
arXiv Detail & Related papers (2023-12-17T19:44:20Z)
- Dynamic Model Agnostic Reliability Evaluation of Machine-Learning Methods Integrated in Instrumentation & Control Systems [1.8978726202765634]
Trustworthiness of data-driven, neural network-based machine learning algorithms is not adequately assessed.
In recent reports by the National Institute of Standards and Technology, the lack of trustworthiness in ML is identified as a critical barrier to adoption.
We demonstrate a real-time model-agnostic method to evaluate the relative reliability of ML predictions by incorporating out-of-distribution detection on the training dataset.
arXiv Detail & Related papers (2023-08-08T18:25:42Z)
- Learning Robust Statistics for Simulation-based Inference under Model Misspecification [23.331522354991527]
We propose the first general approach to handle model misspecification that works across different classes of simulation-based inference methods.
We show that our method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
arXiv Detail & Related papers (2023-05-25T09:06:26Z)
- Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data [18.34490939288318]
We address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers.
We propose two-stage distributed and robust statistical inference procedures that cope with high-dimensional models by promoting sparsity.
arXiv Detail & Related papers (2022-08-17T11:17:47Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
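The ATC entry above describes a concrete recipe, so a minimal illustration follows. It assumes a generic confidence score and a quantile-based threshold chosen so that the fraction of labeled source points above it matches source accuracy; the helper names `atc_threshold` and `atc_predict_accuracy` are hypothetical, and this is a sketch of the idea rather than the authors' implementation.

```python
# Minimal sketch of the Average Thresholded Confidence (ATC) idea: learn a
# confidence threshold on labeled source data, then predict target accuracy
# as the fraction of unlabeled target points whose confidence exceeds it.
import numpy as np


def atc_threshold(source_conf: np.ndarray, source_correct: np.ndarray) -> float:
    """Pick a threshold t such that mean(conf > t) roughly equals source accuracy."""
    acc = source_correct.mean()
    # The (1 - acc) quantile of the confidences gives mean(conf > t) ~= acc.
    return float(np.quantile(source_conf, 1.0 - acc))


def atc_predict_accuracy(target_conf: np.ndarray, threshold: float) -> float:
    """Predicted target accuracy = fraction of target confidences above the threshold."""
    return float((target_conf > threshold).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src_conf = rng.beta(5, 2, size=5000)        # toy source confidences
    src_correct = rng.random(5000) < src_conf   # toy correctness indicators
    t = atc_threshold(src_conf, src_correct)
    tgt_conf = rng.beta(4, 2, size=5000)        # shifted target domain
    print("estimated target accuracy:", atc_predict_accuracy(tgt_conf, t))
```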
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.