On Predictive Explanation of Data Anomalies
- URL: http://arxiv.org/abs/2110.09467v1
- Date: Mon, 18 Oct 2021 16:59:28 GMT
- Title: On Predictive Explanation of Data Anomalies
- Authors: Nikolaos Myrtakis, Ioannis Tsamardinos, Vassilis Christophides
- Abstract summary: PROTEUS is an AutoML pipeline that builds surrogate models for explaining anomaly detectors, specifically designed for feature selection on imbalanced datasets.
It produces predictive explanations by approximating the decision surface of an unsupervised detector.
It also reliably estimates the explanations' predictive performance on unseen data.
- Score: 3.1798318618973362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerous algorithms have been proposed for detecting anomalies (outliers,
novelties) in an unsupervised manner. Unfortunately, it is not trivial, in
general, to understand why a given sample (record) is labelled as an anomaly
and thus diagnose its root causes. We propose the following
reduced-dimensionality, surrogate model approach to explain detector decisions:
approximate the detection model with another one that employs only a small
subset of features. Subsequently, samples can be visualized in this
low-dimensional space for human understanding. To this end, we develop
PROTEUS, an AutoML pipeline to produce the surrogate model, specifically
designed for feature selection on imbalanced datasets. The PROTEUS surrogate
model can not only explain the training data, but also the out-of-sample
(unseen) data. In other words, PROTEUS produces predictive explanations by
approximating the decision surface of an unsupervised detector. PROTEUS is
designed to return an accurate estimate of out-of-sample predictive performance
to serve as a metric of the quality of the approximation. Computational
experiments confirm the efficacy of PROTEUS in producing predictive explanations
for different families of detectors and in reliably estimating their predictive
performance on unseen data. Unlike several ad-hoc feature importance methods,
PROTEUS is robust to high-dimensional data.
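To make the surrogate idea above concrete, here is a minimal sketch (not the authors' PROTEUS pipeline; the choice of IsolationForest, mutual-information feature selection, and a logistic-regression surrogate are illustrative assumptions): fit an unsupervised detector, treat its anomaly labels as pseudo-labels, approximate it with a classifier restricted to a few features, and use cross-validation to estimate how faithfully the surrogate reproduces the detector on unseen data.
```python
# Minimal sketch of the surrogate-model idea (not the PROTEUS implementation itself).
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))     # stand-in for a high-dimensional dataset
X[:10] += 6.0                      # inject a few anomalies

# 1. Unsupervised detector; its decisions become pseudo-labels for the surrogate.
detector = IsolationForest(random_state=0).fit(X)
pseudo_y = (detector.predict(X) == -1).astype(int)   # 1 = flagged as anomaly

# 2. Surrogate: a small feature subset plus a simple, imbalance-aware classifier.
surrogate = make_pipeline(
    SelectKBest(mutual_info_classif, k=2),           # k=2 keeps explanations plottable
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# 3. Cross-validated agreement with the detector as an out-of-sample quality estimate.
fidelity = cross_val_score(surrogate, X, pseudo_y, cv=5, scoring="roc_auc")
print(f"estimated out-of-sample fidelity (AUC): {fidelity.mean():.3f}")

# 4. The selected features form the explanation: anomalies can be plotted in this subspace.
surrogate.fit(X, pseudo_y)
print("features used by the surrogate:", surrogate[0].get_support(indices=True))
```
The restriction to two features stands in for the low-dimensional visualization the abstract mentions, and the cross-validated score plays the role of the out-of-sample performance estimate.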
Related papers
- PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis [2.5347892611213614]
Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions.
We develop a practical method that exploits the sensitivity of model predictions and feature attributions to detect adversarial samples.
Our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism.
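As a rough illustration of the sensitivity idea suggested by the title and summary (not the paper's exact PASA procedure; the saliency-style attribution, noise scale, probe count, and a classifier `model` returning logits are all assumptions), one can measure how much a prediction and its attribution change under small random perturbations and flag inputs whose sensitivity is unusually high:
```python
# Rough illustration of prediction/attribution sensitivity under small perturbations.
import torch
import torch.nn.functional as F

def sensitivity_score(model, x, noise_std=0.01, n_probes=8):
    """x: a single input of shape (1, ...). Returns (prediction shift, attribution shift)."""
    x = x.clone().requires_grad_(True)
    p = F.softmax(model(x), dim=1)
    attr = torch.autograd.grad(p.max(), x)[0]        # simple saliency-style attribution

    pred_shifts, attr_shifts = [], []
    for _ in range(n_probes):
        x_noisy = (x + noise_std * torch.randn_like(x)).detach().requires_grad_(True)
        p_noisy = F.softmax(model(x_noisy), dim=1)
        attr_noisy = torch.autograd.grad(p_noisy.max(), x_noisy)[0]
        pred_shifts.append((p_noisy - p).abs().sum().item())
        attr_shifts.append((attr_noisy - attr).abs().sum().item())
    return sum(pred_shifts) / n_probes, sum(attr_shifts) / n_probes

# Inputs whose scores exceed thresholds calibrated on clean data would be flagged
# as potentially adversarial.
```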
arXiv Detail & Related papers (2024-04-12T21:22:21Z) - LMD: Light-weight Prediction Quality Estimation for Object Detection in Lidar Point Clouds [3.927702899922668]
Object detection on Lidar point cloud data is a promising technology for autonomous driving and robotics.
Uncertainty estimation is a crucial component for downstream tasks, and deep neural networks remain error-prone even for predictions with high confidence.
We propose LidarMetaDetect, a light-weight post-processing scheme for prediction quality estimation.
Our experiments show a significant increase of statistical reliability in separating true from false predictions.
arXiv Detail & Related papers (2023-06-13T15:13:29Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
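A minimal sketch of that recipe, under the assumption that confidence is measured by the maximum softmax probability and that the two terms are combined with a simple weight `lam` (an illustration of confidence minimization in general, not the paper's exact DCM objective):
```python
# Sketch of confidence minimization on an auxiliary "uncertainty" dataset.
import torch
import torch.nn.functional as F

def dcm_style_loss(model, x_labeled, y_labeled, x_uncertain, lam=0.5):
    # Standard supervised loss on the labeled training batch.
    ce = F.cross_entropy(model(x_labeled), y_labeled)

    # Confidence penalty: push predictions on the uncertainty batch toward
    # uniform by minimizing the mean maximum softmax probability.
    probs = F.softmax(model(x_uncertain), dim=1)
    confidence = probs.max(dim=1).values.mean()

    return ce + lam * confidence
```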
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - Robust self-healing prediction model for high dimensional data [0.685316573653194]
This work proposes a robust self-healing (RSH) hybrid prediction model.
It uses the data in its entirety, removing errors and inconsistencies rather than discarding any data.
The proposed method is compared with existing high-performing models and the results are analyzed.
arXiv Detail & Related papers (2022-10-04T17:55:50Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
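A minimal sketch of that thresholding recipe, assuming confidence is summarized by one scalar score per example (e.g., the maximum softmax probability) and that the threshold is calibrated on held-out labeled source data:
```python
# Sketch of the ATC-style recipe: calibrate a confidence threshold on source data,
# then read off target accuracy as the fraction of target points above it.
import numpy as np

def atc_estimate(src_conf, src_correct, tgt_conf):
    """src_conf: confidences on labeled source validation data,
    src_correct: 0/1 correctness of those predictions,
    tgt_conf: confidences on unlabeled target data."""
    src_acc = src_correct.mean()
    # Threshold chosen so the fraction of source points above it equals source accuracy.
    t = np.quantile(src_conf, 1.0 - src_acc)
    return (tgt_conf > t).mean()

# Toy example with synthetic confidence scores (illustrative only).
rng = np.random.default_rng(0)
src_conf = rng.uniform(0.5, 1.0, 1000)
src_correct = (rng.uniform(size=1000) < src_conf).astype(float)
tgt_conf = rng.uniform(0.4, 1.0, 1000)      # shifted target scores
print(f"predicted target accuracy: {atc_estimate(src_conf, src_correct, tgt_conf):.3f}")
```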
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the Importance-Guided Stochastic Gradient Descent (IGSGD) method to train models to perform inference on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Out-of-distribution detection for regression tasks: parameter versus predictor entropy [2.026281591452464]
It is crucial to detect when an instance lies too far from the training samples for the machine learning model to be trusted.
For neural networks, one approach to this task consists of learning a diversity of predictors that all can explain the training data.
We propose a new way of estimating the entropy of a distribution on predictors based on nearest neighbors in function space.
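A rough sketch of one way such a function-space entropy estimate could look (the probe-input representation and the Kozachenko-Leonenko estimator below are illustrative assumptions, not necessarily the paper's construction): represent each predictor by its outputs on a shared set of probe inputs and apply a nearest-neighbor entropy estimator to those vectors.
```python
# Sketch: nearest-neighbor (Kozachenko-Leonenko style) entropy of an ensemble of
# predictors, each represented by its outputs on the same probe inputs.
from math import lgamma
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def knn_entropy(points, k=3):
    """Entropy estimate for samples `points` of shape (n, d)."""
    n, d = points.shape
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)   # column 0 is the point itself (distance 0)
    eps = dists[:, -1]                       # distance to the k-th neighbor
    log_vd = (d / 2) * np.log(np.pi) - lgamma(d / 2 + 1)   # log volume of the unit d-ball
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(eps + 1e-12))

# Each row: one predictor's outputs on the shared probe inputs (function-space coordinates).
ensemble_outputs = np.random.default_rng(0).normal(size=(50, 10))
print(f"function-space entropy estimate: {knn_entropy(ensemble_outputs):.3f}")
```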
arXiv Detail & Related papers (2020-10-24T21:41:21Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - A Causal Direction Test for Heterogeneous Populations [10.653162005300608]
Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications.
We show that when the homogeneity assumption is violated, causal models developed under that assumption can fail to identify the correct causal direction.
We propose an adjustment to a commonly used causal direction test statistic by using a $k$-means type clustering algorithm.
arXiv Detail & Related papers (2020-06-08T18:59:14Z) - Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)