Outlier Ranking in Large-Scale Public Health Streams
- URL: http://arxiv.org/abs/2401.01459v1
- Date: Tue, 2 Jan 2024 23:08:49 GMT
- Title: Outlier Ranking in Large-Scale Public Health Streams
- Authors: Ananya Joshi, Tina Townes, Nolan Gormley, Luke Neureiter, Roni
Rosenfeld, Bryan Wilder
- Abstract summary: Disease control experts inspect public health data streams daily for outliers worth investigating.
We propose a new task for algorithms to rank the outputs of any univariate method applied to each of many streams.
Our novel algorithm for this task, which leverages hierarchical networks and extreme value analysis, performed the best across traditional outlier detection metrics.
- Score: 17.53470381091954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disease control experts inspect public health data streams daily for outliers
worth investigating, like those corresponding to data quality issues or disease
outbreaks. However, they can only examine a few of the thousands of
maximally-tied outliers returned by univariate outlier detection methods
applied to large-scale public health data streams. To help experts distinguish
the most important outliers from these thousands of tied outliers, we propose a
new task for algorithms to rank the outputs of any univariate method applied to
each of many streams. Our novel algorithm for this task, which leverages
hierarchical networks and extreme value analysis, performed the best across
traditional outlier detection metrics in a human-expert evaluation using public
health data streams. Most importantly, experts have used our open-source Python
implementation since April 2023 and report identifying outliers worth
investigating 9.1x faster than their prior baseline. Other organizations can
readily adapt this implementation to create rankings from the outputs of their
tailored univariate methods across large-scale streams.
Related papers
- A method for outlier detection based on cluster analysis and visual expert criteria [0.0]
Outliers are the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations.
We propose an outlier detection method based on a clustering process.
arXiv Detail & Related papers (2025-10-27T09:16:16Z)
- Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls [65.44462297594308]
Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data.
Most unsupervised outlier detection methods are carefully designed to detect specified outliers.
We propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers.
arXiv Detail & Related papers (2025-01-06T12:35:51Z)
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- Too Good To Be True: performance overestimation in (re)current practices for Human Activity Recognition [49.1574468325115]
Sliding windows for data segmentation followed by standard random k-fold cross validation produce biased results.
It is important to raise awareness in the scientific community about this problem, whose negative effects are being overlooked.
Several experiments with different types of datasets and different types of classification models allow us to exhibit the problem and show it persists independently of the method or dataset.
arXiv Detail & Related papers (2023-10-18T13:24:05Z)
- Computationally Assisted Quality Control for Public Health Data Streams [21.056027241048152]
FlaSH is a practical outlier detection framework for public health data users.
It uses simple, scalable models to capture statistical properties of public health streams.
It has been deployed on data streams used by public health stakeholders.
arXiv Detail & Related papers (2023-06-29T13:08:12Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
- ODIM: Outlier Detection via Likelihood of Under-Fitted Generative Models [4.956259629094216]
The unsupervised outlier detection (UOD) problem refers to the task of identifying inliers given training data that contain outliers as well as inliers.
We develop a new method called the outlier detection via the IM effect (ODIM).
Remarkably, the ODIM requires only a few updates, making it computationally efficient and at least tens of times faster than other deep-learning-based algorithms.
arXiv Detail & Related papers (2023-01-11T01:02:27Z)
- Enhanced Nearest Neighbor Classification for Crowdsourcing [26.19048869302787]
Crowdsourcing is an economical way to label a large amount of data.
The noise in the produced labels may deteriorate the accuracy of any classification method applied to the labelled data.
We propose an enhanced nearest neighbor classifier (ENN) to overcome this issue.
arXiv Detail & Related papers (2022-02-26T22:53:52Z)
- Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics [0.0]
We propose a data-driven scoring function defined on the feature space which reflects the degree of abnormality of the observations.
This scoring function is learnt through a well-designed binary classification problem.
We illustrate our methodology with preliminary encouraging numerical experiments.
arXiv Detail & Related papers (2021-09-20T14:45:56Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
- Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning [80.20302993614594]
We provide a statistical analysis to overcome drawbacks of Laplacian regularization.
We unveil a large body of spectral filtering methods that exhibit desirable behaviors.
We provide realistic computational guidelines in order to make our method usable with large amounts of data.
arXiv Detail & Related papers (2020-09-09T14:28:54Z)
- History-based Anomaly Detector: an Adversarial Approach to Anomaly Detection [3.908842679355254]
Anomaly detection is a difficult problem in many areas and has recently been subject to a lot of attention.
We propose a simple yet new adversarial method to tackle this problem, denoted as History-based anomaly detector (HistoryAD).
It consists of a self-supervised model, trained to recognize 'normal' samples by comparing them to samples based on the training history of a previously trained GAN.
arXiv Detail & Related papers (2019-12-26T11:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.