Do Ensembling and Meta-Learning Improve Outlier Detection in Randomized
Controlled Trials?
- URL: http://arxiv.org/abs/2311.05473v1
- Date: Thu, 9 Nov 2023 16:05:38 GMT
- Authors: Walter Nelson, Jonathan Ranisau, Jeremy Petch
- Abstract summary: We evaluate 6 modern machine learning-based outlier detection algorithms on the task of identifying irregular data in 838 datasets from 7 real-world MCRCTs.
Our results reinforce key findings from prior work in the outlier detection literature on data from other domains.
We propose the Meta-learned Probabilistic Ensemble (MePE), a simple algorithm for aggregating the predictions of multiple unsupervised models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern multi-centre randomized controlled trials (MCRCTs) collect massive
amounts of tabular data, and are monitored intensively for irregularities by
humans. We begin by empirically evaluating 6 modern machine learning-based
outlier detection algorithms on the task of identifying irregular data in 838
datasets from 7 real-world MCRCTs with a total of 77,001 patients from over 44
countries. Our results reinforce key findings from prior work in the outlier
detection literature on data from other domains. Existing algorithms often
succeed at identifying irregularities without any supervision, with at least
one algorithm exhibiting positive performance 70.6% of the time. However,
performance across datasets varies substantially with no single algorithm
performing consistently well, motivating new techniques for unsupervised model
selection or other means of aggregating potentially discordant predictions from
multiple candidate models. We propose the Meta-learned Probabilistic Ensemble
(MePE), a simple algorithm for aggregating the predictions of multiple
unsupervised models, and show that it performs favourably compared to recent
meta-learning approaches for outlier detection model selection. While
meta-learning shows promise, small ensembles outperform all forms of
meta-learning on average, a negative result that may guide the application of
current outlier detection approaches in healthcare and other real-world
domains.
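The paper does not specify MePE's internals here, but the simpler baseline it compares against, a small ensemble that aggregates the scores of several unsupervised detectors, can be sketched. The following is a minimal illustration, not the authors' method: the two toy detectors (per-feature z-score magnitude and k-nearest-neighbour distance) and all parameter choices are assumptions for the example.

```python
import numpy as np

def zscore_detector(X):
    # Outlierness as distance from the mean in per-feature standard deviations.
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    return np.linalg.norm(z, axis=1)

def knn_detector(X, k=5):
    # Outlierness as distance to the k-th nearest neighbour: isolated points score high.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)          # column 0 is the zero self-distance
    return d[:, k]

def ensemble_scores(X, detectors):
    # Z-normalise each detector's scores so their scales are comparable,
    # then average: one way to aggregate potentially discordant predictions.
    scores = []
    for det in detectors:
        s = det(X)
        s = (s - s.mean()) / (s.std() + 1e-12)
        scores.append(s)
    return np.mean(scores, axis=0)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), [[8.0, 8.0]]])  # last row: planted outlier
agg = ensemble_scores(X, [zscore_detector, knn_detector])
print(int(np.argmax(agg)))  # → 100 (index of the planted outlier)
```

Averaging normalized scores is the classic unsupervised ensemble baseline; the paper's finding is that even ensembles this simple outperform meta-learned model selection on average.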
Related papers
- A Dataset for Evaluating Online Anomaly Detection Approaches for Discrete Multivariate Time Series [0.01874930567916036]
Current publicly available datasets are too small, not diverse and feature trivial anomalies.
We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools.
We make different versions of the dataset available, where training and test subsets are offered in contaminated and clean versions.
As expected, the baseline experimentation shows that the approaches trained on the semi-supervised version of the dataset outperform their unsupervised counterparts.
arXiv Detail & Related papers (2024-11-21T09:03:12Z)
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z)
- Boosting Out-of-Distribution Detection with Multiple Pre-trained Models [41.66566916581451]
Post hoc detection utilizing pre-trained models has shown promising performance and can be scaled to large-scale problems.
We propose a detection enhancement method by ensembling multiple detection decisions derived from a zoo of pre-trained models.
Our method substantially improves the relative performance by 65.40% and 26.96% on the CIFAR10 and ImageNet benchmarks.
arXiv Detail & Related papers (2022-12-24T12:11:38Z)
- Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z)
- Zero-shot meta-learning for small-scale data from human subjects [10.320654885121346]
We develop a framework to rapidly adapt to a new prediction task with limited training data for out-of-sample test data.
Our model learns the latent treatment effects of each intervention and, by design, can naturally handle multi-task predictions.
Our model has implications for improved generalization of small-size human studies to the wider population.
arXiv Detail & Related papers (2022-03-29T17:42:04Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- RLAD: Time Series Anomaly Detection through Reinforcement Learning and Active Learning [17.089402177923297]
We introduce a new semi-supervised, time series anomaly detection algorithm.
It uses deep reinforcement learning and active learning to efficiently learn and adapt to anomalies in real-world time series data.
It requires no manual tuning of parameters and outperforms all state-of-the-art methods we compare with.
arXiv Detail & Related papers (2021-03-31T15:21:15Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
- CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus [62.86856923633923]
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements.
In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data.
The search strategy is learned in a self-supervised manner; we evaluate the proposed algorithm on multi-homography estimation and demonstrate an accuracy that is superior to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T17:37:01Z)