Automating Outlier Detection via Meta-Learning
- URL: http://arxiv.org/abs/2009.10606v2
- Date: Wed, 17 Mar 2021 14:44:35 GMT
- Title: Automating Outlier Detection via Meta-Learning
- Authors: Yue Zhao, Ryan A. Rossi, Leman Akoglu
- Abstract summary: We develop the first principled data-driven approach to model selection for outlier detection, called MetaOD, based on meta-learning.
We show the effectiveness of MetaOD in selecting a detection model that significantly outperforms the most popular outlier detectors.
To foster and further research on this new problem, we open-source our entire meta-learning system, benchmark environment, and testbed datasets.
- Score: 37.736124230543865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given an unsupervised outlier detection (OD) task on a new dataset, how can
we automatically select a good outlier detection method and its
hyperparameter(s) (collectively called a model)? Thus far, model selection for
OD has been a "black art", as any model evaluation is infeasible due to the
lack of (i) hold-out data with labels, and (ii) a universal objective function.
In this work, we develop the first principled data-driven approach to model
selection for OD, called MetaOD, based on meta-learning. MetaOD capitalizes on
the past performances of a large body of detection models on existing outlier
detection benchmark datasets, and carries over this prior experience to
automatically select an effective model to be employed on a new dataset without
using any labels. To capture task similarity, we introduce specialized
meta-features that quantify outlying characteristics of a dataset. Through
comprehensive experiments, we show the effectiveness of MetaOD in selecting a
detection model that significantly outperforms the most popular outlier
detectors (e.g., LOF and iForest) as well as various state-of-the-art
unsupervised meta-learners while being extremely fast. To foster
reproducibility and further research on this new problem, we open-source our
entire meta-learning system, benchmark environment, and testbed datasets.
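The selection strategy the abstract describes, summarizing a dataset with meta-features and carrying over model performance from the most similar historical benchmark, can be sketched roughly as follows. This is a minimal illustration, not MetaOD's actual implementation: the meta-features, dataset names, and performance numbers below are all fabricated stand-ins.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def meta_features(X):
    """Toy meta-feature vector summarizing a dataset's outlying
    characteristics (illustrative stand-ins for MetaOD's features)."""
    return np.array([
        float(np.mean(skew(X, axis=0))),      # average per-feature skewness
        float(np.mean(kurtosis(X, axis=0))),  # average per-feature excess kurtosis
    ])

# Fabricated "meta-train" table: past performance of candidate models on
# historical benchmark datasets (numbers are illustrative, not real results).
rng = np.random.default_rng(0)
history = {
    "bench_a": rng.normal(size=(200, 5)),       # light-tailed benchmark
    "bench_b": rng.exponential(size=(200, 5)),  # heavy-tailed benchmark
}
past_perf = {
    "bench_a": {"LOF": 0.62, "iForest": 0.71},
    "bench_b": {"LOF": 0.80, "iForest": 0.55},
}

def select_model(X_new):
    """Return the model that performed best on the most similar
    historical dataset, measured in meta-feature space -- no labels
    on X_new are needed."""
    f_new = meta_features(X_new)
    nearest = min(history,
                  key=lambda name: np.linalg.norm(meta_features(history[name]) - f_new))
    return max(past_perf[nearest], key=past_perf[nearest].get)

print(select_model(history["bench_a"]))  # → iForest (bench_a is its own nearest neighbor)
```

The key design point mirrored here is that no label or objective function is ever evaluated on the new dataset; all supervision comes from the historical performance table.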
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that existing machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone! [28.823740273813296]
Outlier detection (OD) has numerous applications in environmental monitoring, cybersecurity, finance, and medicine.
Being an inherently unsupervised task, model selection is a key bottleneck for OD without label supervision.
We present FoMo-0D for zero/0-shot OD, exploring a transformative new direction that bypasses the hurdle of model selection altogether.
arXiv Detail & Related papers (2024-09-09T14:41:24Z) - IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection [3.3772986620114387]
We present an approach for modelling IoT network attacks that focuses on generalizability, yet also leads to better detection and performance.
First, we present an improved rolling window approach for feature extraction, and introduce a multi-step feature selection process that reduces overfitting.
Second, we build and test models using isolated train and test datasets, thereby avoiding common data leaks.
Third, we rigorously evaluate our methodology using a diverse portfolio of machine learning models, evaluation metrics and datasets.
arXiv Detail & Related papers (2023-10-17T21:46:43Z) - Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method utilizes a mask to identify the memorized atypical samples, and then fine-tunes the model or prunes it with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z) - Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss.
Our approach achieves superior performance compared to state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z) - Toward Unsupervised Outlier Model Selection [20.12322454417006]
ELECT is a new approach to select an effective model on a new dataset without any labels.
It is based on meta-learning, transferring prior knowledge (e.g., model performance) from historical datasets that are similar to the new one.
It can serve output on demand, accommodating varying time budgets.
arXiv Detail & Related papers (2022-11-03T14:14:46Z) - Meta-Learning for Unsupervised Outlier Detection with Optimal Transport [4.035753155957698]
We propose a novel approach to automate outlier detection based on meta-learning from previous datasets with outliers.
We leverage optimal transport in particular, to find the dataset with the most similar underlying distribution, and then apply the outlier detection techniques that proved to work best for that data distribution.
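The dataset-matching step described here can be sketched with a one-dimensional Wasserstein distance, a simplification of full optimal transport; the dataset names and distributions below are purely illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Hypothetical repository of historical datasets, reduced to 1-D samples
# for simplicity (the paper's actual construction may differ).
rng = np.random.default_rng(1)
historical = {
    "normal_bench": rng.normal(0.0, 1.0, size=1000),
    "exp_bench": rng.exponential(1.0, size=1000),
}

def most_similar(new_sample):
    """Pick the historical dataset with the smallest 1-D Wasserstein
    (optimal transport) distance to the new sample's distribution."""
    return min(historical,
               key=lambda name: wasserstein_distance(historical[name], new_sample))

new_data = rng.exponential(1.0, size=800)  # heavy-tailed, like exp_bench
print(most_similar(new_data))              # → exp_bench
```

Once the closest historical dataset is found, one would deploy whichever detector performed best on that dataset, following the transfer idea described above.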
arXiv Detail & Related papers (2022-11-01T10:36:48Z) - Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z) - Meta-Regularization by Enforcing Mutual-Exclusiveness [0.8057006406834467]
We propose a regularization technique for meta-learning models that gives the model designer more control over the information flow during meta-training.
Our proposed regularization function shows an accuracy boost of approximately 36% on the Omniglot dataset.
arXiv Detail & Related papers (2021-01-24T22:57:19Z) - Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.