CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD
- URL: http://arxiv.org/abs/2211.13314v2
- Date: Mon, 1 Jul 2024 05:05:59 GMT
- Title: CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD
- Authors: Andreas Lohrer, Daniyal Kazempour, Maximilian Hünemörder, Peer Kröger,
- Abstract summary: Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset.
To address this problem, we propose the robust outlier detection algorithm CoMadOut.
Our approach can be seen as a robust alternative for outlier detection tasks.
- Score: 0.3749861135832073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised learning methods are well established in the area of anomaly detection and achieve state of the art performances on outlier datasets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our CoMadOut outlier detection variants using comedian PCA define, dependent on its variant, an inlier region with a robust noise margin by measures of in-distribution (variant CMO) and optimized scores by measures of out-of-distribution (variants CMO*), e.g. kurtosis-weighting by CMO+k. These measures allow distribution based outlier scoring for each principal component, and thus, an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive to well established methods related to average precision (AP), area under the precision recall curve (AUPRC) and area under the receiver operating characteristic (AUROC) curve. In summary our approach can be seen as a robust alternative for outlier detection tasks.
Related papers
- Resultant: Incremental Effectiveness on Likelihood for Unsupervised Out-of-Distribution Detection [63.93728560200819]
Unsupervised out-of-distribution (U-OOD) detection is to identify data samples with a detector trained solely on unlabeled in-distribution (ID) data.
Recent studies have developed various detectors based on DGMs to move beyond likelihood.
We apply two techniques for each direction, specifically post-hoc prior and dataset entropy-mutual calibration.
Experimental results demonstrate that the Resultant could be a new state-of-the-art U-OOD detector.
arXiv Detail & Related papers (2024-09-05T02:58:13Z) - Distributional Shift-Aware Off-Policy Interval Estimation: A Unified
Error Quantification Framework [8.572441599469597]
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes.
The objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies.
We show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings.
arXiv Detail & Related papers (2023-09-23T06:35:44Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free
Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - Divide and Contrast: Source-free Domain Adaptation via Adaptive
Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
arXiv Detail & Related papers (2022-11-12T09:21:49Z) - Stochastic and Private Nonconvex Outlier-Robust PCA [11.688030627514532]
Outlier-robust PCA seeks an underlying low-dimensional linear subspace from a dataset corrupted with outliers.
We show that our methods involve our methods, which involve a geodesic descent and a novel convergence analysis.
The main application method is an effectively private algorithm for outlier-robust PCA.
arXiv Detail & Related papers (2022-03-17T12:00:47Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic
Uncertainty [58.144520501201995]
Bi-Lipschitz regularization of neural network layers preserve relative distances between data instances in the feature spaces of each layer.
With the use of an attentive set encoder, we propose to meta learn either diagonal or diagonal plus low-rank factors to efficiently construct task specific covariance matrices.
We also propose an inference procedure which utilizes scaled energy to achieve a final predictive distribution.
arXiv Detail & Related papers (2021-10-12T22:04:19Z) - Deep Clustering based Fair Outlier Detection [19.601280507914325]
We propose an instance-level weighted representation learning strategy to enhance the joint deep clustering and outlier detection.
Our DCFOD method consistently achieves superior performance on both the outlier detection validity and two types of fairness notions in outlier detection.
arXiv Detail & Related papers (2021-06-09T15:12:26Z) - Mean-Shifted Contrastive Loss for Anomaly Detection [34.97652735163338]
We propose a new loss function which can overcome failure modes of both center-loss and contrastive-loss methods.
Our improvements yield a new anomaly detection approach, based on $textitMean-Shifted Contrastive Loss$.
Our method achieves state-of-the-art anomaly detection performance on multiple benchmarks including $97.5%$ ROC-AUC.
arXiv Detail & Related papers (2021-06-07T17:58:03Z) - Homophily Outlier Detection in Non-IID Categorical Data [43.51919113927003]
This work introduces a novel outlier detection framework and its two instances to identify outliers in categorical data.
It first defines and incorporates distribution-sensitive outlier factors and their interdependence into a value-value graph-based representation.
The learned value outlierness allows for either direct outlier detection or outlying feature selection.
arXiv Detail & Related papers (2021-03-21T23:29:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.