EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy
- URL: http://arxiv.org/abs/2405.12502v3
- Date: Sat, 29 Jun 2024 01:40:46 GMT
- Title: EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy
- Authors: Yihong Huang, Yuang Zhang, Liping Wang, Fan Zhang, Xuemin Lin,
- Abstract summary: We propose a zero-label entropy metric named Loss Entropy for loss distribution, enabling us to infer optimal stopping points for training without labels.
We also develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy suggests the maximum model detection capability.
- Score: 19.154826741973277
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised Outlier Detection (UOD) is an important data mining task. With the advance of deep learning, deep Outlier Detection (OD) has received broad interest. Most deep UOD models are trained exclusively on clean datasets to learn the distribution of the normal data, which requires huge manual efforts to clean the real-world data if possible. Instead of relying on clean datasets, some approaches directly train and detect on unlabeled contaminated datasets, leading to the need for methods that are robust to such conditions. Ensemble methods emerged as a superior solution to enhance model robustness against contaminated training sets. However, the training time is greatly increased by the ensemble. In this study, we investigate the impact of outliers on the training phase, aiming to halt training on unlabeled contaminated datasets before performance degradation. Initially, we noted that blending normal and anomalous data causes AUC fluctuations, a label-dependent measure of detection accuracy. To circumvent the need for labels, we propose a zero-label entropy metric named Loss Entropy for loss distribution, enabling us to infer optimal stopping points for training without labels. Meanwhile, we theoretically demonstrate negative correlation between entropy metric and the label-based AUC. Based on this, we develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy suggests the maximum model detection capability. We conduct extensive experiments on ADBench (including 47 real datasets), and the overall results indicate that AutoEncoder (AE) enhanced by our approach not only achieves better performance than ensemble AEs but also requires under 2\% of training time. Lastly, our proposed metric and early-stopping approach are evaluated on other deep OD models, exhibiting their broad potential applicability.
Related papers
- Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection [9.784793380119806]
We introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation.
Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model.
We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset.
arXiv Detail & Related papers (2024-07-04T14:28:52Z) - Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z) - COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection [19.946344683965425]
We propose a novel methodology to address the challenge of FSAD.
We employ a model pre-trained on a large source dataset to model weights.
We evaluate few-shot anomaly detection on on 3 controlled AD tasks and 4 real-world AD tasks to demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-02-29T09:48:19Z) - RoSAS: Deep Semi-Supervised Anomaly Detection with
Contamination-Resilient Continuous Supervision [21.393509817509464]
This paper proposes a novel semi-supervised anomaly detection method, which devises textitcontamination-resilient continuous supervisory signals
Our approach significantly outperforms state-of-the-art competitors by 20%-30% in AUC-PR.
arXiv Detail & Related papers (2023-07-25T04:04:49Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free
Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - SRoUDA: Meta Self-training for Robust Unsupervised Domain Adaptation [25.939292305808934]
Unsupervised domain adaptation (UDA) can transfer knowledge learned from rich-label dataset to unlabeled target dataset.
In this paper, we present a new meta self-training pipeline, named SRoUDA, for improving adversarial robustness of UDA models.
arXiv Detail & Related papers (2022-12-12T14:25:40Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance of guided gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Entropy Maximization and Meta Classification for Out-Of-Distribution
Detection in Semantic Segmentation [7.305019142196585]
"Out-of-distribution" (OoD) samples are crucial for many applications such as automated driving.
A natural baseline approach to OoD detection is to threshold on the pixel-wise softmax entropy.
We present a two-step procedure that significantly improves that approach.
arXiv Detail & Related papers (2020-12-09T11:01:06Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier
Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z) - Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.