How Low Can You Go? Surfacing Prototypical In-Distribution Samples for
Unsupervised Anomaly Detection
- URL: http://arxiv.org/abs/2312.03804v1
- Date: Wed, 6 Dec 2023 15:30:47 GMT
- Title: How Low Can You Go? Surfacing Prototypical In-Distribution Samples for
Unsupervised Anomaly Detection
- Authors: Felix Meissen, Johannes Getzner, Alexander Ziller, Georgios Kaissis,
Daniel Rueckert
- Abstract summary: Unsupervised anomaly detection (UAD) alleviates large labeling efforts by training exclusively on unlabeled in-distribution data.
We show that using only very few training samples can already match - and in some cases even improve - anomaly detection compared to training with the whole training dataset.
- Score: 56.06401423880554
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unsupervised anomaly detection (UAD) alleviates large labeling efforts by
training exclusively on unlabeled in-distribution data and detecting outliers
as anomalies. Generally, the assumption prevails that large training datasets
allow the training of higher-performing UAD models. However, in this work, we
show that using only very few training samples can already match - and in some
cases even improve - anomaly detection compared to training with the whole
training dataset. We propose three methods to identify prototypical samples
from a large dataset of in-distribution samples. We demonstrate that by
training with a subset of just ten such samples, we achieve an area under the
receiver operating characteristics curve (AUROC) of $96.37 \%$ on CIFAR10,
$92.59 \%$ on CIFAR100, $95.37 \%$ on MNIST, $95.38 \%$ on Fashion-MNIST,
$96.37 \%$ on MVTec-AD, $98.81 \%$ on BraTS, and $81.95 \%$ on RSNA pneumonia
detection, even exceeding the performance of full training in $25/67$ classes
we tested. Additionally, we show that the prototypical in-distribution samples
identified by our proposed methods translate well to different models and other
datasets and that using their characteristics as guidance allows for successful
manual selection of small subsets of high-performing samples. Our code is
available at https://anonymous.4open.science/r/uad_prototypical_samples/
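The paper's core idea (train on only a handful of prototypical in-distribution samples, score test points by how far they fall from those samples, and report AUROC) can be sketched in a few lines. This is an illustrative nearest-prototype baseline on synthetic features, not the authors' actual three selection methods; the mean-distance selection rule and Gaussian toy data are assumptions for the sketch.

```python
import numpy as np

def select_prototypes(features, k=10):
    """Pick the k samples closest to the dataset mean as 'prototypical'
    (an assumed selection rule; the paper proposes three dedicated methods)."""
    center = features.mean(axis=0)
    dists = np.linalg.norm(features - center, axis=1)
    return features[np.argsort(dists)[:k]]

def anomaly_scores(prototypes, test_features):
    """Score each test point by its distance to the nearest prototype
    (higher = more anomalous)."""
    d = np.linalg.norm(test_features[:, None, :] - prototypes[None, :, :], axis=-1)
    return d.min(axis=1)

def auroc(scores, labels):
    """AUROC via the Mann-Whitney rank statistic: the probability that a
    random anomaly receives a higher score than a random normal sample."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 8))        # in-distribution features
normal_test = rng.normal(0.0, 1.0, size=(100, 8))
anomalies = rng.normal(4.0, 1.0, size=(100, 8))    # shifted distribution
test = np.vstack([normal_test, anomalies])
labels = np.array([0] * 100 + [1] * 100)

protos = select_prototypes(train, k=10)
print(f"AUROC with 10 prototypes: {auroc(anomaly_scores(protos, test), labels):.3f}")
```

On this toy separable data, ten prototypes already suffice for near-perfect detection, which mirrors the paper's observation that very small training subsets can be competitive.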
Related papers
- Which Pretrain Samples to Rehearse when Finetuning Pretrained Models? [60.59376487151964]
Fine-tuning pretrained models on specific tasks is now the de facto approach for text and vision tasks.
A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning.
We propose a novel sampling scheme, mix-cd, that identifies and prioritizes pretraining samples that are actually at risk of being forgotten.
arXiv Detail & Related papers (2024-02-12T22:32:12Z)
- Self-supervised learning of multi-omics embeddings in the low-label, high-data regime [0.0]
Contrastive, self-supervised learning (SSL) is used to train a model that predicts cancer type from unimodal mRNA or RPPA expression data.
A late-fusion model is proposed, where each omics is passed through its own sub-network, the outputs of which are averaged and passed to the pretraining or downstream objective function.
Multi-modal pretraining is shown to improve predictions from a single omics, and we argue that this is useful for datasets with many unlabelled multi-modal samples, but few labelled samples.
arXiv Detail & Related papers (2023-11-16T15:32:22Z)
- Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance [1.6804613362826175]
Diffusion models have emerged as a pivotal advancement in generative modeling.
In this paper, we aim to underscore a discrepancy between conventional training methods and the desired conditional sampling behavior.
We introduce an updated loss function that better aligns training objectives with sampling behaviors.
arXiv Detail & Related papers (2023-11-02T02:03:12Z)
- DOS: Diverse Outlier Sampling for Out-of-Distribution Detection [18.964462007139055]
We show that diversity is critical when sampling outliers for OOD detection performance.
We propose a straightforward and novel sampling strategy named DOS (Diverse Outlier Sampling) to select diverse and informative outliers.
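As a rough illustration of the idea behind diverse outlier sampling, the sketch below spreads its picks across distinct outlier modes using greedy farthest-point sampling. This is a simpler stand-in for diversity-aware selection, not the DOS algorithm itself; the toy 2-D outlier pool and the farthest-point rule are assumptions.

```python
import numpy as np

def diverse_outlier_sample(features, n_select):
    """Greedy farthest-point sampling: start from the point farthest from
    the pool mean, then repeatedly add the point farthest from the
    already-selected set, so the picks cover distinct outlier modes."""
    first = int(np.linalg.norm(features - features.mean(axis=0), axis=1).argmax())
    chosen = [first]
    min_d = np.linalg.norm(features - features[first], axis=1)
    for _ in range(n_select - 1):
        nxt = int(min_d.argmax())              # farthest from everything chosen so far
        chosen.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(features - features[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(1)
# three well-separated outlier modes around (-5,-5), (0,0), and (5,5)
pool = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (-5.0, 0.0, 5.0)])
idx = diverse_outlier_sample(pool, n_select=3)
print(pool[idx].round(1))  # roughly one point from each mode
```

A naive random draw of three outliers could easily land all picks in one mode; the farthest-point rule makes the coverage of distinct modes explicit, which is the diversity property the DOS paper argues matters.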
arXiv Detail & Related papers (2023-06-03T07:17:48Z)
- Selecting Learnable Training Samples is All DETRs Need in Crowded Pedestrian Detection [72.97320260601347]
In crowded pedestrian detection, the performance of DETRs is still unsatisfactory due to the inappropriate sample selection method.
We propose Sample Selection for Crowded Pedestrians (SSCP), which consists of a constraint-guided label assignment scheme (CGLA).
Experimental results show that the proposed SSCP effectively improves the baselines without introducing any overhead in inference.
arXiv Detail & Related papers (2023-05-18T08:28:01Z)
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% across four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Binary classification with ambiguous training data [69.50862982117127]
In supervised learning, we often face ambiguous (A) samples that are difficult to label even for domain experts.
This problem is substantially different from semi-supervised learning since unlabeled samples are not necessarily difficult samples.
arXiv Detail & Related papers (2020-11-05T00:53:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.