How Low Can You Go? Surfacing Prototypical In-Distribution Samples for
Unsupervised Anomaly Detection
- URL: http://arxiv.org/abs/2312.03804v1
- Date: Wed, 6 Dec 2023 15:30:47 GMT
- Title: How Low Can You Go? Surfacing Prototypical In-Distribution Samples for
Unsupervised Anomaly Detection
- Authors: Felix Meissen, Johannes Getzner, Alexander Ziller, Georgios Kaissis,
Daniel Rueckert
- Abstract summary: Unsupervised anomaly detection (UAD) alleviates large labeling efforts by training exclusively on unlabeled in-distribution data.
We show that using only very few training samples can already match - and in some cases even improve - anomaly detection compared to training with the whole training dataset.
- Score: 56.06401423880554
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unsupervised anomaly detection (UAD) alleviates large labeling efforts by
training exclusively on unlabeled in-distribution data and detecting outliers
as anomalies. Generally, the assumption prevails that large training datasets
allow the training of higher-performing UAD models. However, in this work, we
show that using only very few training samples can already match - and in some
cases even improve - anomaly detection compared to training with the whole
training dataset. We propose three methods to identify prototypical samples
from a large dataset of in-distribution samples. We demonstrate that by
training with a subset of just ten such samples, we achieve an area under the
receiver operating characteristics curve (AUROC) of $96.37 \%$ on CIFAR10,
$92.59 \%$ on CIFAR100, $95.37 \%$ on MNIST, $95.38 \%$ on Fashion-MNIST,
$96.37 \%$ on MVTec-AD, $98.81 \%$ on BraTS, and $81.95 \%$ on RSNA pneumonia
detection, even exceeding the performance of full training in $25/67$ classes
we tested. Additionally, we show that the prototypical in-distribution samples
identified by our proposed methods translate well to different models and other
datasets and that using their characteristics as guidance allows for successful
manual selection of small subsets of high-performing samples. Our code is
available at https://anonymous.4open.science/r/uad_prototypical_samples/
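The paper's core idea (train on only a handful of prototypical in-distribution samples, score test points by how far they fall from those samples, and report AUROC) can be sketched in a few lines. This is an illustrative nearest-prototype baseline on synthetic features, not the authors' actual three selection methods; the mean-distance selection rule and Gaussian toy data are assumptions for the sketch.

```python
import numpy as np

def select_prototypes(features, k=10):
    """Pick the k samples closest to the dataset mean as 'prototypical'
    (an assumed selection rule; the paper proposes three dedicated methods)."""
    center = features.mean(axis=0)
    dists = np.linalg.norm(features - center, axis=1)
    return features[np.argsort(dists)[:k]]

def anomaly_scores(prototypes, test_features):
    """Score each test point by its distance to the nearest prototype
    (higher = more anomalous)."""
    d = np.linalg.norm(test_features[:, None, :] - prototypes[None, :, :], axis=-1)
    return d.min(axis=1)

def auroc(scores, labels):
    """AUROC via the Mann-Whitney rank statistic: the probability that a
    random anomaly receives a higher score than a random normal sample."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 8))        # in-distribution features
normal_test = rng.normal(0.0, 1.0, size=(100, 8))
anomalies = rng.normal(4.0, 1.0, size=(100, 8))    # shifted distribution
test = np.vstack([normal_test, anomalies])
labels = np.array([0] * 100 + [1] * 100)

protos = select_prototypes(train, k=10)
print(f"AUROC with 10 prototypes: {auroc(anomaly_scores(protos, test), labels):.3f}")
```

On this toy separable data, ten prototypes already suffice for near-perfect detection, which mirrors the paper's observation that very small training subsets can be competitive.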
Related papers
- Which Pretrain Samples to Rehearse when Finetuning Pretrained Models? [60.59376487151964]
Fine-tuning pretrained models on specific tasks is now the de facto approach for text and vision tasks.
A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning.
We propose a novel sampling scheme, mix-cd, that identifies and prioritizes pretraining samples that are actually at risk of being forgotten.
arXiv Detail & Related papers (2024-02-12T22:32:12Z)
- Self-supervised learning of multi-omics embeddings in the low-label, high-data regime [0.0]
Contrastive, self-supervised learning (SSL) is used to train a model that predicts cancer type from unimodal mRNA or RPPA expression data.
A late-fusion model is proposed, where each omics is passed through its own sub-network, the outputs of which are averaged and passed to the pretraining or downstream objective function.
Multi-modal pretraining is shown to improve predictions from a single omics, and we argue that this is useful for datasets with many unlabelled multi-modal samples, but few labelled samples.
arXiv Detail & Related papers (2023-11-16T15:32:22Z)
- Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance [1.6804613362826175]
Diffusion models have emerged as a pivotal advancement in generative modeling.
In this paper, we aim to underscore a discrepancy between conventional training methods and the desired conditional sampling behavior.
We introduce an updated loss function that better aligns training objectives with sampling behaviors.
arXiv Detail & Related papers (2023-11-02T02:03:12Z)
- DOS: Diverse Outlier Sampling for Out-of-Distribution Detection [18.964462007139055]
We show that diversity is critical when sampling outliers for OOD detection performance.
We propose a straightforward and novel sampling strategy named DOS (Diverse Outlier Sampling) to select diverse and informative outliers.
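As a rough illustration of the idea behind diverse outlier sampling, the sketch below spreads its picks across distinct outlier modes using greedy farthest-point sampling. This is a simpler stand-in for diversity-aware selection, not the DOS algorithm itself; the toy 2-D outlier pool and the farthest-point rule are assumptions.

```python
import numpy as np

def diverse_outlier_sample(features, n_select):
    """Greedy farthest-point sampling: start from the point farthest from
    the pool mean, then repeatedly add the point farthest from the
    already-selected set, so the picks cover distinct outlier modes."""
    first = int(np.linalg.norm(features - features.mean(axis=0), axis=1).argmax())
    chosen = [first]
    min_d = np.linalg.norm(features - features[first], axis=1)
    for _ in range(n_select - 1):
        nxt = int(min_d.argmax())              # farthest from everything chosen so far
        chosen.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(features - features[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(1)
# three well-separated outlier modes around (-5,-5), (0,0), and (5,5)
pool = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (-5.0, 0.0, 5.0)])
idx = diverse_outlier_sample(pool, n_select=3)
print(pool[idx].round(1))  # roughly one point from each mode
```

A naive random draw of three outliers could easily land all picks in one mode; the farthest-point rule makes the coverage of distinct modes explicit, which is the diversity property the DOS paper argues matters.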
arXiv Detail & Related papers (2023-06-03T07:17:48Z)
- Selecting Learnable Training Samples is All DETRs Need in Crowded Pedestrian Detection [72.97320260601347]
In crowded pedestrian detection, the performance of DETRs is still unsatisfactory due to the inappropriate sample selection method.
We propose Sample Selection for Crowded Pedestrians (SSCP), which consists of a constraint-guided label assignment scheme (CGLA).
Experimental results show that the proposed SSCP effectively improves the baselines without introducing any overhead in inference.
arXiv Detail & Related papers (2023-05-18T08:28:01Z)
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% across four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Binary classification with ambiguous training data [69.50862982117127]
In supervised learning, we often face ambiguous (A) samples that are difficult to label even for domain experts.
This problem is substantially different from semi-supervised learning since unlabeled samples are not necessarily difficult samples.
arXiv Detail & Related papers (2020-11-05T00:53:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.