AUPIMO: Redefining Visual Anomaly Detection Benchmarks with High Speed
and Low Tolerance
- URL: http://arxiv.org/abs/2401.01984v3
- Date: Thu, 8 Feb 2024 12:58:13 GMT
- Title: AUPIMO: Redefining Visual Anomaly Detection Benchmarks with High Speed
and Low Tolerance
- Authors: Joao P. C. Bertoldo and Dick Ameln and Ashwin Vaidya and Samet
Akçay
- Abstract summary: Per-IMage Overlap (PIMO) is a novel metric that addresses the shortcomings of AUROC and AUPRO.
Measuring recall per image simplifies computation and is more robust to noisy annotations.
Our experiments demonstrate that PIMO offers practical advantages and nuanced performance insights.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in visual anomaly detection research have seen AUROC and
AUPRO scores on public benchmark datasets such as MVTec and VisA converge
towards perfect recall, giving the impression that these benchmarks are
near-solved. However, high AUROC and AUPRO scores do not always reflect
qualitative performance, which limits the validity of these metrics in
real-world applications. We argue that the artificial ceiling imposed by the
lack of an adequate evaluation metric restrains progression of the field, and
it is crucial that we revisit the evaluation metrics used to rate our
algorithms. In response, we introduce Per-IMage Overlap (PIMO), a novel metric
that addresses the shortcomings of AUROC and AUPRO. PIMO retains the
recall-based nature of the existing metrics but introduces two distinctions:
the assignment of curves (and respective area under the curve) is per-image,
and its X-axis relies solely on normal images. Measuring recall per image
simplifies instance score indexing and is more robust to noisy annotations. As
we show, it also accelerates computation and enables the usage of statistical
tests to compare models. By imposing low tolerance for false positives on
normal images, PIMO provides an enhanced model validation procedure and
highlights performance variations across datasets. Our experiments demonstrate
that PIMO offers practical advantages and nuanced performance insights that
redefine anomaly detection benchmarks -- notably challenging the perception
that MVTec AD and VisA datasets have been solved by contemporary models.
Available on GitHub: https://github.com/jpcbertoldo/aupimo.
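The two distinctions described in the abstract (one recall curve per anomalous image, and an X-axis built only from normal images) can be illustrated with a minimal NumPy sketch. This is not the API of the aupimo package; the function names `shared_fpr`, `per_image_recall` and `per_image_area`, the log-scale integration, and the false-positive-rate bounds `fpr_lo` and `fpr_hi` are all assumptions chosen for illustration.

```python
import numpy as np


def shared_fpr(normal_scores, thresholds):
    """X-axis: mean per-image false positive rate, computed on normal images only."""
    flat = normal_scores.reshape(normal_scores.shape[0], -1)
    return np.array([(flat >= t).mean(axis=1).mean() for t in thresholds])


def per_image_recall(scores, mask, thresholds):
    """Y-axis: recall of the anomalous pixels of ONE image at each threshold."""
    anomalous = scores[mask.astype(bool)]
    return np.array([(anomalous >= t).mean() for t in thresholds])


def per_image_area(fpr, recall, fpr_lo=1e-3, fpr_hi=1e-1, num=200):
    """Normalized area under one image's recall curve over a low-FPR range.

    Assumes thresholds were ascending, so `fpr` is non-increasing.
    The bounds and the log-scale integration are illustrative assumptions.
    """
    grid = np.logspace(np.log10(fpr_lo), np.log10(fpr_hi), num)
    # For each target FPR, pick the lowest threshold whose FPR does not exceed it.
    idx = len(fpr) - np.searchsorted(fpr[::-1], grid, side="right")
    idx = np.clip(idx, 0, len(fpr) - 1)
    rec = recall[idx]
    log_grid = np.log(grid)
    area = np.sum((rec[1:] + rec[:-1]) / 2.0 * np.diff(log_grid))
    return float(area / (log_grid[-1] - log_grid[0]))


# Toy data (hypothetical): four normal score maps and two anomalous ones.
thresholds = np.linspace(0.0, 1.0, 1001)
normal_scores = np.linspace(0.0, 0.4, 4 * 8 * 8).reshape(4, 8, 8)
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True
detected = np.where(mask, 0.9, 0.1)   # anomaly scored well above normal pixels
missed = np.where(mask, 0.05, 0.1)    # anomaly scored below normal pixels

fpr = shared_fpr(normal_scores, thresholds)
score_detected = per_image_area(fpr, per_image_recall(detected, mask, thresholds))
score_missed = per_image_area(fpr, per_image_recall(missed, mask, thresholds))
print(score_detected, score_missed)
```

Because every anomalous image receives its own score, a set of images yields a distribution of scores rather than a single number, which is what makes paired statistical tests between models possible.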
Related papers
- Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective [44.045767657945895]
We examine the brittleness of the image-text retrieval (ITR) evaluation pipeline with a focus on concept granularity.
To investigate the performance of VLMs on coarse and fine-grained datasets, we introduce a taxonomy of perturbations.
The results demonstrate that although perturbations generally degrade model performance, the fine-grained datasets exhibit a smaller performance drop than their standard counterparts.
arXiv Detail & Related papers (2024-07-21T18:08:44Z) - Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification [2.1223532600703385]
This paper presents an innovative disjoint sampling approach for training SOTA models on Hyperspectral image classification (HSIC) tasks.
By separating training, validation, and test data without overlap, the proposed method facilitates a fairer evaluation of how well a model can classify pixels it was not exposed to during training or validation.
This rigorous methodology is critical for advancing SOTA models and their real-world application to large-scale land mapping with Hyperspectral sensors.
arXiv Detail & Related papers (2024-04-23T11:40:52Z) - Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences and the result reveals that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z) - Refining the ONCE Benchmark with Hyperparameter Tuning [45.55545585587993]
This work focuses on the evaluation of semi-supervised learning approaches for point cloud data.
Data annotation is of paramount importance in the context of LiDAR applications.
We show that improvements from previous semi-supervised methods may not be as profound as previously thought.
arXiv Detail & Related papers (2023-11-10T13:39:07Z) - Detecting Edit Failures In Large Language Models: An Improved
Specificity Benchmark [9.45927470587879]
We extend the existing CounterFact benchmark to include a dynamic component and dub our benchmark CounterFact+.
We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity.
arXiv Detail & Related papers (2023-05-27T19:08:04Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation
using Generative Models [74.43215520371506]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - Exploring validation metrics for offline model-based optimisation with
diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z) - Be Your Own Neighborhood: Detecting Adversarial Example by the
Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel adversarial example (AE) detection framework for trustworthy predictions.
It performs the detection by distinguishing the AE's abnormal relation with its augmented versions.
An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
arXiv Detail & Related papers (2022-08-31T08:18:44Z) - No Shifted Augmentations (NSA): compact distributions for robust
self-supervised Anomaly Detection [4.243926243206826]
Unsupervised Anomaly detection (AD) requires building a notion of normalcy, distinguishing in-distribution (ID) and out-of-distribution (OOD) data.
We investigate how the geometrical compactness of the ID feature distribution makes isolating and detecting outliers easier.
We propose novel architectural modifications to the self-supervised feature learning step that enable such compact distributions for ID data to be learned.
arXiv Detail & Related papers (2022-03-19T15:55:32Z) - Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z) - Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z)