AnomalyMatch: Discovering Rare Objects of Interest with Semi-supervised and Active Learning
- URL: http://arxiv.org/abs/2505.03509v2
- Date: Thu, 30 Oct 2025 09:17:12 GMT
- Title: AnomalyMatch: Discovering Rare Objects of Interest with Semi-supervised and Active Learning
- Authors: Pablo Gómez, Laslo E. Ruhberg, Maria Teresa Nardone, David O'Ryan,
- Abstract summary: We present AnomalyMatch, an anomaly detection framework combining the semi-supervised FixMatch algorithm and active learning.<n>AnomalyMatch is tailored for large-scale applications and integrated into the ESA Datalabs science platform.<n> Evaluations on the GalaxyMNIST astronomical dataset and the miniImageNet natural-image benchmark under severe class imbalance display strong performance.
- Score: 1.7499351967216341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Anomaly detection in large datasets is essential in astronomy and computer vision. However, due to a scarcity of labelled data, it is often infeasible to apply supervised methods to anomaly detection. We present AnomalyMatch, an anomaly detection framework combining the semi-supervised FixMatch algorithm using EfficientNet classifiers with active learning. AnomalyMatch is tailored for large-scale applications and integrated into the ESA Datalabs science platform. In this method, we treat anomaly detection as a binary classification problem and efficiently utilise limited labelled and abundant unlabelled images for training. We enable active learning via a user interface for verification of high-confidence anomalies and correction of false positives. Evaluations on the GalaxyMNIST astronomical dataset and the miniImageNet natural-image benchmark under severe class imbalance display strong performance. Starting from five to ten labelled anomalies, we achieve an average AUROC of 0.96 (miniImageNet) and 0.89 (GalaxyMNIST), with respective AUPRC of 0.82 and 0.77. After three active learning cycles, anomalies are ranked with 76% (miniImageNet) to 94% (GalaxyMNIST) precision in the top 1% of the highest-ranking images by score. We compare to the established Astronomaly software on selected 'odd' galaxies from the 'Galaxy Zoo - The Galaxy Challenge' dataset, achieving comparable performance with an average AUROC of 0.83. Our results underscore the exceptional utility and scalability of this approach for anomaly discovery, highlighting the value of specialised approaches for domains characterised by severe label scarcity.
Related papers
- AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios [16.77348120041789]
We propose AnomalyNCD, a multi-class anomaly classification network compatible with different anomaly detection methods.<n>To address the non-prominence of anomalies, we design main element binarization (MEBin) to obtain anomaly-centered images.<n>Next, to learn anomalies with weak semantics, we design mask-guided representation learning, which focuses on isolated anomalies guided by masks.
arXiv Detail & Related papers (2024-10-18T11:07:12Z) - ARC: A Generalist Graph Anomaly Detector with In-Context Learning [62.202323209244]
ARC is a generalist GAD approach that enables a one-for-all'' GAD model to detect anomalies across various graph datasets on-the-fly.<n> equipped with in-context learning, ARC can directly extract dataset-specific patterns from the target dataset.<n>Extensive experiments on multiple benchmark datasets from various domains demonstrate the superior anomaly detection performance, efficiency, and generalizability of ARC.
arXiv Detail & Related papers (2024-05-27T02:42:33Z) - Self-supervised learning for classifying paranasal anomalies in the maxillary sinus [31.45131665942058]
Self-supervised learning can be used to learn representations from unlabelled data.
There are no SSL methods designed for the downstream task of classifying paranasal anomalies in the maxillary sinus.
Our approach uses a 3D Convolutional Autoencoder trained in an unsupervised anomaly detection framework.
arXiv Detail & Related papers (2024-04-29T11:14:11Z) - Focused Active Learning for Histopathological Image Classification [10.489874002201644]
Active Learning has the potential to solve a major problem of digital pathology: the efficient acquisition of labeled data for machine learning algorithms.
We propose Focused Active Learning (FocAL), which combines a Bayesian Neural Network with Out-of-Distribution detection to estimate different uncertainties for the acquisition function.
We perform extensive experiments to validate our method on MNIST and the real-world Panda dataset for the classification of prostate cancer.
arXiv Detail & Related papers (2024-04-06T15:31:57Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - UniFormaly: Towards Task-Agnostic Unified Framework for Visual Anomaly
Detection [6.260747047974035]
We present UniFormaly, a universal and powerful anomaly detection framework.
We emphasize the necessity of our off-the-shelf approach by pointing out a suboptimal issue in online encoder-based methods.
UniFormaly achieves outstanding results on various tasks and datasets.
arXiv Detail & Related papers (2023-07-24T06:04:12Z) - One-Shot Learning for Periocular Recognition: Exploring the Effect of
Domain Adaptation and Data Bias on Deep Representations [59.17685450892182]
We investigate the behavior of deep representations in widely used CNN models under extreme data scarcity for One-Shot periocular recognition.
We improved state-of-the-art results that made use of networks trained with biometric datasets with millions of images.
Traditional algorithms like SIFT can outperform CNNs in situations with limited data.
arXiv Detail & Related papers (2023-07-11T09:10:16Z) - Uncertainty-inspired Open Set Learning for Retinal Anomaly
Identification [71.06194656633447]
We establish an uncertainty-inspired open-set (UIOS) model, which was trained with fundus images of 9 retinal conditions.
Our UIOS model with thresholding strategy achieved an F1 score of 99.55%, 97.01% and 91.91% for the internal testing set.
UIOS correctly predicted high uncertainty scores, which would prompt the need for a manual check in the datasets of non-target categories retinal diseases, low-quality fundus images, and non-fundus images.
arXiv Detail & Related papers (2023-04-08T10:47:41Z) - EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level
Latencies [1.1602089225841632]
We propose a lightweight feature extractor that processes an image in less than a millisecond on a modern GPU.
We then use a student-teacher approach to detect anomalous features.
We evaluate our method, called EfficientAD, on 32 datasets from three industrial anomaly detection dataset collections.
arXiv Detail & Related papers (2023-03-25T18:48:33Z) - Semi-Supervised Domain Adaptation for Cross-Survey Galaxy Morphology
Classification and Anomaly Detection [57.85347204640585]
We develop a Universal Domain Adaptation method DeepAstroUDA.
It can be applied to datasets with different types of class overlap.
For the first time, we demonstrate the successful use of domain adaptation on two very different observational datasets.
arXiv Detail & Related papers (2022-11-01T18:07:21Z) - From Unsupervised to Few-shot Graph Anomaly Detection: A Multi-scale Contrastive Learning Approach [26.973056364587766]
Anomaly detection from graph data is an important data mining task in many applications such as social networks, finance, and e-commerce.
We propose a novel framework, graph ANomaly dEtection framework with Multi-scale cONtrastive lEarning (ANEMONE in short)
By using a graph neural network as a backbone to encode the information from multiple graph scales (views), we learn better representation for nodes in a graph.
arXiv Detail & Related papers (2022-02-11T09:45:11Z) - Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly
Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using a patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z) - Multi-Perspective Anomaly Detection [3.3511723893430476]
We build upon the deep support vector data description algorithm and address multi-perspective anomaly detection.
We employ different augmentation techniques with a denoising process to deal with scarce one-class data.
We evaluate our approach on the new dices dataset using images from two different perspectives and also benchmark on the standard MNIST dataset.
arXiv Detail & Related papers (2021-05-20T17:07:36Z) - RLAD: Time Series Anomaly Detection through Reinforcement Learning and
Active Learning [17.089402177923297]
We introduce a new semi-supervised, time series anomaly detection algorithm.
It uses deep reinforcement learning and active learning to efficiently learn and adapt to anomalies in real-world time series data.
It requires no manual tuning of parameters and outperforms all state-of-art methods we compare with.
arXiv Detail & Related papers (2021-03-31T15:21:15Z) - TadGAN: Time Series Anomaly Detection Using Generative Adversarial
Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs)
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z) - Toward Deep Supervised Anomaly Detection: Reinforcement Learning from
Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data.
We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.