An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination
- URL: http://arxiv.org/abs/2510.21296v1
- Date: Fri, 24 Oct 2025 09:45:26 GMT
- Title: An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination
- Authors: Sukanya Patra, Souhaib Ben Taieb
- Abstract summary: Unsupervised anomaly detection methods typically assume clean training data, yet real-world datasets often contain undetected or mislabeled anomalies. We propose EPHAD, a test-time adaptation framework that updates the outputs of AD models trained on contaminated datasets using evidence gathered at test time.
- Score: 6.001574550157585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised anomaly detection (AD) methods typically assume clean training data, yet real-world datasets often contain undetected or mislabeled anomalies, leading to significant performance degradation. Existing solutions require access to the training pipeline, the training data, or prior knowledge of the proportion of anomalies in the data, limiting their real-world applicability. To address this challenge, we propose EPHAD, a simple yet effective test-time adaptation framework that updates the outputs of AD models trained on contaminated datasets using evidence gathered at test time. Our approach integrates the prior knowledge captured by the AD model trained on contaminated datasets with evidence derived from multimodal foundation models like Contrastive Language-Image Pre-training (CLIP), classical AD methods like the Local Outlier Factor (LOF), or domain-specific knowledge. We illustrate the intuition behind EPHAD using a synthetic toy example and validate its effectiveness through comprehensive experiments across eight visual AD datasets, twenty-six tabular AD datasets, and a real-world industrial AD dataset. Additionally, we conduct an ablation study to analyse the influence of hyperparameters and robustness to varying contamination levels, demonstrating the versatility and robustness of EPHAD across diverse AD models and evidence pairs. To ensure reproducibility, our code is publicly available at https://github.com/sukanyapatra1997/EPHAD.
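To make "combining a prior with test-time evidence" concrete, the sketch below shows one generic way to do it: exponential tilting of a prior over a test batch. This is a hedged illustration only and is not claimed to be the EPHAD update rule from the paper. All names (`knn_evidence`, `standardize`, `adjust_scores`) and the temperature `beta` are hypothetical. A simple k-nearest-neighbour distance stands in for evidence sources such as CLIP or LOF. The prior normality weight of test point i is assumed to be p_i ∝ exp(-s_i), where s_i is the contaminated model's anomaly score. The adjusted weight q_i ∝ p_i exp(-e_i / beta) is the standard maximiser of E_q[-e] - beta * KL(q || p).

```python
import math
from typing import List, Sequence


def knn_evidence(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Hypothetical evidence source: mean Euclidean distance from each test
    point to its k nearest other points in the batch (higher = more anomalous)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def standardize(xs: Sequence[float]) -> List[float]:
    """Z-score a list so prior and evidence scores share a common scale.
    A zero standard deviation falls back to 1 (all outputs become 0)."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0
    return [(x - mean) / std for x in xs]


def adjust_scores(prior: Sequence[float], evidence: Sequence[float],
                  beta: float = 1.0) -> List[float]:
    """Tilt the prior normality distribution over the batch by the evidence.

    prior, evidence: anomaly scores (higher = more anomalous).
    Normality logits are -s_i - e_i / beta; the function returns -log q_i,
    i.e. the adjusted anomaly score under the tilted distribution q.
    """
    s = standardize(prior)
    e = standardize(evidence)
    logits = [-si - ei / beta for si, ei in zip(s, e)]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [log_z - l for l in logits]


if __name__ == "__main__":
    # Toy batch: the (hypothetical) prior model under-scores the far point 3.
    test_points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
    prior_scores = [0.2, 0.1, 0.3, 0.15]
    adjusted = adjust_scores(prior_scores, knn_evidence(test_points), beta=0.5)
    print(max(range(len(adjusted)), key=adjusted.__getitem__))  # prints 3
```

Up to a batch-wide constant, the adjusted score equals the standardized prior score plus the standardized evidence divided by `beta`. As a result, `beta` controls how strongly the evidence can override the contaminated model's ranking, and a large `beta` leaves that ranking essentially unchanged.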
Related papers
- Robust Molecular Property Prediction via Densifying Scarce Labeled Data [53.24886143129006]
In drug discovery, compounds most critical for advancing research often lie beyond the training set.
We propose a novel bilevel optimization approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
- Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.
Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.
This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
- PGAD: Prototype-Guided Adaptive Distillation for Multi-Modal Learning in AD Diagnosis [7.260212065205214]
Missing modalities pose a major issue in Alzheimer's Disease (AD) diagnosis.
Most existing methods train only on complete data, ignoring the large proportion of incomplete samples in real-world datasets like ADNI.
We propose a Prototype-Guided Adaptive Distillation (PGAD) framework that directly incorporates incomplete multi-modal data into training.
arXiv Detail & Related papers (2025-03-05T14:39:31Z)
- LEAD: Large Foundation Model for EEG-Based Alzheimer's Disease Detection [9.286594823355363]
We propose LEAD, the first large-scale foundation model for EEG analysis in dementia.
We pre-train on 12 datasets (3 AD-related and 9 non-AD) and fine-tune/test on 4 AD datasets.
Compared with 10 baselines, LEAD consistently obtains superior subject-level detection performance.
arXiv Detail & Related papers (2025-02-02T04:19:35Z)
- Deep evolving semi-supervised anomaly detection [14.027613461156864]
The aim of this paper is to formalise the task of continual semi-supervised anomaly detection (CSAD).
The paper introduces a baseline variational autoencoder (VAE) model that works with semi-supervised data, together with a continual learning method based on deep generative replay with outlier rejection.
arXiv Detail & Related papers (2024-12-01T15:48:37Z)
- DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series [83.76994646443498]
In time series anomaly detection, the scarcity of labeled data poses a challenge to the development of accurate models.
We propose a novel Domain Adaptation Contrastive learning model for Anomaly Detection in time series (DACAD).
Our model employs a supervised contrastive loss for the source domain and a self-supervised contrastive triplet loss for the target domain.
arXiv Detail & Related papers (2024-04-17T11:20:14Z)
- Weakly Supervised Anomaly Detection via Knowledge-Data Alignment [24.125871437370357]
Anomaly detection plays a pivotal role in numerous web-based applications, including malware detection, anti-money laundering, device failure detection, and network fault analysis.
Weakly Supervised Anomaly Detection (WSAD) uses a limited number of labeled anomaly samples to enhance model performance.
We introduce a novel framework Knowledge-Data Alignment (KDAlign) to integrate rule knowledge, typically summarized by human experts, to supplement the limited labeled data.
arXiv Detail & Related papers (2024-02-06T07:57:13Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework that exploits the different responses of normal and adversarial samples to universal adversarial perturbations (UAPs).
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
- Anomaly Detection under Distribution Shift [24.094884041252044]
Anomaly detection (AD) is a crucial machine learning task that aims to learn patterns from a set of normal training samples to identify abnormal samples in test data.
Most existing AD studies assume that the training and test data are drawn from the same distribution, but in practice the test data can exhibit large distribution shifts.
We introduce a novel AD approach that is robust to diverse distribution shifts, minimizing the distribution gap between in-distribution and OOD normal samples in both the training and inference stages.
arXiv Detail & Related papers (2023-03-24T07:39:08Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, showing that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Data-Efficient and Interpretable Tabular Anomaly Detection [54.15249463477813]
We propose a novel framework that adapts a white-box model class, Generalized Additive Models, to detect anomalies.
In addition, the proposed framework, DIAD, can incorporate a small amount of labeled data to further boost anomaly detection performance in semi-supervised settings.
arXiv Detail & Related papers (2022-03-03T22:02:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.