StackCLIP: Clustering-Driven Stacked Prompt in Zero-Shot Industrial Anomaly Detection
- URL: http://arxiv.org/abs/2506.23577v2
- Date: Sat, 05 Jul 2025 16:17:16 GMT
- Title: StackCLIP: Clustering-Driven Stacked Prompt in Zero-Shot Industrial Anomaly Detection
- Authors: Yanning Hou, Yanran Ruan, Junfa Li, Shanshan Wang, Jianfeng Qiu, Ke Xu,
- Abstract summary: We propose a method that transforms category names through multicategory name stacking to create stacked prompts.<n>The Clustering-Driven Stacked Prompts (CSP) module constructs generic prompts by stacking semantically analogous categories.<n>The Ensemble Feature Alignment (EFA) module trains knowledge-specific linear layers tailored for each stack cluster.
- Score: 5.390045840354081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enhancing the alignment between text and image features in the CLIP model is a critical challenge in zero-shot industrial anomaly detection tasks. Recent studies predominantly utilize specific category prompts during pretraining, which can cause overfitting to the training categories and limit model generalization. To address this, we propose a method that transforms category names through multicategory name stacking to create stacked prompts, forming the basis of our StackCLIP model. Our approach introduces two key components. The Clustering-Driven Stacked Prompts (CSP) module constructs generic prompts by stacking semantically analogous categories, while utilizing multi-object textual feature fusion to amplify discriminative anomalies among similar objects. The Ensemble Feature Alignment (EFA) module trains knowledge-specific linear layers tailored for each stack cluster and adaptively integrates them based on the attributes of test categories. These modules work together to deliver superior training speed, stability, and convergence, significantly boosting anomaly segmentation performance. Additionally, our stacked prompt framework offers robust generalization across classification tasks. To further improve performance, we introduce the Regulating Prompt Learning (RPL) module, which leverages the generalization power of stacked prompts to refine prompt learning, elevating results in anomaly detection classification tasks. Extensive testing on seven industrial anomaly detection datasets demonstrates that our method achieves state-of-the-art performance in both zero-shot anomaly detection and segmentation tasks.
Related papers
- Training-Free Class Purification for Open-Vocabulary Semantic Segmentation [72.87707878910896]
FreeCP is a training-free class purification framework for semantic segmentation.<n>We conduct experiments across eight benchmarks to validate FreeCP's effectiveness.<n>Results demonstrate that FreeCP, as a plug-and-play module, significantly boosts segmentation performance when combined with other OVSS methods.
arXiv Detail & Related papers (2025-08-01T11:55:12Z) - CKAA: Cross-subspace Knowledge Alignment and Aggregation for Robust Continual Learning [80.18781219542016]
Continual Learning (CL) empowers AI models to continuously learn from sequential task streams.<n>Recent parameter-efficient fine-tuning (PEFT)-based CL methods have garnered increasing attention due to their superior performance.<n>We propose Cross-subspace Knowledge Alignment and Aggregation (CKAA) to enhance robustness against misleading task-ids.
arXiv Detail & Related papers (2025-07-13T03:11:35Z) - Intelligently Augmented Contrastive Tensor Factorization: Empowering Multi-dimensional Time Series Classification in Low-Data Environments [4.77513566805416]
We propose a versatile yet data-efficient framework, Intelligently Augmented Contrastive Factorization (ITA-CTF)<n>ITA-CTF module learns effective representations from multi-dimensional time series.<n>It incorporates a new contrastive loss optimization to similarity learning and class-awareness.<n>Compared to standard and several DL benchmarks, notable performance improvements up to 18.7% were achieved.
arXiv Detail & Related papers (2025-05-03T11:28:13Z) - GenCLIP: Generalizing CLIP Prompts for Zero-shot Anomaly Detection [13.67800822455087]
A key challenge in ZSAD is learning general prompts stably and utilizing them effectively.<n>We propose GenCLIP, a novel framework that learns and leverages general prompts more effectively.<n>We introduce a dual-branch inference strategy, where a vision-enhanced branch captures fine-grained category-specific features, while a query-only branch prioritizes generalization.
arXiv Detail & Related papers (2025-04-21T07:38:25Z) - Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
We propose a novel category-adaptive cross-modal semantic refinement and transfer (C$2$SRT) framework to explore the semantic correlation.<n>The proposed framework consists of two complementary modules, i.e., intra-category semantic refinement (ISR) module and inter-category semantic transfer (IST) module.<n>Experiments on OV-MLR benchmarks clearly demonstrate that the proposed C$2$SRT framework outperforms current state-of-the-art algorithms.
arXiv Detail & Related papers (2024-12-09T04:00:18Z) - Revitalizing Reconstruction Models for Multi-class Anomaly Detection via Class-Aware Contrastive Learning [19.114941437668705]
We propose a plug-and-play modification by incorporating class-aware contrastive learning (CL)<n> Experiments across four datasets verify the effectiveness of our approach, yielding significant improvements and superior performance compared to advanced methods.
arXiv Detail & Related papers (2024-12-06T04:31:09Z) - MMCL: Boosting Deformable DETR-Based Detectors with Multi-Class Min-Margin Contrastive Learning for Superior Prohibited Item Detection [8.23801404004195]
Prohibited Item detection in X-ray images is one of the most effective security inspection methods.
overlapping unique phenomena in X-ray images lead to the coupling of foreground and background features.
We propose a Multi-Class Min-Margin Contrastive Learning (MMCL) method to clarify the category semantic information of content queries.
arXiv Detail & Related papers (2024-06-05T12:07:58Z) - CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing [66.6712018832575]
Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains.
We make use of large-scale VLMs like CLIP and leverage the textual feature to dynamically adjust the classifier's weights for exploring generalizable visual features.
arXiv Detail & Related papers (2024-03-21T11:58:50Z) - Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD)
By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder.
MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z) - Self-regulating Prompts: Foundational Model Adaptation without
Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Learning Prompt-Enhanced Context Features for Weakly-Supervised Video
Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z) - Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for novel detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
arXiv Detail & Related papers (2022-12-23T00:50:41Z) - Learning Label Modular Prompts for Text Classification in the Wild [56.66187728534808]
We propose text classification in-the-wild, which introduces different non-stationary training/testing stages.
Decomposing a complex task into modular components can enable robust generalisation under such non-stationary environment.
We propose MODULARPROMPT, a label-modular prompt tuning framework for text classification tasks.
arXiv Detail & Related papers (2022-11-30T16:26:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.