Related papers: MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection

URL: http://arxiv.org/abs/2603.03101v2
Date: Wed, 04 Mar 2026 03:55:36 GMT
Title: MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection
Authors: Jun Yeong Park, JunYoung Seo, Minji Kang, Yu Rang Park
Abstract summary: MoECLIP is a Mixture-of-Experts architecture for the Zero-Shot Anomaly Detection (ZSAD) task. It achieves patch-level adaptation by dynamically routing each image patch to a specialized Low-Rank Adaptation (LoRA) expert based on its unique characteristics.
Score: 6.6626674107399495
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The CLIP model's outstanding generalization has driven recent success in Zero-Shot Anomaly Detection (ZSAD) for detecting anomalies in unseen categories. The core challenge in ZSAD is to specialize the model for anomaly detection tasks while preserving CLIP's powerful generalization capability. Existing approaches attempting to solve this challenge share the fundamental limitation of a patch-agnostic design that processes all patches monolithically without regard for their unique characteristics. To address this limitation, we propose MoECLIP, a Mixture-of-Experts (MoE) architecture for the ZSAD task, which achieves patch-level adaptation by dynamically routing each image patch to a specialized Low-Rank Adaptation (LoRA) expert based on its unique characteristics. Furthermore, to prevent functional redundancy among the LoRA experts, we introduce (1) Frozen Orthogonal Feature Separation (FOFS), which orthogonally separates the input feature space to force experts to focus on distinct information, and (2) a simplex equiangular tight frame (ETF) loss to regulate the expert outputs to form maximally equiangular representations. Comprehensive experimental results across 14 benchmark datasets spanning industrial and medical domains demonstrate that MoECLIP outperforms existing state-of-the-art methods. The code is available at https://github.com/CoCoRessa/MoECLIP.
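The abstract describes routing each image patch to one of several LoRA experts and regularizing the expert outputs toward a simplex ETF. The NumPy sketch below illustrates those two ideas under stated assumptions; it is not the MoECLIP implementation (see the linked repository for that). It assumes a single frozen linear layer standing in for a CLIP block, top-1 softmax routing, zero-initialized LoRA `B` factors, and an ETF loss written as the mean squared gap between pairwise expert cosine similarities and the simplex-ETF value -1/(K-1). FOFS is not modeled. `PatchLoRAMoE` and `simplex_etf_loss` are hypothetical names.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class PatchLoRAMoE:
    """Hypothetical per-patch top-1 routing over K LoRA experts on a frozen linear layer."""

    def __init__(self, dim, num_experts=4, rank=8, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen "pretrained" weight (stand-in for a CLIP projection; never updated).
        self.W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        # Router: one logit per expert for every patch token.
        self.router = rng.standard_normal((dim, num_experts)) / np.sqrt(dim)
        # LoRA factors per expert: delta_W_k = A_k @ B_k with rank << dim.
        self.A = rng.standard_normal((num_experts, dim, rank)) / np.sqrt(dim)
        self.B = np.zeros((num_experts, rank, dim))  # zero init, so each delta starts at 0
        self.scale = alpha / rank

    def __call__(self, x):
        """x: (num_patches, dim). Returns adapted features and the chosen expert per patch."""
        gates = softmax(x @ self.router)   # (P, K) routing probabilities
        expert_idx = gates.argmax(axis=-1)  # top-1 expert for each patch
        out = x @ self.W                    # frozen base path, shared by all patches
        for k in np.unique(expert_idx):
            mask = expert_idx == k
            delta = (x[mask] @ self.A[k]) @ self.B[k]  # low-rank update for this expert only
            out[mask] += gates[mask, k:k + 1] * self.scale * delta
        return out, expert_idx


def simplex_etf_loss(expert_feats):
    """expert_feats: (K, d), one representative vector per expert.

    Penalizes deviation of pairwise cosine similarity from -1/(K-1),
    the value shared by all pairs in a simplex equiangular tight frame.
    """
    K = expert_feats.shape[0]
    f = expert_feats / np.linalg.norm(expert_feats, axis=1, keepdims=True)
    cos = f @ f.T
    target = np.full((K, K), -1.0 / (K - 1))
    np.fill_diagonal(target, 1.0)
    return float(np.mean((cos - target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    patches = rng.standard_normal((196, 64))  # e.g. a 14x14 grid of patch tokens
    moe = PatchLoRAMoE(dim=64, num_experts=4)
    out, idx = moe(patches)
    print(out.shape, np.bincount(idx, minlength=4))  # output shape, patches per expert
    K = 4
    etf = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)
    print(simplex_etf_loss(etf))  # ~0 for an exact simplex ETF
```

Because `B` starts at zero, the layer initially reproduces the frozen base output exactly, so adaptation begins from the pretrained behavior. For K experts, the rows of sqrt(K/(K-1)) (I - J/K), with J the all-ones matrix, are unit vectors whose pairwise cosines all equal -1/(K-1), so the loss above vanishes there and grows as experts collapse onto similar directions.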

Related papers

Contrastive Spectral Rectification: Test-Time Defense towards Zero-shot Adversarial Robustness of CLIP [68.44229678548298]
Contrastive Spectral Rectification (CSR) is an efficient test-time defense against adversarial examples. CSR outperforms the SOTA by an average of 18.1% against strong AutoAttack. CSR exhibits broad applicability across diverse visual tasks.
arXiv Detail & Related papers (2026-01-27T05:24:45Z)
Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution [76.66229730098759]
In real-world image super-resolution (Real-ISR), existing approaches mainly rely on fine-tuning pre-trained diffusion models. We propose a Mixture-of-Ranks (MoR) architecture for single-step image super-resolution. We introduce a fine-grained expert partitioning strategy that treats each rank in LoRA as an independent expert.
arXiv Detail & Related papers (2025-11-20T04:11:44Z)
Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder [59.89996751196727]
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models. SAEs' hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by splitting SAEs into narrower expert networks with gated activation. We propose two key innovations: (1) Multiple Expert Activation that simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling that enhances diversity through adaptive high-frequency scaling.
arXiv Detail & Related papers (2025-11-07T22:19:34Z)
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech [51.14752758616364]
Speech-based depression detection (SDD) is a promising, non-invasive alternative to traditional clinical assessments. We propose HAREN-CTC, a novel architecture that integrates multi-layer SSL features using cross-attention within a multitask learning framework. The model achieves state-of-the-art macro F1-scores of 0.81 on DAIC-WOZ and 0.82 on MODMA, outperforming prior methods across both evaluation scenarios.
arXiv Detail & Related papers (2025-10-05T09:32:12Z)
LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection [7.094483187879095]
LEAF is a novel and robust expert-based framework for Continual Event Detection. It incorporates a specialized mixture of experts architecture into the base model, where each expert is parameterized with low-rank adaptation (LoRA) matrices. A semantic-aware expert selection mechanism dynamically routes instances to the most relevant experts, enabling expert specialization and reducing knowledge interference.
arXiv Detail & Related papers (2025-09-29T10:00:25Z)
Mixture of Experts Guided by Gaussian Splatters Matters: A new Approach to Weakly-Supervised Video Anomaly Detection [7.435598538875321]
Video Anomaly Detection (VAD) is a challenging task due to the variability of anomalous events and the limited availability of labeled data. We propose a novel framework that employs a set of expert models, each specialized in capturing specific anomaly types. Our approach achieves state-of-the-art performance, with a 91.58% AUC on the UCF-Crime dataset, and demonstrates superior results on XD-Violence and MSAD datasets.
arXiv Detail & Related papers (2025-08-08T13:48:48Z)
AnomalyMoE: Towards a Language-free Generalist Model for Unified Visual Anomaly Detection [29.06542941993374]
AnomalyMoE is a novel and universal anomaly detection framework based on a Mixture-of-Experts architecture. Our key insight is to decompose the complex anomaly detection problem into three distinct semantic hierarchies. AnomalyMoE employs three dedicated expert networks at the patch, component, and global levels.
arXiv Detail & Related papers (2025-08-08T10:33:18Z)
Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection [25.349261412750586]
This study introduces FiSeCLIP for ZSAD with training-free CLIP, combining feature matching with cross-modal alignment. Our approach exhibits superior performance for both anomaly classification and segmentation on anomaly detection benchmarks.
arXiv Detail & Related papers (2025-07-15T05:42:17Z)
Towards Generalized Range-View LiDAR Segmentation in Adverse Weather [65.22588361803942]
We identify and analyze the unique challenges that affect the generalization of range-view LiDAR segmentation in severe weather. We propose a modular and lightweight framework that enhances robustness without altering the core architecture of existing models. Our approach significantly improves generalization to adverse weather with minimal inference overhead.
arXiv Detail & Related papers (2025-06-10T16:48:27Z)
AFR-CLIP: Enhancing Zero-Shot Industrial Anomaly Detection with Stateless-to-Stateful Anomaly Feature Rectification [11.844008592270555]
We propose AFR-CLIP, a CLIP-based anomaly feature rectification framework. It generates anomaly maps by measuring the cosine similarity between visual and textual features. Experiments are conducted on eleven anomaly detection benchmarks across industrial and medical domains.
arXiv Detail & Related papers (2025-03-17T08:18:55Z)
Complexity Experts are Task-Discriminative Learners for Any Image Restoration [80.46313715427928]
We introduce "complexity experts": flexible expert blocks with varying computational complexity and receptive fields. This preference effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity. The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
arXiv Detail & Related papers (2024-11-27T15:58:07Z)
P-WAE: Generalized Patch-Wasserstein Autoencoder for Anomaly Screening [17.24628770042803]
We propose a novel Patch-wise Wasserstein AutoEncoder (P-WAE) architecture to alleviate those challenges. In particular, a patch-wise variational inference model coupled with solving the jigsaw puzzle is designed. Comprehensive experiments, conducted on the MVTec AD dataset, demonstrate the superior performance of our propo...
arXiv Detail & Related papers (2021-08-09T05:31:45Z)
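Several entries above, AFR-CLIP among them, build anomaly maps from the cosine similarity between visual and textual features. The NumPy sketch below shows that generic scoring step under stated assumptions: patch and text embeddings are taken as already living in a shared image-text space (extracting them from a real CLIP model is out of scope), and the prompt wording, the temperature of 0.07, and the max-pooled image score are illustrative choices. None of the papers' specific refinements, such as AFR-CLIP's feature rectification or MoECLIP's experts, are reproduced. `anomaly_map` and `image_score` are hypothetical names.

```python
import numpy as np


def l2_normalize(v, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + eps)


def anomaly_map(patch_feats, text_normal, text_anomalous, grid_hw, temperature=0.07):
    """Per-patch probability of 'anomalous' from cosine similarity to two text embeddings.

    patch_feats: (P, d) patch embeddings already projected into the joint image-text space.
    text_normal, text_anomalous: (d,) embeddings of hypothetical prompts such as
        "a photo of a flawless object" / "a photo of a damaged object".
    grid_hw: (H, W) patch grid with H * W == P.
    """
    p = l2_normalize(patch_feats)
    t = l2_normalize(np.stack([text_normal, text_anomalous]))  # (2, d)
    logits = (p @ t.T) / temperature                            # (P, 2) scaled cosine sims
    logits -= logits.max(axis=1, keepdims=True)                 # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, 1].reshape(grid_hw)                         # P(anomalous) per patch


def image_score(amap):
    """One common image-level choice: the score of the most anomalous patch."""
    return float(amap.max())


if __name__ == "__main__":
    d = 8
    normal_txt, anomalous_txt = np.eye(d)[0], np.eye(d)[1]
    patches = np.tile(normal_txt, (4, 1))  # 2x2 grid of "normal-looking" patches
    patches[3] = anomalous_txt              # one patch resembling the anomaly prompt
    amap = anomaly_map(patches, normal_txt, anomalous_txt, grid_hw=(2, 2))
    print(amap.round(3), image_score(amap))
```

For segmentation, the (H, W) map would typically be upsampled to the input resolution; that step is omitted here.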

This list is automatically generated from the titles and abstracts of the papers on this site.