Foundation Models and Transformers for Anomaly Detection: A Survey
- URL: http://arxiv.org/abs/2507.15905v1
- Date: Mon, 21 Jul 2025 12:01:04 GMT
- Title: Foundation Models and Transformers for Anomaly Detection: A Survey
- Authors: Mouïn Ben Ammar, Arturo Mendoza, Nacim Belkhir, Antoine Manzanera, Gianni Franchi,
- Abstract summary: The survey categorizes VAD methods into reconstruction-based, feature-based and zero/few-shot approaches.<n>Transformers and foundation models enable more robust, interpretable, and scalable anomaly detection solutions.
- Score: 2.3264194695971656
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In line with the development of deep learning, this survey examines the transformative role of Transformers and foundation models in advancing visual anomaly detection (VAD). We explore how these architectures, with their global receptive fields and adaptability, address challenges such as long-range dependency modeling, contextual modeling and data scarcity. The survey categorizes VAD methods into reconstruction-based, feature-based and zero/few-shot approaches, highlighting the paradigm shift brought about by foundation models. By integrating attention mechanisms and leveraging large-scale pre-training, Transformers and foundation models enable more robust, interpretable, and scalable anomaly detection solutions. This work provides a comprehensive review of state-of-the-art techniques, their strengths, limitations, and emerging trends in leveraging these architectures for VAD.
Related papers
- Probing Information Distribution in Transformer Architectures through Entropy Analysis [39.58317527488534]
This work explores entropy analysis as a tool for probing information distribution within Transformer-based architectures.<n>We apply the methodology to a GPT-based large language model, illustrating its potential to reveal insights into model behavior and internal representations.
arXiv Detail & Related papers (2025-07-21T08:01:22Z) - Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies.<n>Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios.<n>Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either the low-capacity model architectures or the reliance on domain-specific and small-scale datasets.
arXiv Detail & Related papers (2025-07-15T17:59:59Z) - Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z) - Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation.<n>We present a taxonomy that categorizes such information manipulation approaches across four key dimensions.<n>We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z) - A Survey of Model Architectures in Information Retrieval [64.75808744228067]
We focus on two key aspects: backbone models for feature extraction and end-to-end system architectures for relevance estimation.<n>We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent large language models (LLMs)<n>We conclude by discussing emerging challenges and future directions, including architectural optimizations for performance and scalability, handling of multimodal, multilingual data, and adaptation to novel application domains beyond traditional search paradigms.
arXiv Detail & Related papers (2025-02-20T18:42:58Z) - Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff.<n>By integrating pseudo-target domain data with source domain data, we diversify the training dataset.<n> Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z) - Predictive Modeling, Pattern Recognition, and Spatiotemporal Representations of Plant Growth in Simulated and Controlled Environments: A Comprehensive Review [0.0]
This review explores state-of-the-art predictive pattern recognition techniques.<n>We focus on the probabilistic modeling of plant traits and the integration of dynamic environmental interactions.<n>Key topics include regressions and neural network-based representation models for the task of forecasting.
arXiv Detail & Related papers (2024-12-13T20:22:35Z) - A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends [67.43992456058541]
Image restoration (IR) aims to recover high-quality images from inputs degraded by various factors such as noise, blur, compression, and adverse weather.<n>Traditional IR methods typically focus on specific types of degradation, which limits their effectiveness in real-world scenarios with complex distortions.<n>The all-in-one image restoration paradigm has recently emerged, offering a unified framework that adeptly addresses multiple degradation types.
arXiv Detail & Related papers (2024-10-19T11:11:09Z) - Vision Foundation Models in Remote Sensing: A Survey [6.036426846159163]
Foundation models are large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency.<n>This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for continued development and application of foundation models in remote sensing.
arXiv Detail & Related papers (2024-08-06T22:39:34Z) - Diffusion Models for Image Restoration and Enhancement -- A
Comprehensive Survey [96.99328714941657]
We present a comprehensive review of recent diffusion model-based methods on image restoration.
We classify and emphasize the innovative designs using diffusion models for both IR and blind/real-world IR.
We propose five potential and challenging directions for the future research of diffusion model-based IR.
arXiv Detail & Related papers (2023-08-18T08:40:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.