Related papers: Cross-Modal Learning for Anomaly Detection in Complex Industrial Process: Methodology and Benchmark

URL: http://arxiv.org/abs/2406.09016v2
Date: Sat, 02 Nov 2024 13:09:38 GMT
Title: Cross-Modal Learning for Anomaly Detection in Complex Industrial Process: Methodology and Benchmark
Authors: Gaochang Wu, Yapeng Zhang, Lan Deng, Jingxin Zhang, Tianyou Chai
Abstract summary: Anomaly detection in complex industrial processes plays a pivotal role in ensuring efficient, stable, and secure operation. This paper proposes a cross-modal Transformer to facilitate anomaly detection by exploring the correlation between visual features (video) and process variables (current) in the context of the fused magnesium smelting process. We present a pioneering cross-modal benchmark of the fused magnesium smelting process, featuring synchronously acquired video and current data for over 2.2 million samples.
Score: 19.376814754500625
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Anomaly detection in complex industrial processes plays a pivotal role in ensuring efficient, stable, and secure operation. Existing anomaly detection methods primarily focus on analyzing dominant anomalies using the process variables (such as arc current) or constructing neural networks based on abnormal visual features, while overlooking the intrinsic correlation of cross-modal information. This paper proposes a cross-modal Transformer (dubbed FmFormer), designed to facilitate anomaly detection by exploring the correlation between visual features (video) and process variables (current) in the context of the fused magnesium smelting process. Our approach introduces a novel tokenization paradigm to effectively bridge the substantial dimensionality gap between the 3D video modality and the 1D current modality in a multiscale manner, enabling a hierarchical reconstruction of pixel-level anomaly detection. Subsequently, the FmFormer leverages self-attention to learn internal features within each modality and bidirectional cross-attention to capture correlations across modalities. By decoding the bidirectional correlation features, we obtain the final detection result and even locate the specific anomaly region. To validate the effectiveness of the proposed method, we also present a pioneering cross-modal benchmark of the fused magnesium smelting process, featuring synchronously acquired video and current data for over 2.2 million samples. Leveraging cross-modal learning, the proposed FmFormer achieves state-of-the-art performance in detecting anomalies, particularly under extreme interferences such as current fluctuations and visual occlusion caused by heavy water mist. The presented methodology and benchmark may be applicable to other industrial applications with some amendments. The benchmark will be released at https://github.com/GaochangWu/FMF-Benchmark.
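The abstract describes bidirectional cross-attention between video tokens and current tokens. The following is a minimal NumPy sketch of that general idea, not the authors' FmFormer implementation: the token counts, the embedding size, the random projection matrices, and the names `cross_attention`, `video_attends_current`, and `current_attends_video` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention.

    Queries come from one modality; keys and values come from the other.
    Returns the attended features and the attention weights.
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
dim = 16  # hypothetical shared embedding size

# Hypothetical token sets: 8 video tokens and 4 current tokens.
video_tokens = rng.normal(size=(8, dim))
current_tokens = rng.normal(size=(4, dim))


def make_projections():
    """Random stand-ins for learned query/key/value projections."""
    return [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3)]


# Direction 1: video tokens query the current tokens.
video_attends_current, w_vc = cross_attention(
    video_tokens, current_tokens, *make_projections()
)
# Direction 2: current tokens query the video tokens.
current_attends_video, w_cv = cross_attention(
    current_tokens, video_tokens, *make_projections()
)

print(video_attends_current.shape, current_attends_video.shape)
```

Each direction keeps the token count of the querying modality, so the two outputs can be passed to separate per-modality decoders. In a trained model the projections would be learned rather than random.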

Related papers

CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [54.85000884785013]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data. We propose CLIPfusion, a method that leverages both discriminative and generative foundation models. We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z)
Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective [54.605073936695575]
Graph anomaly detection aims to identify unusual patterns in graph-based data, with wide applications in fields such as web security and financial fraud detection. Existing methods rely on contrastive learning, assuming that a lower similarity between a node and its local subgraph indicates abnormality. The presence of interfering edges invalidates this assumption, since it introduces disruptive noise that compromises the contrastive learning process. We propose a Clean-View Enhanced Graph Anomaly Detection framework (CVGAD), which includes a multi-scale anomaly awareness module to identify key sources of interference in the contrastive learning process.
arXiv Detail & Related papers (2025-05-23T15:05:56Z)
Enhancing Web Service Anomaly Detection via Fine-grained Multi-modal Association and Frequency Domain Analysis [8.860339665670255]
Anomaly detection is crucial for ensuring the stability and reliability of web service systems. Existing anomaly detection methods use logs and metrics to detect anomalies. We propose a novel anomaly detection method named FFAD to address these two issues.
arXiv Detail & Related papers (2025-01-28T12:00:45Z)
Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection [12.100563798908777]
Video Anomaly Detection (VAD) is essential for computer vision research. Existing VAD methods utilize either reconstruction-based or prediction-based frameworks. We address pose-based video anomaly detection and introduce a novel framework called Dual Conditioned Motion Diffusion.
arXiv Detail & Related papers (2024-12-23T01:31:39Z)
SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection [18.090706979440334]
Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors. Current methods merely stack Transformer-guided fusion techniques without exploring their capability to extract features at various depth layers of the network. In this paper, we introduce an accurate and efficient object detection method named SeaDATE.
arXiv Detail & Related papers (2024-10-15T07:26:39Z)
DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data. It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
A Generalization Theory of Cross-Modality Distillation with Contrastive Learning [49.35244441141323]
Cross-modality distillation arises as an important topic for data modalities containing limited knowledge. We formulate a general framework of cross-modality contrastive distillation (CMCD), built upon contrastive learning. Our algorithm outperforms existing algorithms consistently by a margin of 2-3% across diverse modalities and tasks.
arXiv Detail & Related papers (2024-05-06T11:05:13Z)
Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image Fusion [3.868072865207522]
Image-based rigid 2D/3D registration is a critical technique for fluoroscopic guided surgical interventions. We propose a novel fully differentiable correlation-driven network using a dual-branch CNN-transformer encoder. A correlation-driven loss is proposed for low-frequency feature and high-frequency feature decomposition based on embedded information.
arXiv Detail & Related papers (2024-02-04T14:12:51Z)
DiffVein: A Unified Diffusion Network for Finger Vein Segmentation and Authentication [50.017055360261665]
We introduce DiffVein, a unified diffusion model-based framework which simultaneously addresses vein segmentation and authentication tasks. For better feature interaction between these two branches, we introduce two specialized modules. In this way, our framework allows for a dynamic interplay between diffusion and segmentation embeddings.
arXiv Detail & Related papers (2024-02-03T06:49:42Z)
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets. We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
CL-Flow: Strengthening the Normalizing Flows by Contrastive Learning for Better Anomaly Detection [1.951082473090397]
We propose a self-supervised anomaly detection approach that combines contrastive learning with 2D-Flow. Compared to mainstream unsupervised approaches, our self-supervised method demonstrates superior detection accuracy, fewer additional model parameters, and faster inference speed. Our approach showcases new state-of-the-art results, achieving a performance of 99.6% in image-level AUROC on the MVTecAD dataset and 96.8% in image-level AUROC on the BTAD dataset.
arXiv Detail & Related papers (2023-11-12T10:07:03Z)
ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection [44.21198064126152]
We propose a novel anomaly detection framework named ImDiffusion. ImDiffusion combines time series imputation and diffusion models to achieve accurate and robust anomaly detection. We evaluate the performance of ImDiffusion via extensive experiments on benchmark datasets.
arXiv Detail & Related papers (2023-07-03T04:57:40Z)
Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition. We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model. HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme. Our model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on the MVTec-3D AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
A Transfer Learning Framework for Anomaly Detection Using Model of Normality [2.9685635948299995]
Convolutional Neural Network (CNN) techniques have proven to be very useful in image-based anomaly detection applications. We introduce a transfer learning framework for anomaly detection based on a similarity measure with a Model of Normality (MoN). We show that with the proposed threshold settings, a significant performance improvement can be achieved.
arXiv Detail & Related papers (2020-11-12T05:26:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.