Multi-Flow: Multi-View-Enriched Normalizing Flows for Industrial Anomaly Detection
- URL: http://arxiv.org/abs/2504.03306v1
- Date: Fri, 04 Apr 2025 09:32:01 GMT
- Title: Multi-Flow: Multi-View-Enriched Normalizing Flows for Industrial Anomaly Detection
- Authors: Mathis Kruse, Bodo Rosenhahn
- Abstract summary: We propose Multi-Flow, a novel multi-view anomaly detection method. It makes use of a novel multi-view architecture, whose exact likelihood estimation is enhanced by fusing information across different views. We empirically validate it on the real-world multi-view data set Real-IAD and reach a new state-of-the-art.
- Score: 20.499874396491347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With more well-performing anomaly detection methods proposed, many of the single-view tasks have been solved to a relatively good degree. However, real-world production scenarios often involve complex industrial products, whose properties may not be fully captured by one single image. While normalizing flow based approaches already work well in single-camera scenarios, they currently do not make use of the priors in multi-view data. We aim to bridge this gap by using these flow-based models as a strong foundation and propose Multi-Flow, a novel multi-view anomaly detection method. Multi-Flow makes use of a novel multi-view architecture, whose exact likelihood estimation is enhanced by fusing information across different views. For this, we propose a new cross-view message-passing scheme, letting information flow between neighboring views. We empirically validate it on the real-world multi-view data set Real-IAD and reach a new state-of-the-art, surpassing current baselines in both image-wise and sample-wise anomaly detection tasks.
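To illustrate the idea of exact likelihood estimation with cross-view conditioning, here is a minimal sketch in plain Python. It is not the authors' architecture: the single affine coupling step, the fixed weights `w_s` and `w_t`, the scalar `context` built from the two ring neighbors, and the max-aggregation into a sample score are all assumptions chosen for illustration. It only shows how a flow yields an exact log-likelihood through the change-of-variables formula, and how the negative log-likelihood can serve as an anomaly score.

```python
import math

def coupling_forward(x, context, w_s=0.5, w_t=0.1):
    """One affine coupling step on a 2-D feature x = (x1, x2).

    Scale and shift depend on x1 and on a scalar `context`
    (a hypothetical summary of neighboring views).
    Returns the latent z and the log-determinant of the Jacobian.
    """
    x1, x2 = x
    s = math.tanh(w_s * (x1 + context))  # bounded log-scale
    t = w_t * (x1 + context)             # shift
    z = (x1, x2 * math.exp(s) + t)
    # The Jacobian is triangular with diagonal (1, exp(s)), so log|det J| = s.
    return z, s

def log_likelihood(x, context):
    """Exact log p(x) via change of variables with a standard normal base."""
    z, log_det = coupling_forward(x, context)
    log_base = sum(-0.5 * (zi * zi + math.log(2 * math.pi)) for zi in z)
    return log_base + log_det

def anomaly_scores(views):
    """Score each view by -log p(x), conditioned on its two ring neighbors."""
    n = len(views)
    scores = []
    for i, x in enumerate(views):
        left, right = views[(i - 1) % n], views[(i + 1) % n]
        context = 0.5 * (left[0] + right[0])  # toy "message" from neighbors
        scores.append(-log_likelihood(x, context))
    return scores

# Five toy views of one sample; the fourth is far from the others.
views = [(0.1, -0.2), (0.0, 0.3), (0.2, 0.1), (4.0, 5.0), (0.1, 0.0)]
scores = anomaly_scores(views)   # image-wise scores, one per view
sample_score = max(scores)       # one simple sample-wise aggregation (assumption)
```

In this toy setup the outlying view receives by far the largest score, and taking the maximum over views is just one possible way to turn image-wise scores into a sample-wise score.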
Related papers
- Learning Multi-view Multi-class Anomaly Detection [10.199404082194947]
We introduce a Multi-View Multi-Class Anomaly Detection model (MVMCAD), which integrates information from multiple views to accurately identify anomalies.
Specifically, we propose a semi-frozen encoder, where a pre-encoder prior enhancement mechanism is added before the frozen encoder.
The model also includes an Anomaly Amplification Module (AAM) that models global token interactions and suppresses normal regions, and a Cross-Feature Loss that aligns shallow encoder features with deep decoder features.
arXiv Detail & Related papers (2025-04-30T03:59:58Z)
- Uncertainty-Aware Global-View Reconstruction for Multi-View Multi-Label Feature Selection [4.176139684578661]
We propose a unified model constructed from the perspective of global-view reconstruction. We incorporate the perception of sample uncertainty during the reconstruction process to enhance trustworthiness. Experimental results demonstrate the superior performance of our method on multi-view datasets.
arXiv Detail & Related papers (2025-03-18T08:35:39Z)
- Multi-View Industrial Anomaly Detection with Epipolar Constrained Cross-View Fusion [15.819291772583393]
We introduce an epipolar geometry-constrained attention module to guide cross-view fusion. To further enhance the potential of cross-view attention, we propose a pretraining strategy inspired by memory bank-based anomaly detection. We demonstrate that our framework outperforms existing methods on the state-of-the-art multi-view anomaly detection dataset.
arXiv Detail & Related papers (2025-03-14T05:02:54Z)
- Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation [61.64052577026623]
Real-world multi-view datasets are often heterogeneous and imperfect. We propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment. In experiments, we employ it in unsupervised multi-view clustering, noise-label classification, and as a plug-and-play module for cross-modal hashing retrieval.
arXiv Detail & Related papers (2025-03-06T07:01:08Z)
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior. We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information. We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- GM-DF: Generalized Multi-Scenario Deepfake Detection [49.072106087564144]
Existing face forgery detection usually follows the paradigm of training models in a single domain.
In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets.
arXiv Detail & Related papers (2024-06-28T17:42:08Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by it, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- Two-level Data Augmentation for Calibrated Multi-view Detection [51.5746691103591]
We introduce a new multi-view data augmentation pipeline that preserves alignment among views.
We also propose a second level of augmentation applied directly at the scene level.
When combined with our simple multi-view detection model, our two-level augmentation pipeline outperforms all existing baselines.
arXiv Detail & Related papers (2022-10-19T17:55:13Z)
- TSK Fuzzy System Towards Few Labeled Incomplete Multi-View Data Classification [24.01191516774655]
A transductive semi-supervised incomplete multi-view TSK fuzzy system modeling method (SSIMV_TSK) is proposed to address these challenges.
The proposed method integrates missing view imputation, pseudo label learning of unlabeled data, and fuzzy system modeling into a single process to yield a model with interpretable fuzzy rules.
Experimental results on real datasets show that the proposed method significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2021-10-08T11:41:06Z)
- Exploring Data Augmentation for Multi-Modality 3D Object Detection [82.9988604088494]
It is counter-intuitive that multi-modality methods based on point cloud and images perform only marginally better or sometimes worse than approaches that solely use point cloud.
We propose a pipeline, named transformation flow, to bridge the gap between single and multi-modality data augmentation with transformation reversing and replaying.
Our method also wins the best PKL award in the 3rd nuScenes detection challenge.
arXiv Detail & Related papers (2020-12-23T15:23:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.