Causal-HM: Restoring Physical Generative Logic in Multimodal Anomaly Detection via Hierarchical Modulation
- URL: http://arxiv.org/abs/2512.21650v1
- Date: Thu, 25 Dec 2025 12:32:33 GMT
- Title: Causal-HM: Restoring Physical Generative Logic in Multimodal Anomaly Detection via Hierarchical Modulation
- Authors: Xiao Liu, Junchen Jin, Yanjie Zhao, Zhixuan Xing,
- Abstract summary: Causal-HM is a unified multimodal UAD framework that explicitly models the Process to Result dependency.<n>Our framework incorporates two key innovations: a Sensor-Guided CHM Modulation mechanism that utilizes low-dimensional sensor signals as context to guide high-dimensional audio-visual feature extraction.<n>Experiments on our newly constructed Weld-4M benchmark across four modalities demonstrate that Causal-HM achieves a state-of-the-art (SOTA) I-AUROC of 90.7%.
- Score: 7.284641019396717
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Unsupervised Anomaly Detection (UAD) is critical for quality assurance in smart manufacturing, particularly in complex processes like robotic welding. However, existing methods often suffer from causal blindness, treating process modalities (e.g., real-time video, audio, and sensors) and result modalities (e.g., post-weld images) as equal feature sources, thereby ignoring the inherent physical generative logic. Furthermore, the heterogeneity gap between high-dimensional visual data and low-dimensional sensor signals frequently leads to critical process context being drowned out. In this paper, we propose Causal-HM, a unified multimodal UAD framework that explicitly models the physical Process to Result dependency. Specifically, our framework incorporates two key innovations: a Sensor-Guided CHM Modulation mechanism that utilizes low-dimensional sensor signals as context to guide high-dimensional audio-visual feature extraction , and a Causal-Hierarchical Architecture that enforces a unidirectional generative mapping to identify anomalies that violate physical consistency. Extensive experiments on our newly constructed Weld-4M benchmark across four modalities demonstrate that Causal-HM achieves a state-of-the-art (SOTA) I-AUROC of 90.7%. Code will be released after the paper is accepted.
Related papers
- Reason-IAD: Knowledge-Guided Dynamic Latent Reasoning for Explainable Industrial Anomaly Detection [85.29900916231655]
Reason-IAD is a knowledge-guided dynamic latent reasoning framework for explainable industrial anomaly detection.<n>Experiments demonstrate that Reason-IAD consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2026-02-10T14:54:17Z) - Cheap2Rich: A Multi-Fidelity Framework for Data Assimilation and System Identification of Multiscale Physics -- Rotating Detonation Engines [1.8796659304823702]
Cheap2Rich is a multi-scale data assimilation framework that reconstructs high-fidelity state spaces from sparse sensor histories.<n>We demonstrate the performance on rotating detonation engines (RDEs)<n>Results highlight a general multi-fidelity framework for data assimilation and system identification in complex multi-scale systems.
arXiv Detail & Related papers (2026-01-28T06:35:22Z) - RareFlow: Physics-Aware Flow-Matching for Cross-Sensor Super-Resolution of Rare-Earth Features [27.505614464585538]
We present RareFlow, a physics-aware SR framework designed for out-of-distribution (OOD) robustness.<n>A Gated ControlNet preserves fine-grained geometric fidelity from the low-resolution input, while textual prompts provide semantic guidance for synthesizing complex features.<n>In blind evaluations, geophysical experts rated our model's outputs as approaching the fidelity of ground truth imagery, significantly outperforming state-of-the-art baselines.
arXiv Detail & Related papers (2025-10-27T19:56:43Z) - ShortcutBreaker: Low-Rank Noisy Bottleneck with Global Perturbation Attention for Multi-Class Unsupervised Anomaly Detection [59.89803740308262]
ShortcutBreaker is a novel unified feature-reconstruction framework for MUAD tasks.<n>It features two key innovations to address the issue of shortcuts.<n>The proposed method achieves a remarkable image-level AUROC of 99.8%, 98.9%, 90.6%, and 87.8% on four datasets.
arXiv Detail & Related papers (2025-10-21T06:51:30Z) - Quality-Aware Language-Conditioned Local Auto-Regressive Anomaly Synthesis and Detection [30.77558600436759]
ARAS is a language-conditioned, auto-regressive anomaly synthesis approach.<n>It injects local, text-specified defects into normal images via token-anchored latent editing.<n>It significantly enhances defect realism, preserves fine-grained material textures, and provides continuous semantic control over synthesized anomalies.
arXiv Detail & Related papers (2025-08-05T15:07:32Z) - FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models.<n>We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components.<n>FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z) - FgC2F-UDiff: Frequency-guided and Coarse-to-fine Unified Diffusion Model for Multi-modality Missing MRI Synthesis [6.475175425060296]
We propose a novel unified synthesis model, the Frequency-guided and Coarse-to-fine Unified Diffusion Model (FgC2F-UDiff)
arXiv Detail & Related papers (2025-01-07T04:42:45Z) - Cross-Modal Learning for Anomaly Detection in Complex Industrial Process: Methodology and Benchmark [19.376814754500625]
Anomaly detection in complex industrial processes plays a pivotal role in ensuring efficient, stable, and secure operation.
This paper proposes a cross-modal Transformer to facilitate anomaly detection by exploring the correlation between visual features (video) and process variables (current) in the context of the fused magnesium smelting process.
We present a pioneering cross-modal benchmark of the fused magnesium smelting process, featuring synchronously acquired video and current data for over 2.2 million samples.
arXiv Detail & Related papers (2024-06-13T11:40:06Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops.
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - On Sensitivity and Robustness of Normalization Schemes to Input
Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
DL models are sensitive to varying artifacts as it leads to changes in the input data distribution between the training and testing phases.
We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z) - Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTecD-3 AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.