Related papers: Towards an Incremental Unified Multimodal Anomaly Detection: Augmenting Multimodal Denoising From an Information Bottleneck Perspective

Towards an Incremental Unified Multimodal Anomaly Detection: Augmenting Multimodal Denoising From an Information Bottleneck Perspective

URL: http://arxiv.org/abs/2603.02629v1
Date: Tue, 03 Mar 2026 05:58:35 GMT
Title: Towards an Incremental Unified Multimodal Anomaly Detection: Augmenting Multimodal Denoising From an Information Bottleneck Perspective
Authors: Kaifang Long, Lianbo Ma, Jiaqi Liu, Liming Liu, Guoyang Xie,
Abstract summary: We introduce a novel denoising framework called IB-IUMAD, which exploits the complementary benefits of the Mamba decoder and information bottleneck fusion module.<n>A series of theoretical analyses and experiments on MVTec 3D-AD and Eyecandies datasets demonstrates the effectiveness and competitive performance of IB-IUMAD.
Score: 15.313681588364242
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The quest for incremental unified multimodal anomaly detection seeks to empower a single model with the ability to systematically detect anomalies across all categories and support incremental learning to accommodate emerging objects/categories. Central to this pursuit is resolving the catastrophic forgetting dilemma, which involves acquiring new knowledge while preserving prior learned knowledge. Despite some efforts to address this dilemma, a key oversight persists: ignoring the potential impact of spurious and redundant features on catastrophic forgetting. In this paper, we delve into the negative effect of spurious and redundant features on this dilemma in incremental unified frameworks, and reveal that under similar conditions, the multimodal framework developed by naive aggregation of unimodal architectures is more prone to forgetting. To address this issue, we introduce a novel denoising framework called IB-IUMAD, which exploits the complementary benefits of the Mamba decoder and information bottleneck fusion module: the former dedicated to disentangle inter-object feature coupling, preventing spurious feature interference between objects; the latter serves to filter out redundant features from the fused features, thus explicitly preserving discriminative information. A series of theoretical analyses and experiments on MVTec 3D-AD and Eyecandies datasets demonstrates the effectiveness and competitive performance of IB-IUMAD.

Related papers

ShortcutBreaker: Low-Rank Noisy Bottleneck with Global Perturbation Attention for Multi-Class Unsupervised Anomaly Detection [59.89803740308262]
ShortcutBreaker is a novel unified feature-reconstruction framework for MUAD tasks.<n>It features two key innovations to address the issue of shortcuts.<n>The proposed method achieves a remarkable image-level AUROC of 99.8%, 98.9%, 90.6%, and 87.8% on four datasets.
arXiv Detail & Related papers (2025-10-21T06:51:30Z)
DAMS:Dual-Branch Adaptive Multiscale Spatiotemporal Framework for Video Anomaly Detection [7.117824587276951]
This study offers a dual-path architecture called the Dual-Branch Adaptive Multiscale Stemporal Framework (DAMS), which is based on multilevel feature and decoupling fusion.<n>The main processing path integrates the Adaptive Multiscale Time Pyramid Network (AMTPN) with the Convolutional Block Attention Mechanism (CBAM)
arXiv Detail & Related papers (2025-07-28T08:42:00Z)
UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification [26.770271366177603]
We propose a robust approach named Uncertainty-Guided Graph model for multi-modal object ReID (UGG-ReID)<n>UGG-ReID is designed to mitigate noise interference and facilitate effective multi-modal fusion.<n> Experimental results show that the proposed method achieves excellent performance on all datasets.
arXiv Detail & Related papers (2025-07-07T03:41:08Z)
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766]
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization. FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions. PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN.
arXiv Detail & Related papers (2024-07-23T15:07:52Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
Occlusion-Aware Detection and Re-ID Calibrated Network for Multi-Object Tracking [38.36872739816151]
Occlusion-Aware Attention (OAA) module in the detector highlights the object features while suppressing the occluded background regions. OAA can serve as a modulator that enhances the detector for some potentially occluded objects. We design a Re-ID embedding matching block based on the optimal transport problem.
arXiv Detail & Related papers (2023-08-30T06:56:53Z)
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges. We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability. Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z)
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition. We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations. Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
Unsupervised Discovery, Control, and Disentanglement of Semantic Attributes with Applications to Anomaly Detection [15.817227809141116]
We focus on unsupervised generative representations that discover latent factors controlling image semantic attributes. For (a), we propose a network architecture that exploits the combination of multiscale generative models with mutual information (MI) For (b), we derive an analytical result (Lemma 1) that brings clarity to two related but distinct concepts.
arXiv Detail & Related papers (2020-02-25T20:50:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.