MAGE-ID: A Multimodal Generative Framework for Intrusion Detection Systems
- URL: http://arxiv.org/abs/2512.03375v1
- Date: Wed, 03 Dec 2025 02:22:21 GMT
- Title: MAGE-ID: A Multimodal Generative Framework for Intrusion Detection Systems
- Authors: Mahdi Arab Loodaricheh, Mohammad Hossein Manshaei, Anita Raja,
- Abstract summary: This paper introduces MAGE-ID (Multimodal Attack Generator for Intrusion Detection), a diffusion-based generative framework.<n>By jointly training Transformer and CNN-based variational encoders with an EDM style denoiser, MAGE-ID achieves balanced and coherent multimodal synthesis.
- Score: 0.7774159184252745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern Intrusion Detection Systems (IDS) face severe challenges due to heterogeneous network traffic, evolving cyber threats, and pronounced data imbalance between benign and attack flows. While generative models have shown promise in data augmentation, existing approaches are limited to single modalities and fail to capture cross-domain dependencies. This paper introduces MAGE-ID (Multimodal Attack Generator for Intrusion Detection), a diffusion-based generative framework that couples tabular flow features with their transformed images through a unified latent prior. By jointly training Transformer and CNN-based variational encoders with an EDM style denoiser, MAGE-ID achieves balanced and coherent multimodal synthesis. Evaluations on CIC-IDS-2017 and NSL-KDD demonstrate significant improvements in fidelity, diversity, and downstream detection performance over TabSyn and TabDDPM, highlighting the effectiveness of MAGE-ID for multimodal IDS augmentation.
Related papers
- Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework [60.72591149679355]
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges.<n>Traditional intrusion detection systems fail to tackle the unique characteristics of aerial IoT environments.<n>We introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks.
arXiv Detail & Related papers (2026-01-25T12:47:25Z) - Latent Diffusion for Internet of Things Attack Data Generation in Intrusion Detection [2.2758077237273837]
Intrusion Detection Systems (IDSs) are a key component for protecting Internet of Things (IoT) environments.<n>In Machine Learning-based (ML-based) IDSs, performance is often degraded by the strong class imbalance between benign and attack traffic.<n>We propose the use of a Latent Diffusion Model (LDM) for attack data augmentation in IoT intrusion detection.
arXiv Detail & Related papers (2026-01-23T18:55:07Z) - Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion [60.186310080523135]
Bifurcation of generative modeling into autoregressive approaches for discrete data (text) and diffusion approaches for continuous data (images) hinders development of truly unified multimodal systems.<n>We propose textbfCoM-DAD, a novel probabilistic framework that reformulates multimodal generation as a hierarchical dual-process.<n>Our method demonstrates superior stability over standard masked modeling, establishing a new paradigm for scalable, unified text-image generation.
arXiv Detail & Related papers (2026-01-07T16:21:19Z) - MambaITD: An Efficient Cross-Modal Mamba Network for Insider Threat Detection [9.049925971684837]
This paper proposes a new insider threat detection framework MambaITD based on the Mamba state space model and cross-modal adaptive fusion.<n>Compared with traditional methods, MambaITD shows significant advantages in modeling efficiency and feature fusion capabilities.
arXiv Detail & Related papers (2025-08-06T18:45:00Z) - Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling [48.96034602889216]
Variencoding Discrete Diffusion (VADD) is a novel framework that enhances discrete diffusion with latent variable modeling.<n>By introducing an auxiliary recognition model, VADD enables stable training via variational lower bounds and amortized inference over the training set.<n> Empirical results on 2D toy data, pixel-level image generation, and text generation demonstrate that VADD consistently outperforms MDM baselines.
arXiv Detail & Related papers (2025-05-23T01:45:47Z) - ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction [12.893857146169045]
Alzheimer's disease (AD) is a common neurodegenerative disease among the elderly.<n>Early prediction and timely intervention of its prodromal stage, mild cognitive impairment (MCI), can decrease the risk of advancing to AD.
arXiv Detail & Related papers (2025-01-20T05:12:31Z) - MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.<n>Unlike discretization line of method, MMAR takes in continuous-valued image tokens to avoid information loss in an efficient way.<n>We also propose a theoretically proven technique that addresses the numerical stability issue and a training strategy that balances the generation and understanding task goals.
arXiv Detail & Related papers (2024-10-14T17:57:18Z) - Multi-scale Bottleneck Transformer for Weakly Supervised Multimodal Violence Detection [9.145305176998447]
Weakly supervised multimodal violence detection aims to learn a violence detection model by leveraging multiple modalities.
We propose a new weakly supervised MVD method that explicitly addresses the challenges of information redundancy, modality imbalance, and modality asynchrony.
Experiments on the largest-scale XD-Violence dataset demonstrate that the proposed method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-05-08T15:27:08Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - MTS-DVGAN: Anomaly Detection in Cyber-Physical Systems using a Dual
Variational Generative Adversarial Network [7.889342625283858]
Deep generative models are promising in detecting novel cyber-physical attacks, mitigating the vulnerability of Cyber-physical systems (CPSs) without relying on labeled information.
This article proposes a novel unsupervised dual variational generative adversarial model named MST-DVGAN.
The central concept is to enhance the model's discriminative capability by widening the distinction between reconstructed abnormal samples and their normal counterparts.
arXiv Detail & Related papers (2023-11-04T11:19:03Z) - Unified Discrete Diffusion for Simultaneous Vision-Language Generation [78.21352271140472]
We present a unified multimodal generation model that can conduct both the "modality translation" and "multi-modality generation" tasks.
Specifically, we unify the discrete diffusion process for multimodal signals by proposing a unified transition matrix.
Our proposed method can perform comparably to the state-of-the-art solutions in various generation tasks.
arXiv Detail & Related papers (2022-11-27T14:46:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.