Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems
- URL: http://arxiv.org/abs/2512.23132v1
- Date: Mon, 29 Dec 2025 01:27:19 GMT
- Title: Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems
- Authors: Armstrong Foundjem, Lionel Nganyewou Tidjon, Leuson Da Silva, Foutse Khomh,
- Abstract summary: Traditional cybersecurity lacks ML-specific threat modeling for foundation, multimodal, and RAG systems.<n>We extract 93 threats from MITRE ATLAS, AI Incident Database, and literature, and analyze 854 GitHub/Python repositories.<n>We identify unreported threats including commercial LLM API model stealing, parameter memorization leakage, and preference-guided text-only jailbreaks.
- Score: 6.850914615021645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) underpins foundation models in finance, healthcare, and critical infrastructure, making them targets for data poisoning, model extraction, prompt injection, automated jailbreaking, and preference-guided black-box attacks that exploit model comparisons. Larger models can be more vulnerable to introspection-driven jailbreaks and cross-modal manipulation. Traditional cybersecurity lacks ML-specific threat modeling for foundation, multimodal, and RAG systems. Objective: Characterize ML security risks by identifying dominant TTPs, vulnerabilities, and targeted lifecycle stages. Methods: We extract 93 threats from MITRE ATLAS (26), AI Incident Database (12), and literature (55), and analyze 854 GitHub/Python repositories. A multi-agent RAG system (ChatGPT-4o, temp 0.4) mines 300+ articles to build an ontology-driven threat graph linking TTPs, vulnerabilities, and stages. Results: We identify unreported threats including commercial LLM API model stealing, parameter memorization leakage, and preference-guided text-only jailbreaks. Dominant TTPs include MASTERKEY-style jailbreaking, federated poisoning, diffusion backdoors, and preference optimization leakage, mainly impacting pre-training and inference. Graph analysis reveals dense vulnerability clusters in libraries with poor patch propagation. Conclusion: Adaptive, ML-specific security frameworks, combining dependency hygiene, threat intelligence, and monitoring, are essential to mitigate supply-chain and inference risks across the ML lifecycle.
Related papers
- Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography [77.44136793431893]
We propose a novel jailbreak paradigm that introduces dual steganography to covertly embed malicious queries into benign-looking images.<n>Our Odysseus successfully jailbreaks several pioneering and realistic MLLM-integrated systems, achieving up to 99% attack success rate.
arXiv Detail & Related papers (2025-12-23T08:53:36Z) - The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs.<n>We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z) - Mind Your Server: A Systematic Study of Parasitic Toolchain Attacks on the MCP Ecosystem [13.95558554298296]
Large language models (LLMs) are increasingly integrated with external systems through the Model Context Protocol (MCP)<n>In this paper, we reveal a new class of attacks, Parasitic Toolchain Attacks, instantiated as MCP Unintended Privacy Disclosure (MCP-UPD)<n>The malicious logic infiltrates the toolchain and unfolds in three phases: Parasitic Ingestion, Privacy Collection, and Privacy Disclosure, culminating in stealthy exfiltration of private data.
arXiv Detail & Related papers (2025-09-08T11:35:32Z) - ATAG: AI-Agent Application Threat Assessment with Attack Graphs [23.757154032523093]
This paper introduces AI-agent application Threat assessment with Attack Graphs (ATAG)<n>ATAG is a novel framework designed to systematically analyze the security risks associated with AI-agent applications.<n>It facilitates proactive identification and mitigation of AI-agent threats in multi-agent applications.
arXiv Detail & Related papers (2025-06-03T13:25:40Z) - Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models [83.80177564873094]
We propose a unified multimodal universal jailbreak attack framework.<n>We evaluate the undesirable context generation of MLLMs like LLaVA, Yi-VL, MiniGPT4, MiniGPT-v2, and InstructBLIP.<n>This study underscores the urgent need for robust safety measures in MLLMs.
arXiv Detail & Related papers (2025-06-02T04:33:56Z) - SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment.<n>Certain scenarios suffer 25 times higher attack rates.<n>Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv Detail & Related papers (2025-04-09T06:53:23Z) - MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks [85.3303135160762]
MIRAGE is a novel framework that exploits narrative-driven context and role immersion to circumvent safety mechanisms in Multimodal Large Language Models.<n>It achieves state-of-the-art performance, improving attack success rates by up to 17.5% over the best baselines.<n>We demonstrate that role immersion and structured semantic reconstruction can activate inherent model biases, facilitating the model's spontaneous violation of ethical safeguards.
arXiv Detail & Related papers (2025-03-24T20:38:42Z) - Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs)<n>In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents.<n>We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z) - Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models [17.663550432103534]
Multimodal Large Language Models (MLLMs) extend the capacity of LLMs to understand multimodal information comprehensively.
These models are susceptible to jailbreak attacks, where malicious users can break the safety alignment of the target model and generate misleading and harmful answers.
We propose Cross-modality Information DEtectoR (CIDER), a plug-and-play jailbreaking detector designed to identify maliciously perturbed image inputs.
arXiv Detail & Related papers (2024-07-31T15:02:46Z) - Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes [0.0]
Large Language Models (LLMs) have gained widespread adoption across various domains, including chatbots and auto-task completion agents.
These models are susceptible to safety vulnerabilities such as jailbreaking, prompt injection, and privacy leakage attacks.
This study investigates the impact of these modifications on LLM safety, a critical consideration for building reliable and secure AI systems.
arXiv Detail & Related papers (2024-04-05T20:31:45Z) - Threat Assessment in Machine Learning based Systems [12.031113181911627]
We conduct an empirical study of threats reported against Machine Learning-based systems.
The study is based on 89 real-world ML attack scenarios from the MITRE's ATLAS database, the AI Incident Database, and the literature.
Results show that convolutional neural networks were one of the most targeted models among the attack scenarios.
arXiv Detail & Related papers (2022-06-30T20:19:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.