Aetheria: A multimodal interpretable content safety framework based on multi-agent debate and collaboration
- URL: http://arxiv.org/abs/2512.02530v2
- Date: Tue, 09 Dec 2025 05:44:24 GMT
- Title: Aetheria: A multimodal interpretable content safety framework based on multi-agent debate and collaboration
- Authors: Yuxiang He, Jian Zhao, Yuchen Yuan, Tianle Zhang, Wei Cai, Haojie Cheng, Ziyan Shi, Ming Zhu, Haichuan Tang, Chi Zhang, Xuelong Li,
- Abstract summary: We propose Aetheria, a multimodal interpretable content safety framework based on multi-agent debate and collaboration.<n>Aetheria generates detailed and traceable audit reports and demonstrates significant advantages over baselines in overall content safety accuracy.
- Score: 45.26893171909154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The exponential growth of digital content presents significant challenges for content safety. Current moderation systems, often based on single models or fixed pipelines, exhibit limitations in identifying implicit risks and providing interpretable judgment processes. To address these issues, we propose Aetheria, a multimodal interpretable content safety framework based on multi-agent debate and collaboration.Employing a collaborative architecture of five core agents, Aetheria conducts in-depth analysis and adjudication of multimodal content through a dynamic, mutually persuasive debate mechanism, which is grounded by RAG-based knowledge retrieval.Comprehensive experiments on our proposed benchmark (AIR-Bench) validate that Aetheria not only generates detailed and traceable audit reports but also demonstrates significant advantages over baselines in overall content safety accuracy, especially in the identification of implicit risks. This framework establishes a transparent and interpretable paradigm, significantly advancing the field of trustworthy AI content moderation.
Related papers
- ConceptGuard: Proactive Safety in Text-and-Image-to-Video Generation through Multimodal Risk Detection [27.47621607462884]
ConceptGuard is a framework for proactively detecting and mitigating unsafe semantics in multimodal video generation.<n>A contrastive detection module identifies latent safety risks by projecting fused image-text inputs into a structured concept space.<n>A semantic suppression mechanism steers the generative process away from unsafe concepts by intervening in the prompt's multimodal conditioning.
arXiv Detail & Related papers (2025-11-24T05:27:05Z) - SentiMM: A Multimodal Multi-Agent Framework for Sentiment Analysis in Social Media [6.2300278659598485]
We propose SentiMM, a novel multi-agent framework for sentiment analysis.<n>SentiMM processes text and visual inputs through specialized agents, fuses multimodal features, enriches context via knowledge retrieval, and aggregates results for final sentiment classification.<n>We also introduce SentiMMD, a large-scale multimodal dataset with seven fine-grained sentiment categories.
arXiv Detail & Related papers (2025-08-25T15:17:53Z) - MV-Debate: Multi-view Agent Debate with Dynamic Reflection Gating for Multimodal Harmful Content Detection in Social Media [26.07883439550861]
MV-Debate is a multi-view agent debate framework with dynamic reflection gating for unified multimodal harmful content detection.<n>MV-Debate assembles four complementary debate agents, a surface analyst, a deep reasoner, a modality contrast, and a social contextualist, to analyze content from diverse interpretive perspectives.
arXiv Detail & Related papers (2025-08-07T16:38:25Z) - CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG [53.950029990391066]
Cross-source knowledge textbfReconciliation for Multimodal RAG (CoRe-MMRAG)<n>We propose a novel end-to-end framework that effectively reconciles inconsistencies across knowledge sources.<n>Experiments on KB-VQA benchmarks show that CoRe-MMRAG achieves substantial improvements over baseline methods.
arXiv Detail & Related papers (2025-06-03T07:32:40Z) - Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation [52.626086874715284]
We introduce a novel problem formulation called Abstract DNN-Verification, which verifies a hierarchical structure of unsafe outputs.<n>By leveraging abstract interpretation and reasoning about output reachable sets, our approach enables assessing multiple safety levels during the formal verification process.<n>Our contributions include a theoretical exploration of the relationship between our novel abstract safety formulation and existing approaches.
arXiv Detail & Related papers (2025-05-08T13:29:46Z) - Knowledge-Aware Iterative Retrieval for Multi-Agent Systems [0.0]
We introduce a novel large language model (LLM)-driven agent framework.<n>It iteratively refines queries and filters contextual evidence by leveraging dynamically evolving knowledge.<n>The proposed system supports both competitive and collaborative sharing of updated context.
arXiv Detail & Related papers (2025-03-17T15:27:02Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.<n>The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey [92.36487127683053]
Retrieval-Augmented Generation (RAG) is an advanced technique designed to address the challenges of Artificial Intelligence-Generated Content (AIGC)<n>RAG provides reliable and up-to-date external knowledge, reduces hallucinations, and ensures relevant context across a wide range of tasks.<n>Despite RAG's success and potential, recent studies have shown that the RAG paradigm also introduces new risks, including privacy concerns, adversarial attacks, and accountability issues.
arXiv Detail & Related papers (2025-02-08T06:50:47Z) - CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model [9.224965304457708]
This paper introduces Contextual Understanding and Enhanced Search with MLLM (CUE-M), a novel multimodal search framework.<n>It incorporates image context enrichment, intent refinement, contextual query generation, external API integration, and relevance-based filtering.<n>Experiments on real-word datasets and public benchmarks on knowledge-based VQA and safety demonstrated that CUE-M outperforms baselines and establishes new state-of-the-art results.
arXiv Detail & Related papers (2024-11-19T07:16:48Z) - SoK: The Security-Safety Continuum of Multimodal Foundation Models through Information Flow and Game-Theoretic Defenses [58.93030774141753]
Multimodal foundation models (MFMs) integrate diverse data modalities to support complex and wide-ranging tasks.<n>In this paper, we unify the concepts of safety and security in the context of MFMs by identifying critical threats that arise from both model behavior and system-level interactions.
arXiv Detail & Related papers (2024-11-17T23:06:20Z) - Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering [0.0]
In the chemical and process industries, Process Flow Diagrams (PFDs) and Piping and Instrumentation Diagrams (P&IDs) are critical for design, construction, and maintenance.
Recent advancements in Generative AI have shown promise in understanding and interpreting process diagrams for Visual Question Answering (VQA)
We propose a secure, on-premises enterprise solution using a hierarchical, multi-agent Retrieval Augmented Generation (RAG) framework.
arXiv Detail & Related papers (2024-08-24T19:34:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.