Related papers: Scalable Safe Multi-Agent Reinforcement Learning for Multi-Agent System

Related papers

Reimagining Safety Alignment with An Image [49.33281424100804]
Large language models (LLMs) excel in diverse applications but face dual challenges: generating harmful content under jailbreak attacks and over-refusal of benign queries.<n>We propose Magic Image, an optimization-driven visual prompt framework that enhances security while reducing over-refusal.
arXiv Detail & Related papers (2025-11-01T11:27:07Z)
MAD-PINN: A Decentralized Physics-Informed Machine Learning Framework for Safe and Optimal Multi-Agent Control [13.531665564516155]
Co-optimizing safety and performance in large-scale multi-agent systems remains a fundamental challenge.<n>We propose MAD-PINN, a decentralized machine learning framework for solving the multi-agent state-constrained optimal control problem.<n> Experiments on multi-agent navigation tasks demonstrate that MAD-PINN achieves superior safety-performance trade-offs, maintains scalability as the number of agents grows, and consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-28T16:31:22Z)
When Safe Unimodal Inputs Collide: Optimizing Reasoning Chains for Cross-Modal Safety in Multimodal Large Language Models [50.66979825532277]
We introduce Safe-Semantics-but-Unsafe-Interpretation (SSUI), the first dataset featuring interpretable reasoning paths tailored for a cross-modal challenge.<n>A novel training framework, Safety-aware Reasoning Path Optimization (SRPO), is also designed based on the SSUI dataset.<n> Experimental results show that our SRPO-trained models achieve state-of-the-art results on key safety benchmarks.
arXiv Detail & Related papers (2025-09-15T15:40:58Z)
Automating Steering for Safe Multimodal Large Language Models [58.36932318051907]
We introduce a modular and adaptive inference-time intervention technology, AutoSteer, without requiring any fine-tuning of the underlying model.<n>AutoSteer incorporates three core components: (1) a novel Safety Awareness Score (SAS) that automatically identifies the most safety-relevant distinctions among the model's internal layers; (2) an adaptive safety prober trained to estimate the likelihood of toxic outputs from intermediate representations; and (3) a lightweight Refusal Head that selectively intervenes to modulate generation when safety risks are detected.
arXiv Detail & Related papers (2025-07-17T16:04:55Z)
Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing [13.997409139696556]
This paper presents a framework for enhancing the safety of large language model (LLM) empowered multi-agent systems (MAS) in safety-critical domains such as aerospace.<n>We apply randomized smoothing, a statistical robustness certification technique, to the MAS consensus context, enabling probabilistic guarantees on agent decisions under adversarial influence.
arXiv Detail & Related papers (2025-07-05T17:26:08Z)
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models [83.80177564873094]
We propose a unified multimodal universal jailbreak attack framework.<n>We evaluate the undesirable context generation of MLLMs like LLaVA, Yi-VL, MiniGPT4, MiniGPT-v2, and InstructBLIP.<n>This study underscores the urgent need for robust safety measures in MLLMs.
arXiv Detail & Related papers (2025-06-02T04:33:56Z)
A Weighted Byzantine Fault Tolerance Consensus Driven Trusted Multiple Large Language Models Network [53.37983409425452]
Large Language Models (LLMs) have achieved remarkable success across a wide range of applications.<n>Recently, collaborative frameworks such as the Multi-LLM Network (MultiLLMN) have been introduced.<n>We propose a novel Trusted MultiLLMN framework driven by a weighted Byzantine Fault Tolerance (WBFT) blockchain consensus mechanism.
arXiv Detail & Related papers (2025-05-08T10:04:41Z)
A Trustworthy Multi-LLM Network: Challenges,Solutions, and A Use Case [59.58213261128626]
We propose a blockchain-enabled collaborative framework that connects multiple Large Language Models (LLMs) into a Trustworthy Multi-LLM Network (MultiLLMN)<n>This architecture enables the cooperative evaluation and selection of the most reliable and high-quality responses to complex network optimization problems.
arXiv Detail & Related papers (2025-05-06T05:32:46Z)
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models [37.104276926258095]
Multimodal Large Language Models (MLLMs) pose unique safety challenges due to their integration of visual and textual data. We introduce textbfDREAM (textittextbfDisentangling textbfRisks to textbfEnhance Safety textbfAlignment in textbfMLLMs), a novel approach that enhances safety alignment in MLLMs through supervised fine-tuning and iterative Reinforcement Learning from AI Feedback.
arXiv Detail & Related papers (2025-04-25T03:54:24Z)
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? [83.53005932513155]
Multi-modal large language models (MLLMs) have made significant progress, yet their safety alignment remains limited. We propose finetuning MLLMs on a small set of benign instruct-following data with responses replaced by simple, clear rejection sentences.
arXiv Detail & Related papers (2025-04-14T09:03:51Z)
Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium [6.169364905804677]
Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks. deploying MARL agents in real-world applications presents critical safety challenges. We propose a novel theoretical framework for safe MARL with $textitstate-wise$ constraints, where safety requirements are enforced at every state the agents visit. For practical deployment in complex high-dimensional systems, we propose $textitMulti-Agent Dual Actor-Critic$ (MADAC)
arXiv Detail & Related papers (2024-11-22T16:08:42Z)
SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose toolns, a comprehensive framework designed for conducting safety evaluations of MLLMs. Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol. Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z)
Diffusion Models for Offline Multi-agent Reinforcement Learning with Safety Constraints [0.0]
We introduce an innovative framework integrating diffusion models within the Multi-agent Reinforcement Learning paradigm. This approach notably enhances the safety of actions taken by multiple agents through risk mitigation while modeling coordinated action.
arXiv Detail & Related papers (2024-06-30T16:05:31Z)
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks. This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z)
Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning [48.667697255912614]
Mean-field reinforcement learning addresses the policy of a representative agent interacting with the infinite population of identical agents. We propose Safe-M$3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
arXiv Detail & Related papers (2023-06-29T15:57:07Z)
Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z)
Multi-Agent Constrained Policy Optimisation [17.772811770726296]
We formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods. Our solutions -- Multi-Agent Constrained Policy optimisation (MACPO) and MAPPO-Lagrangian -- leverage the theories from both constrained policy optimisation and multi-agent trust region learning. We develop the benchmark suite of Safe Multi-Agent MuJoCo that involves a variety of MARL baselines.
arXiv Detail & Related papers (2021-10-06T14:17:09Z)
Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents. We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
Practical Machine Learning Safety: A Survey and Primer [81.73857913779534]
Open-world deployment of Machine Learning algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of ML vulnerabilities. New models and training techniques to reduce generalization error, achieve domain adaptation, and detect outlier examples and adversarial attacks. Our organization maps state-of-the-art ML techniques to safety strategies in order to enhance the dependability of the ML algorithm from different aspects.
arXiv Detail & Related papers (2021-06-09T05:56:42Z)
Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning. We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline. We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.