Related papers: MAEBE: Multi-Agent Emergent Behavior Framework

MAEBE: Multi-Agent Emergent Behavior Framework

URL: http://arxiv.org/abs/2506.03053v1
Date: Tue, 03 Jun 2025 16:33:47 GMT
Title: MAEBE: Multi-Agent Emergent Behavior Framework
Authors: Sinem Erisken, Timothy Gothard, Martin Leitgab, Ram Potham,
Abstract summary: This paper introduces the Multi-Agent Emergent Behavior Evaluation framework to assess such risks.<n>Our findings underscore the necessity of evaluating AI systems in their interactive, multi-agent contexts.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Traditional AI safety evaluations on isolated LLMs are insufficient as multi-agent AI ensembles become prevalent, introducing novel emergent risks. This paper introduces the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework to systematically assess such risks. Using MAEBE with the Greatest Good Benchmark (and a novel double-inversion question technique), we demonstrate that: (1) LLM moral preferences, particularly for Instrumental Harm, are surprisingly brittle and shift significantly with question framing, both in single agents and ensembles. (2) The moral reasoning of LLM ensembles is not directly predictable from isolated agent behavior due to emergent group dynamics. (3) Specifically, ensembles exhibit phenomena like peer pressure influencing convergence, even when guided by a supervisor, highlighting distinct safety and alignment challenges. Our findings underscore the necessity of evaluating AI systems in their interactive, multi-agent contexts.

Related papers

SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents.<n>It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information.<n>It also designs an automated assisted assessment scheme based on a large language model.
arXiv Detail & Related papers (2025-07-01T15:10:00Z)
Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives [29.49571891159761]
Value alignment for agentic AI systems aims to ensure that an agent's goals, preferences, and behaviors align with human values and societal norms.<n>This study comprehensively reviews value alignment in LLM-based multi-agent systems as the representative archetype of agentic AI systems.
arXiv Detail & Related papers (2025-06-11T12:25:38Z)
TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems [2.462408812529728]
This review presents a structured analysis of textbfTrust, Risk, and Security Management (TRiSM) in the context of LLM-based Agentic Multi-Agent Systems (AMAS)<n>We begin by examining the conceptual foundations of Agentic AI and highlight its architectural distinctions from traditional AI agents.<n>We then adapt and extend the AI TRiSM framework for Agentic AI, structured around four key pillars: Explainability, ModelOps, Security, Privacy and Governance.
arXiv Detail & Related papers (2025-06-04T16:26:11Z)
AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents [0.0]
We introduce a misalignment propensity benchmark, AgentMisalignment, consisting of a suite of realistic scenarios.<n>We organise our evaluations into subcategories of misaligned behaviours, including goal-guarding, resisting shutdown, sandbagging, and power-seeking.<n>We report the performance of frontier models on our benchmark, observing higher misalignment on average when evaluating more capable models.
arXiv Detail & Related papers (2025-06-04T14:46:47Z)
An Empirical Study of Group Conformity in Multi-Agent Systems [0.26999000177990923]
This study explores how Large Language Models (LLMs) agents shape public opinion through debates on five contentious topics.<n>By simulating over 2,500 debates, we analyze how initially neutral agents, assigned a centrist disposition, adopt specific stances over time.
arXiv Detail & Related papers (2025-06-02T05:22:29Z)
Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
offline multi-agent reinforcement learning (MARL) faces critical challenges due to distributional shifts and the high dimensionality of joint action spaces.<n>We propose a novel two-stage framework for modeling diverse multi-agent coordination patterns.<n>Our approach provides new insights into offline coordination and equilibrium selection in cooperative multi-agent systems.
arXiv Detail & Related papers (2025-05-09T11:42:31Z)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Reasoning of Large Language Models (LLMs)<n>ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions.<n> Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents [29.974647411289826]
Large Language Models (LLMs) have made remarkable advances in role-playing dialogue agents, demonstrating their utility in character simulations.<n>It remains challenging for these agents to balance character portrayal utility with content safety because this essential character simulation often comes with the risk of generating unsafe content.<n>We propose a novel Adaptive Dynamic Multi-Preference (ADMP) method, which dynamically adjusts safety-utility preferences based on the degree of risk coupling.
arXiv Detail & Related papers (2025-02-28T06:18:50Z)
Multi-Agent Risks from Advanced AI [90.74347101431474]
Multi-agent systems of advanced AI pose novel and under-explored risks.<n>We identify three key failure modes based on agents' incentives, as well as seven key risk factors.<n>We highlight several important instances of each risk, as well as promising directions to help mitigate them.
arXiv Detail & Related papers (2025-02-19T23:03:21Z)
Do as We Do, Not as You Think: the Conformity of Large Language Models [46.23852835759767]
This paper presents a study on conformity in large language models (LLMs) driven collaborative AI systems.<n>We focus on three aspects: the existence of conformity, the factors influencing conformity, and potential mitigation strategies.<n>Our analysis delves into factors influencing conformity, including interaction time and majority size, and examines how the subject agent rationalizes its conforming behavior.
arXiv Detail & Related papers (2025-01-23T04:50:03Z)
Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents. There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation [46.42384207122049]
We design SimulateBench to evaluate the believability of large language models (LLMs) when simulating human behaviors. Based on SimulateBench, we evaluate the performances of 10 widely used LLMs when simulating characters.
arXiv Detail & Related papers (2023-12-28T16:51:11Z)
DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents. We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations [110.72725220033983]
Epsilon-Robust Multi-Agent Simulation (ERMAS) is a framework for learning AI policies that are robust to such multiagent sim-to-real gaps. ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complextemporal simulations. In particular, ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complextemporal simulations.
arXiv Detail & Related papers (2021-06-10T04:32:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.