MAEBE: Multi-Agent Emergent Behavior Framework
        - URL: http://arxiv.org/abs/2506.03053v1
 - Date: Tue, 03 Jun 2025 16:33:47 GMT
 - Title: MAEBE: Multi-Agent Emergent Behavior Framework
 - Authors: Sinem Erisken, Timothy Gothard, Martin Leitgab, Ram Potham, 
 - Abstract summary: This paper introduces the Multi-Agent Emergent Behavior Evaluation framework to assess such risks.<n>Our findings underscore the necessity of evaluating AI systems in their interactive, multi-agent contexts.
 - Score: 0.0
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract:   Traditional AI safety evaluations on isolated LLMs are insufficient as multi-agent AI ensembles become prevalent, introducing novel emergent risks. This paper introduces the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework to systematically assess such risks. Using MAEBE with the Greatest Good Benchmark (and a novel double-inversion question technique), we demonstrate that: (1) LLM moral preferences, particularly for Instrumental Harm, are surprisingly brittle and shift significantly with question framing, both in single agents and ensembles. (2) The moral reasoning of LLM ensembles is not directly predictable from isolated agent behavior due to emergent group dynamics. (3) Specifically, ensembles exhibit phenomena like peer pressure influencing convergence, even when guided by a supervisor, highlighting distinct safety and alignment challenges. Our findings underscore the necessity of evaluating AI systems in their interactive, multi-agent contexts. 
 
       
      
        Related papers
        - SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for   Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents.<n>It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information.<n>It also designs an automated assisted assessment scheme based on a large language model.
arXiv  Detail & Related papers  (2025-07-01T15:10:00Z) - Multi-level Value Alignment in Agentic AI Systems: Survey and   Perspectives [29.49571891159761]
Value alignment for agentic AI systems aims to ensure that an agent's goals, preferences, and behaviors align with human values and societal norms.<n>This study comprehensively reviews value alignment in LLM-based multi-agent systems as the representative archetype of agentic AI systems.
arXiv  Detail & Related papers  (2025-06-11T12:25:38Z) - TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management   in LLM-based Agentic Multi-Agent Systems [2.462408812529728]
This review presents a structured analysis of textbfTrust, Risk, and Security Management (TRiSM) in the context of LLM-based Agentic Multi-Agent Systems (AMAS)<n>We begin by examining the conceptual foundations of Agentic AI and highlight its architectural distinctions from traditional AI agents.<n>We then adapt and extend the AI TRiSM framework for Agentic AI, structured around four key pillars: Explainability, ModelOps, Security, Privacy and Governance.
arXiv  Detail & Related papers  (2025-06-04T16:26:11Z) - AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in   LLM-Based Agents [0.0]
We introduce a misalignment propensity benchmark, AgentMisalignment, consisting of a suite of realistic scenarios.<n>We organise our evaluations into subcategories of misaligned behaviours, including goal-guarding, resisting shutdown, sandbagging, and power-seeking.<n>We report the performance of frontier models on our benchmark, observing higher misalignment on average when evaluating more capable models.
arXiv  Detail & Related papers  (2025-06-04T14:46:47Z) - An Empirical Study of Group Conformity in Multi-Agent Systems [0.26999000177990923]
This study explores how Large Language Models (LLMs) agents shape public opinion through debates on five contentious topics.<n>By simulating over 2,500 debates, we analyze how initially neutral agents, assigned a centrist disposition, adopt specific stances over time.
arXiv  Detail & Related papers  (2025-06-02T05:22:29Z) - Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
offline multi-agent reinforcement learning (MARL) faces critical challenges due to distributional shifts and the high dimensionality of joint action spaces.<n>We propose a novel two-stage framework for modeling diverse multi-agent coordination patterns.<n>Our approach provides new insights into offline coordination and equilibrium selection in cooperative multi-agent systems.
arXiv  Detail & Related papers  (2025-05-09T11:42:31Z) - ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement   Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Reasoning of Large Language Models (LLMs)<n>ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions.<n> Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv  Detail & Related papers  (2025-03-12T16:05:31Z) - The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue   Agents [29.974647411289826]
Large Language Models (LLMs) have made remarkable advances in role-playing dialogue agents, demonstrating their utility in character simulations.<n>It remains challenging for these agents to balance character portrayal utility with content safety because this essential character simulation often comes with the risk of generating unsafe content.<n>We propose a novel Adaptive Dynamic Multi-Preference (ADMP) method, which dynamically adjusts safety-utility preferences based on the degree of risk coupling.
arXiv  Detail & Related papers  (2025-02-28T06:18:50Z) - Multi-Agent Risks from Advanced AI [90.74347101431474]
Multi-agent systems of advanced AI pose novel and under-explored risks.<n>We identify three key failure modes based on agents' incentives, as well as seven key risk factors.<n>We highlight several important instances of each risk, as well as promising directions to help mitigate them.
arXiv  Detail & Related papers  (2025-02-19T23:03:21Z) - Do as We Do, Not as You Think: the Conformity of Large Language Models [46.23852835759767]
This paper presents a study on conformity in large language models (LLMs) driven collaborative AI systems.<n>We focus on three aspects: the existence of conformity, the factors influencing conformity, and potential mitigation strategies.<n>Our analysis delves into factors influencing conformity, including interaction time and majority size, and examines how the subject agent rationalizes its conforming behavior.
arXiv  Detail & Related papers  (2025-01-23T04:50:03Z) - Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv  Detail & Related papers  (2024-02-23T06:04:23Z) - How Far Are LLMs from Believable AI? A Benchmark for Evaluating the   Believability of Human Behavior Simulation [46.42384207122049]
We design SimulateBench to evaluate the believability of large language models (LLMs) when simulating human behaviors.
Based on SimulateBench, we evaluate the performances of 10 widely used LLMs when simulating characters.
arXiv  Detail & Related papers  (2023-12-28T16:51:11Z) - DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement
  Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv  Detail & Related papers  (2023-12-10T06:03:57Z) - ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in
  Multi-Agent Simulations [110.72725220033983]
Epsilon-Robust Multi-Agent Simulation (ERMAS) is a framework for learning AI policies that are robust to such multiagent sim-to-real gaps.
ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complextemporal simulations.
In particular, ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complextemporal simulations.
arXiv  Detail & Related papers  (2021-06-10T04:32:20Z) 
        This list is automatically generated from the titles and abstracts of the papers in this site.
       
     
           This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.