Related papers: The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process

The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process

URL: http://arxiv.org/abs/2506.01080v2
Date: Fri, 06 Jun 2025 02:55:21 GMT
Title: The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process
Authors: Florian Carichon, Aditi Khandelwal, Marylou Fauchard, Golnoosh Farnadi,
Abstract summary: AI alignment with human values and preferences remains a core challenge.<n>As agents engage with one another, they must coordinate to accomplish both individual and collective goals.<n>Social structure can deter or shatter group and individual values.<n>We call on the AI community to treat human, preferential, and objective alignment as an interdependent concept.
Score: 13.959658276224266
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This position paper states that AI Alignment in Multi-Agent Systems (MAS) should be considered a dynamic and interaction-dependent process that heavily depends on the social environment where agents are deployed, either collaborative, cooperative, or competitive. While AI alignment with human values and preferences remains a core challenge, the growing prevalence of MAS in real-world applications introduces a new dynamic that reshapes how agents pursue goals and interact to accomplish various tasks. As agents engage with one another, they must coordinate to accomplish both individual and collective goals. However, this complex social organization may unintentionally misalign some or all of these agents with human values or user preferences. Drawing on social sciences, we analyze how social structure can deter or shatter group and individual values. Based on these analyses, we call on the AI community to treat human, preferential, and objective alignment as an interdependent concept, rather than isolated problems. Finally, we emphasize the urgent need for simulation environments, benchmarks, and evaluation frameworks that allow researchers to assess alignment in these interactive multi-agent contexts before such dynamics grow too complex to control.

Related papers

Position: Emergent Machina Sapiens Urge Rethinking Multi-Agent Paradigms [8.177915265718703]
We argue that AI agents should be empowered to adjust their objectives dynamically.<n>We call for a shift toward the emergent, self-organizing, and context-aware nature of these multi-agentic AI systems.
arXiv Detail & Related papers (2025-02-05T22:20:15Z)
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [51.452664740963066]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments.<n>We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions.<n>Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z)
Principal-Agent Reinforcement Learning: Orchestrating AI Agents with Contracts [20.8288955218712]
We propose a framework where a principal guides an agent in a Markov Decision Process (MDP) using a series of contracts. We present and analyze a meta-algorithm that iteratively optimize the policies of the principal and agent. We then scale our algorithm with deep Q-learning and analyze its convergence in the presence of approximation error.
arXiv Detail & Related papers (2024-07-25T14:28:58Z)
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML)
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment [2.619545850602691]
Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents.<n>We adapt a computational social science model of human contention to the alignment problem.<n>Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals.
arXiv Detail & Related papers (2024-06-06T16:31:22Z)
AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios. However, their capability in handling complex, multi-character social interactions has yet to be fully explored. We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z)
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents [107.4138224020773]
We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and humans. In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals. We find that GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills.
arXiv Detail & Related papers (2023-10-18T02:27:01Z)
Towards socially-competent and culturally-adaptive artificial agents Expressive order, interactional disruptions and recovery strategies [0.0]
The overarching aim of this work is to set a framework to make the artificial agent socially-competent beyond dyadic interaction-interaction. The paper highlights how this level of competence is achieved by focusing on just three dimensions: (i) social capability, (ii) relational role, and (iii) proximity.
arXiv Detail & Related papers (2023-08-06T15:47:56Z)
Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values. Current language models (LMs) are trained to rigidly replicate their training corpus in isolation. This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
Rethinking Trajectory Prediction via "Team Game" [118.59480535826094]
We present a novel formulation for multi-agent trajectory prediction, which explicitly introduces the concept of interactive group consensus. On two multi-agent settings, i.e. team sports and pedestrians, the proposed framework consistently achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2022-10-17T07:16:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.