Quantifying Misalignment Between Agents
- URL: http://arxiv.org/abs/2406.04231v1
- Date: Thu, 6 Jun 2024 16:31:22 GMT
- Title: Quantifying Misalignment Between Agents
- Authors: Aidan Kierans, Avijit Ghosh, Hananel Hazan, Shiri Dori-Hacohen
- Abstract summary: Growing concerns about the AI alignment problem have emerged in recent years.
We show how misalignment can vary depending on the population of agents being observed.
Our model departs from value specification approaches and focuses instead on the morass of complex, interlocking, sometimes contradictory goals that agents may have in practice.
- Score: 2.619545850602691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Growing concerns about the AI alignment problem have emerged in recent years, with previous work focusing mainly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing on a single agent or on humanity as a singular unit. Recent work in sociotechnical AI alignment has made some progress in defining alignment inclusively, but the field as a whole still lacks a systematic understanding of how to specify, describe, and analyze misalignment among entities, which may include individual humans, AI agents, and complex compositional entities such as corporations, nation-states, and so forth. Previous work on controversy in computational social science offers a mathematical model of contention among populations (of humans). In this paper, we adapt this contention model to the alignment problem, and show how misalignment can vary depending on the population of agents (human or otherwise) being observed, the domain in question, and the agents' probability-weighted preferences between possible outcomes. Our model departs from value specification approaches and focuses instead on the morass of complex, interlocking, sometimes contradictory goals that agents may have in practice. We apply our model by analyzing several case studies ranging from social media moderation to autonomous vehicle behavior. By applying our model with appropriately representative value data, AI engineers can ensure that their systems learn values maximally aligned with diverse human interests.
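For intuition, here is a minimal sketch of how such a measure might be computed, assuming each agent's preferences within a domain are represented as a probability distribution over possible outcomes. The `misalignment` function and its use of averaged total variation distance are illustrative assumptions, not the paper's exact contention model:

```python
import numpy as np

def misalignment(preferences: np.ndarray) -> float:
    """Average pairwise disagreement among a population of agents.

    `preferences` has shape (n_agents, n_outcomes); each row is one
    agent's probability-weighted preference distribution over the
    possible outcomes in a single domain. Returns a score in [0, 1]:
    0 when all agents hold identical preferences, approaching 1 as
    preferences concentrate on disjoint outcomes.
    """
    n_agents = preferences.shape[0]
    total, pairs = 0.0, 0
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            # Total variation distance between two preference
            # distributions, used here as a per-pair disagreement proxy.
            total += 0.5 * np.abs(preferences[i] - preferences[j]).sum()
            pairs += 1
    return total / pairs if pairs else 0.0

# Three agents, three possible outcomes in one domain: the first two
# agents largely agree, the third strongly prefers a different outcome.
population = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
])
print(misalignment(population))  # 0.5; drops to 0.1 for the first two agents alone
```

Averaging pairwise distances makes the score depend on which agents are included, mirroring the paper's point that measured misalignment varies with the population being observed.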
Related papers
- Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment.
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
- Bias Mitigation via Compensation: A Reinforcement Learning Perspective [1.5442389863546546]
Group dynamics might require that one agent (e.g., the AI system) compensate for biases and errors in another agent (e.g., the human).
We provide a theoretical framework for algorithmic compensation that synthesizes game theory and reinforcement learning principles.
This work then underpins our ethical analysis of the conditions in which AI agents should adapt to biases and behaviors of other agents.
arXiv Detail & Related papers (2024-04-30T04:41:47Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z)
- Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is broad consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
- Goal Alignment: A Human-Aware Account of Value Alignment Problem [16.660807368368758]
Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users.
A foundational cause for misalignment is the inherent asymmetry in human expectations about the agent's behavior and the behavior generated by the agent for the specified objective.
We propose a novel formulation for the value alignment problem, named goal alignment that focuses on a few central challenges related to value alignment.
arXiv Detail & Related papers (2023-02-02T01:18:57Z)
- Aligned with Whom? Direct and social goals for AI systems [0.0]
This article distinguishes two types of alignment problems depending on whose goals we consider.
The direct alignment problem considers whether an AI system accomplishes the goals of the entity operating it.
The social alignment problem considers the effects of an AI system on larger groups or on society more broadly.
arXiv Detail & Related papers (2022-05-09T13:49:47Z)
- Human-AI Collaboration via Conditional Delegation: A Case Study of Content Moderation [47.102566259034326]
We propose conditional delegation as an alternative paradigm for human-AI collaboration.
We develop novel interfaces to assist humans in creating conditional delegation rules.
Our study demonstrates the promise of conditional delegation in improving model performance; a minimal sketch of one such delegation rule appears after this list.
arXiv Detail & Related papers (2022-04-25T17:00:02Z)
- A Mental-Model Centric Landscape of Human-AI Symbiosis [31.14516396625931]
We introduce a significantly generalized version of the human-aware AI interaction scheme, called generalized human-aware interaction (GHAI).
We show how this framework captures the various works in the space of human-AI interaction and identifies the fundamental behavioral patterns those works support.
arXiv Detail & Related papers (2022-02-18T22:08:08Z)
- End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games.
We propose two approaches, respectively based on explicit and implicit differentiation.
The analytical results are validated using several real-world problems.
arXiv Detail & Related papers (2020-10-26T18:39:32Z)
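As a companion to the conditional-delegation entry above, here is a minimal sketch of what human-authored delegation rules might look like in a content-moderation setting. The `Item` fields, topic list, confidence thresholds, and `route` function are hypothetical illustrations, not the interfaces proposed in that paper:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Item:
    text: str
    model_score: float  # model's estimated probability of a policy violation
    topic: str

Rule = Callable[[Item], bool]

def default_rules() -> list[Rule]:
    # Delegate to the AI only where the human moderator trusts it:
    # confident predictions outside sensitive topics. All thresholds
    # and topic names here are illustrative.
    return [
        lambda item: item.topic not in {"politics", "health"},
        lambda item: item.model_score >= 0.95 or item.model_score <= 0.05,
    ]

def route(item: Item, rules: list[Rule]) -> str:
    """Return 'ai' when every rule admits delegation, else 'human'."""
    return "ai" if all(rule(item) for rule in rules) else "human"

item = Item(text="spam spam spam", model_score=0.98, topic="ads")
print(route(item, default_rules()))  # -> 'ai'
```

Routing only confident, low-stakes items to the model reflects the entry's premise that humans retain control over where delegation happens.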
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.