Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment
- URL: http://arxiv.org/abs/2406.04231v2
- Date: Sat, 7 Sep 2024 19:05:15 GMT
- Title: Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment
- Authors: Aidan Kierans, Avijit Ghosh, Hananel Hazan, Shiri Dori-Hacohen
- Abstract summary: Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents.
We adapt a computational social science model of human contention to the alignment problem.
Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals.
- Score: 2.619545850602691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing work on the alignment problem has focused mainly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing on a single agent or on humanity as a monolith. Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents. We address this gap by adapting a computational social science model of human contention to the alignment problem. Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals across various problem areas. Misalignment scores in our framework depend on the observed agent population, the domain in question, and conflict between agents' weighted preferences. Through simulations, we demonstrate how our model captures intuitive aspects of misalignment across different scenarios. We then apply our model to two case studies, including an autonomous vehicle setting, showcasing its practical utility. Our approach offers enhanced explanatory power for complex sociotechnical environments and could inform the design of more aligned AI systems in real-world applications.
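The abstract does not reproduce the paper's scoring function, but its description (observed agent population, problem domain, and conflict between agents' weighted preferences) suggests the general shape. The sketch below is a hypothetical illustration, not the authors' actual model: the function names and the conflict measure itself are assumptions, treating misalignment as the average pairwise conflict between agents' signed, weighted goal preferences within a domain.

```python
from itertools import combinations

def pairwise_conflict(prefs_a, prefs_b):
    """Conflict between two agents' weighted preferences over shared goals.

    Hypothetical measure: goals the two agents weight in opposite
    directions contribute in proportion to the product of their
    weight magnitudes; goals weighted in the same direction do not.
    """
    conflict = 0.0
    for goal in set(prefs_a) & set(prefs_b):
        wa, wb = prefs_a[goal], prefs_b[goal]
        if wa * wb < 0:  # opposing stances on this goal
            conflict += abs(wa * wb)
    return conflict

def misalignment(population, domain):
    """Average pairwise conflict across an observed agent population,
    restricted to the preferences relevant to the given problem domain."""
    agents = [
        {g: w for g, w in agent.items() if g in domain}
        for agent in population
    ]
    pairs = list(combinations(agents, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_conflict(a, b) for a, b in pairs) / len(pairs)

# Toy population: three agents with signed preference weights over two goals.
population = [
    {"speed": +0.9, "safety": +0.4},
    {"speed": -0.7, "safety": +0.8},
    {"speed": +0.2, "safety": -0.5},
]
print(misalignment(population, domain={"speed", "safety"}))
```

On the toy population, agents who weight the same goal in opposite directions drive the score up, matching the intuition that misalignment grows with the strength of opposing preferences.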
Related papers
- Aligning AI Agents via Information-Directed Sampling [20.617552198581024]
The bandit alignment problem involves maximizing long-run expected reward by interacting with an environment and a human.
We study these trade-offs theoretically and empirically in a toy bandit alignment problem which resembles the beta-Bernoulli bandit.
We demonstrate that naive exploration algorithms, which reflect current practice, and even touted algorithms such as Thompson sampling fail to provide acceptable solutions to this problem.
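For context on the baseline the snippet criticizes, here is a minimal sketch of textbook Thompson sampling on a Bernoulli bandit with Beta priors; it is the standard algorithm, not the paper's alignment variant or its information-directed alternative.

```python
import random

def thompson_sampling(true_probs, horizon=1000, seed=0):
    """Standard Thompson sampling on a Bernoulli bandit with Beta(1,1) priors."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha, beta = [1] * k, [1] * k  # Beta posterior parameters per arm
    total_reward = 0
    for _ in range(horizon):
        # Sample a success probability for each arm from its posterior,
        # then act greedily with respect to the sampled values.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

print(thompson_sampling([0.3, 0.5, 0.7]))
```

The paper's point is that even this principled exploration strategy, applied to the bandit alignment problem, fails to provide an acceptable solution.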
arXiv Detail & Related papers (2024-10-18T18:23:41Z)
- Problem Solving Through Human-AI Preference-Based Cooperation [74.39233146428492]
We propose HAI-Co2, a novel human-AI co-construction framework.
We formalize HAI-Co2 and discuss the difficult open research problems that it faces.
We present a case study of HAI-Co2 and demonstrate its efficacy compared to monolithic generative AI models.
arXiv Detail & Related papers (2024-08-14T11:06:57Z)
- Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clear definitions and scopes for human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve it.
We present a systematic review of over 400 papers published between 2019 and January 2024, spanning domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Machine Learning (ML).
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
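The snippet does not detail the three auxiliary tasks, so the following is only a generic sketch of the pattern it describes: a shared encoder whose auxiliary heads (placeholder names and dimensions here, not the paper's exact tasks) contribute extra supervision on top of the main navigation objective.

```python
import torch
import torch.nn as nn

class NavigationWithAuxiliaryTasks(nn.Module):
    """Shared encoder with a main policy head plus auxiliary heads whose
    losses act as additional supervision signals (placeholder tasks)."""

    def __init__(self, obs_dim=64, hidden=128, n_actions=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)    # main task
        self.intent_head = nn.Linear(hidden, 3)            # aux: other agents' internal state
        self.interactivity_head = nn.Linear(hidden, 1)     # aux: interactivity estimate

    def forward(self, obs):
        h = self.encoder(obs)
        return self.policy_head(h), self.intent_head(h), self.interactivity_head(h)

def combined_loss(model, obs, action_tgt, intent_tgt, inter_tgt, aux_weight=0.5):
    """Main-task loss plus weighted auxiliary losses on the shared encoder."""
    policy, intent, inter = model(obs)
    loss = nn.functional.cross_entropy(policy, action_tgt)
    loss += aux_weight * nn.functional.cross_entropy(intent, intent_tgt)
    loss += aux_weight * nn.functional.mse_loss(inter.squeeze(-1), inter_tgt)
    return loss

# Toy batch: random observations and targets, just to exercise the graph.
obs = torch.randn(8, 64)
loss = combined_loss(NavigationWithAuxiliaryTasks(), obs,
                     torch.randint(0, 5, (8,)), torch.randint(0, 3, (8,)),
                     torch.rand(8))
loss.backward()
```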
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- MacGyver: Are Large Language Models Creative Problem Solvers? [87.70522322728581]
We explore the creative problem-solving capabilities of modern LLMs in a novel constrained setting.
We create MACGYVER, an automatically generated dataset consisting of over 1,600 real-world problems.
We present our collection to both LLMs and humans to compare and contrast their problem-solving abilities.
arXiv Detail & Related papers (2023-11-16T08:52:27Z)
- Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models [0.0]
We investigate how GPT models respond in principal-agent conflicts.
We find that agents based on both GPT-3.5 and GPT-4 override their principal's objectives in a simple online shopping task.
arXiv Detail & Related papers (2023-07-20T17:19:15Z)
- Causal Fairness Analysis [68.12191782657437]
We introduce a framework for understanding, modeling, and possibly solving issues of fairness in decision-making settings.
The main insight of our approach is to link the quantification of disparities present in the observed data with the underlying, and often unobserved, collection of causal mechanisms.
Our effort culminates in the Fairness Map, which is the first systematic attempt to organize and explain the relationship between different criteria found in the literature.
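As a small illustration of the framework's starting point, the sketch below computes a disparity observed in data (the difference in outcome rates across protected-attribute groups); the paper's contribution lies in decomposing such quantities into causal mechanisms, which requires causal assumptions beyond this snippet. Column names and the binary coding are assumptions.

```python
import pandas as pd

def total_variation(df, outcome="y", attribute="x", baseline=0):
    """Observed disparity: outcome rate of each group minus the baseline
    group's rate. A purely observational quantity; the causal
    decomposition in the paper goes beyond this."""
    rates = df.groupby(attribute)[outcome].mean()
    return rates.drop(baseline).sub(rates[baseline])

# Toy data: group x=1 receives positive outcomes less often than x=0.
df = pd.DataFrame({"x": [0]*5 + [1]*5, "y": [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]})
print(total_variation(df))  # x=1 rate (0.2) minus x=0 rate (0.8) -> -0.6
```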
arXiv Detail & Related papers (2022-07-23T01:06:34Z)
- Aligned with Whom? Direct and social goals for AI systems [0.0]
This article distinguishes two types of alignment problems depending on whose goals we consider.
The direct alignment problem considers whether an AI system accomplishes the goals of the entity operating it.
The social alignment problem considers the effects of an AI system on larger groups or on society more broadly.
arXiv Detail & Related papers (2022-05-09T13:49:47Z)
- End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games.
We propose two approaches, respectively based on explicit and implicit differentiation.
The analytical results are validated using several real-world problems.
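Of the two approaches the snippet names, the toy sketch below illustrates only the implicit-differentiation idea in the scalar case: differentiating an equilibrium (a fixed point of a best-response map) with respect to an intervention parameter via the implicit function theorem. The map and all names are illustrative assumptions, not the paper's formulation.

```python
def solve_fixed_point(f, theta, x0=0.0, iters=100):
    """Find x* with x* = f(x*, theta) by simple iteration (assumes f is a contraction)."""
    x = x0
    for _ in range(iters):
        x = f(x, theta)
    return x

def implicit_grad(f, x_star, theta, eps=1e-6):
    """dx*/dtheta via the implicit function theorem at the fixed point:
    dx*/dtheta = (df/dtheta) / (1 - df/dx).
    Scalar case with finite-difference partials, for illustration only."""
    df_dx = (f(x_star + eps, theta) - f(x_star - eps, theta)) / (2 * eps)
    df_dth = (f(x_star, theta + eps) - f(x_star, theta - eps)) / (2 * eps)
    return df_dth / (1.0 - df_dx)

# Toy "game": best-response dynamics x <- 0.5 * x + theta has equilibrium
# x* = 2 * theta, so the intervention gradient dx*/dtheta should be 2.
f = lambda x, theta: 0.5 * x + theta
x_star = solve_fixed_point(f, theta=1.0)
print(x_star, implicit_grad(f, x_star, theta=1.0))  # ~2.0 and ~2.0
```

The appeal of the implicit route is that the gradient is obtained from conditions at the equilibrium itself, without backpropagating through every solver iteration.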
arXiv Detail & Related papers (2020-10-26T18:39:32Z)