Quantifying Misalignment Between Agents
- URL: http://arxiv.org/abs/2406.04231v1
- Date: Thu, 6 Jun 2024 16:31:22 GMT
- Title: Quantifying Misalignment Between Agents
- Authors: Aidan Kierans, Avijit Ghosh, Hananel Hazan, Shiri Dori-Hacohen
- Abstract summary: Growing concerns about the AI alignment problem have emerged in recent years.
We show how misalignment can vary depending on the population of agents being observed.
Our model departs from value specification approaches and focuses instead on the morass of complex, interlocking, sometimes contradictory goals that agents may have in practice.
- Score: 2.619545850602691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Growing concerns about the AI alignment problem have emerged in recent years, with previous work focusing mainly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing on a single agent or on humanity as a singular unit. Recent work in sociotechnical AI alignment has made some progress in defining alignment inclusively, but the field as a whole still lacks a systematic understanding of how to specify, describe, and analyze misalignment among entities, which may include individual humans, AI agents, and complex compositional entities such as corporations, nation-states, and so forth. Previous work on controversy in computational social science offers a mathematical model of contention among populations (of humans). In this paper, we adapt this contention model to the alignment problem, and show how misalignment can vary depending on the population of agents (human or otherwise) being observed, the domain in question, and the agents' probability-weighted preferences between possible outcomes. Our model departs from value specification approaches and focuses instead on the morass of complex, interlocking, sometimes contradictory goals that agents may have in practice. We apply our model by analyzing several case studies ranging from social media moderation to autonomous vehicle behavior. By applying our model with appropriately representative value data, AI engineers can ensure that their systems learn values maximally aligned with diverse human interests.
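For intuition, here is a minimal sketch of how such a measure might be computed, assuming each agent's preferences within a domain are represented as a probability distribution over possible outcomes. The `misalignment` function and its use of averaged total variation distance are illustrative assumptions, not the paper's exact contention model:

```python
import numpy as np

def misalignment(preferences: np.ndarray) -> float:
    """Average pairwise disagreement among a population of agents.

    `preferences` has shape (n_agents, n_outcomes); each row is one
    agent's probability-weighted preference distribution over the
    possible outcomes in a single domain. Returns a score in [0, 1]:
    0 when all agents hold identical preferences, approaching 1 as
    preferences concentrate on disjoint outcomes.
    """
    n_agents = preferences.shape[0]
    total, pairs = 0.0, 0
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            # Total variation distance between two preference
            # distributions, used here as a per-pair disagreement proxy.
            total += 0.5 * np.abs(preferences[i] - preferences[j]).sum()
            pairs += 1
    return total / pairs if pairs else 0.0

# Three agents, three possible outcomes in one domain: the first two
# agents largely agree, the third strongly prefers a different outcome.
population = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
])
print(misalignment(population))  # 0.5; drops to 0.1 for the first two agents alone
```

Averaging pairwise distances makes the score depend on which agents are included, mirroring the paper's point that measured misalignment varies with the population being observed.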
Related papers
- Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment.
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
- Bias Mitigation via Compensation: A Reinforcement Learning Perspective [1.5442389863546546]
Group dynamics might require that one agent (e.g., the AI system) compensate for biases and errors in another agent (e.g., the human).
We provide a theoretical framework for algorithmic compensation that synthesizes game theory and reinforcement learning principles.
This work then underpins our ethical analysis of the conditions in which AI agents should adapt to biases and behaviors of other agents.
arXiv Detail & Related papers (2024-04-30T04:41:47Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z)
- Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is broad consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
- Goal Alignment: A Human-Aware Account of Value Alignment Problem [16.660807368368758]
Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users.
A foundational cause for misalignment is the inherent asymmetry in human expectations about the agent's behavior and the behavior generated by the agent for the specified objective.
We propose a novel formulation for the value alignment problem, named goal alignment that focuses on a few central challenges related to value alignment.
arXiv Detail & Related papers (2023-02-02T01:18:57Z)
- Aligned with Whom? Direct and social goals for AI systems [0.0]
This article distinguishes two types of alignment problems depending on whose goals we consider.
The direct alignment problem considers whether an AI system accomplishes the goals of the entity operating it.
The social alignment problem considers the effects of an AI system on larger groups or on society more broadly.
arXiv Detail & Related papers (2022-05-09T13:49:47Z)
- Human-AI Collaboration via Conditional Delegation: A Case Study of Content Moderation [47.102566259034326]
We propose conditional delegation as an alternative paradigm for human-AI collaboration.
We develop novel interfaces to assist humans in creating conditional delegation rules.
Our study demonstrates the promise of conditional delegation in improving model performance; a minimal sketch of one such delegation rule appears after this list.
arXiv Detail & Related papers (2022-04-25T17:00:02Z)
- A Mental-Model Centric Landscape of Human-AI Symbiosis [31.14516396625931]
We introduce a significantly generalized version of the human-aware AI interaction scheme, called generalized human-aware interaction (GHAI).
We show how this framework captures the various works in the space of human-AI interaction and identifies the fundamental behavioral patterns those works support.
arXiv Detail & Related papers (2022-02-18T22:08:08Z)
- End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games.
We propose two approaches, respectively based on explicit and implicit differentiation.
The analytical results are validated using several real-world problems.
arXiv Detail & Related papers (2020-10-26T18:39:32Z)
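As a companion to the conditional-delegation entry above, here is a minimal sketch of what human-authored delegation rules might look like in a content-moderation setting. The `Item` fields, topic list, confidence thresholds, and `route` function are hypothetical illustrations, not the interfaces proposed in that paper:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Item:
    text: str
    model_score: float  # model's estimated probability of a policy violation
    topic: str

Rule = Callable[[Item], bool]

def default_rules() -> list[Rule]:
    # Delegate to the AI only where the human moderator trusts it:
    # confident predictions outside sensitive topics. All thresholds
    # and topic names here are illustrative.
    return [
        lambda item: item.topic not in {"politics", "health"},
        lambda item: item.model_score >= 0.95 or item.model_score <= 0.05,
    ]

def route(item: Item, rules: list[Rule]) -> str:
    """Return 'ai' when every rule admits delegation, else 'human'."""
    return "ai" if all(rule(item) for rule in rules) else "human"

item = Item(text="spam spam spam", model_score=0.98, topic="ads")
print(route(item, default_rules()))  # -> 'ai'
```

Routing only confident, low-stakes items to the model reflects the entry's premise that humans retain control over where delegation happens.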
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.