When Empowerment Disempowers
- URL: http://arxiv.org/abs/2511.04177v1
- Date: Thu, 06 Nov 2025 08:29:29 GMT
- Title: When Empowerment Disempowers
- Authors: Claire Yang, Maya Cakmak, Max Kleiman-Weiner
- Abstract summary: Empowerment has been proposed as a universal, goal-agnostic objective for motivating assistive behavior in AI agents. We introduce Disempower-Grid, an open-source multi-human gridworld test suite. We empirically show that assistive RL agents optimizing for one human's empowerment can significantly reduce another human's environmental influence and rewards. Our work reveals a broader challenge for the AI alignment community: goal-agnostic objectives that seem aligned in single-agent settings can become misaligned in multi-agent contexts.
- Score: 6.072835354847189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Empowerment, a measure of an agent's ability to control its environment, has been proposed as a universal, goal-agnostic objective for motivating assistive behavior in AI agents. While multi-human settings like homes and hospitals are promising for AI assistance, prior work on empowerment-based assistance assumes that the agent assists one human in isolation. We introduce Disempower-Grid, an open-source multi-human gridworld test suite. Using Disempower-Grid, we empirically show that assistive RL agents optimizing for one human's empowerment can significantly reduce another human's environmental influence and rewards, a phenomenon we formalize as disempowerment. We characterize when disempowerment occurs in these environments and show that joint empowerment mitigates disempowerment at the cost of the user's reward. Our work reveals a broader challenge for the AI alignment community: goal-agnostic objectives that seem aligned in single-agent settings can become misaligned in multi-agent contexts.
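To make the central concept concrete: a common simplification is n-step empowerment, which for deterministic dynamics reduces to the log-count of distinct states an agent can reach within n actions. The sketch below is a minimal illustration of that idea in a hypothetical 5x5 gridworld; the grid size, action set, and transition rule are assumptions for illustration, not the paper's Disempower-Grid implementation.

```python
import math
from itertools import product

# Hypothetical 5x5 gridworld. Actions: the four cardinal moves plus "stay".
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]
SIZE = 5

def step(state, action):
    """Deterministic transition: move unless blocked by the grid boundary."""
    x, y = state
    dx, dy = action
    nx, ny = x + dx, y + dy
    if 0 <= nx < SIZE and 0 <= ny < SIZE:
        return (nx, ny)
    return state  # bumping a wall leaves the agent in place

def empowerment(state, n):
    """n-step empowerment in bits: log2 of the number of distinct states
    reachable by some length-n action sequence. For deterministic dynamics
    this equals the channel capacity between action sequences and outcomes."""
    reachable = set()
    for seq in product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))

# A corner state controls fewer futures than the center,
# so its empowerment is strictly lower.
print(empowerment((0, 0), 2))  # corner: fewer reachable states
print(empowerment((2, 2), 2))  # center: more reachable states
```

Under this framing, the paper's "disempowerment" phenomenon corresponds to an assistant's actions shrinking the reachable-state set (and hence this quantity) for a second, non-assisted human.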
Related papers
- Training LLM Agents to Empower Humans [67.80021254324294]
We propose a new approach to tuning assistive language models based on maximizing the human's empowerment. Our empowerment-maximizing method, Empower, requires only offline text data. We show that agents trained with Empower increase the success rate of a simulated human programmer on challenging coding questions by an average of 192%.
arXiv Detail & Related papers (2025-10-15T16:09:33Z) - Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power [0.0]
This paper explores the idea of promoting both safety and wellbeing by explicitly forcing AI agents to empower humans. We design a parametrizable and decomposable objective function that represents an inequality- and risk-averse long-term aggregate of human power.
arXiv Detail & Related papers (2025-07-31T20:56:43Z) - Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions [42.813774494968214]
We introduce HA$^2$: Hierarchical Ad Hoc Agents, a framework leveraging hierarchical reinforcement learning to mimic the structured approach humans use in collaboration. We evaluate HA$^2$ in the Overcooked environment, demonstrating statistically significant improvement over existing baselines when paired with both unseen agents and humans.
arXiv Detail & Related papers (2025-05-07T17:19:17Z) - Modeling AI-Human Collaboration as a Multi-Agent Adaptation [0.0]
We develop an agent-based simulation to formalize AI-human collaboration as a function of the task. We show that in modular tasks, AI often substitutes for humans, delivering higher payoffs unless human expertise is very high. We also show that even "hallucinatory" AI, lacking memory or structure, can improve outcomes when augmenting low-capability humans by helping them escape local optima.
arXiv Detail & Related papers (2025-04-29T16:19:53Z) - Learning to Assist Humans without Inferring Rewards [65.28156318196397]
We build upon prior work that studies assistance through the lens of empowerment: an assistive agent aims to maximize the influence of the human's actions. We prove that these representations estimate a notion of empowerment similar to that studied in prior work.
arXiv Detail & Related papers (2024-11-04T21:31:04Z) - Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households [30.33911147366425]
Smart Help aims to provide proactive yet adaptive support to human agents with diverse disabilities.
We introduce an innovative opponent modeling module that provides a nuanced understanding of the main agent's capabilities and goals.
Our findings illustrate the potential of AI-imbued assistive robots in improving the well-being of vulnerable groups.
arXiv Detail & Related papers (2024-04-13T13:03:59Z) - The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z) - AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors [93.38830440346783]
We propose AgentVerse, a multi-agent framework that can collaboratively adjust its composition as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that AgentVerse can effectively deploy multi-agent groups that outperform a single agent.
In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
arXiv Detail & Related papers (2023-08-21T16:47:11Z) - Compensating for Sensing Failures via Delegation in Human-AI Hybrid Systems [0.0]
We consider the hybrid human-AI teaming case where a managing agent is tasked with identifying when to perform a delegation assignment.
We model how the environmental context can contribute to, or exacerbate, the sensing deficiencies.
We demonstrate how a Reinforcement Learning (RL) manager can correct the context-delegation association.
arXiv Detail & Related papers (2023-03-02T14:27:01Z) - Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration [116.28433607265573]
We introduce Watch-And-Help (WAH), a challenge for testing social intelligence in AI agents.
In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently.
We build VirtualHome-Social, a multi-agent household environment, and provide a benchmark including both planning and learning based baselines.
arXiv Detail & Related papers (2020-10-19T21:48:31Z) - AvE: Assistance via Empowerment [77.08882807208461]
We propose a new paradigm for assistance by instead increasing the human's ability to control their environment.
This task-agnostic objective preserves the person's autonomy and ability to achieve any eventual state.
arXiv Detail & Related papers (2020-06-26T04:40:11Z) - Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.