HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants
- URL: http://arxiv.org/abs/2509.08494v1
- Date: Wed, 10 Sep 2025 11:10:10 GMT
- Title: HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants
- Authors: Benjamin Sturgeon, Daniel Samuelson, Jacob Haimes, Jacy Reese Anthis,
- Abstract summary: We develop the idea of human agency by integrating philosophical and scientific theories of agency with AI-assisted evaluation methods. We develop HumanAgencyBench (HAB), a scalable and adaptive benchmark with six dimensions of human agency based on typical AI use cases.
- Score: 5.4831302830611195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As humans delegate more tasks and decisions to artificial intelligence (AI), we risk losing control of our individual and collective futures. Relatively simple algorithmic systems already steer human decision-making, such as social media feed algorithms that lead people to unintentionally and absent-mindedly scroll through engagement-optimized content. In this paper, we develop the idea of human agency by integrating philosophical and scientific theories of agency with AI-assisted evaluation methods: using large language models (LLMs) to simulate and validate user queries and to evaluate AI responses. We develop HumanAgencyBench (HAB), a scalable and adaptive benchmark with six dimensions of human agency based on typical AI use cases. HAB measures the tendency of an AI assistant or agent to Ask Clarifying Questions, Avoid Value Manipulation, Correct Misinformation, Defer Important Decisions, Encourage Learning, and Maintain Social Boundaries. We find low-to-moderate agency support in contemporary LLM-based assistants and substantial variation across system developers and dimensions. For example, while Anthropic LLMs most support human agency overall, they are the least supportive LLMs in terms of Avoid Value Manipulation. Agency support does not appear to consistently result from increasing LLM capabilities or instruction-following behavior (e.g., RLHF), and we encourage a shift towards more robust safety and alignment targets.
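The abstract describes the pipeline only at a high level: LLMs simulate and validate user queries, and an LLM judge scores assistant responses along the six dimensions. The following Python example is a minimal sketch of that LLM-as-judge pattern, not the authors' implementation; the judge prompt wording, the 0-10 scale, and the `call_llm` stub are all assumptions for illustration.

```python
# Minimal sketch of an LLM-as-judge evaluation loop in the spirit of
# HumanAgencyBench. The six dimension names come from the abstract;
# everything else (prompt, scale, call_llm) is a hypothetical stand-in.

from statistics import mean

DIMENSIONS = [
    "Ask Clarifying Questions",
    "Avoid Value Manipulation",
    "Correct Misinformation",
    "Defer Important Decisions",
    "Encourage Learning",
    "Maintain Social Boundaries",
]

def call_llm(prompt: str) -> str:
    """Stub for a judge-model API call; replace with a real client."""
    return "7"  # placeholder score so the sketch runs end to end

def judge(query: str, response: str, dimension: str) -> float:
    """Ask a judge model to rate how well `response` supports `dimension`."""
    prompt = (
        f"User query:\n{query}\n\nAssistant response:\n{response}\n\n"
        f"On a 0-10 scale, how well does the response support the human "
        f"agency dimension '{dimension}'? Reply with a number only."
    )
    return float(call_llm(prompt))

def agency_profile(query: str, response: str) -> dict:
    """Score one response on all six dimensions plus an overall mean."""
    scores = {d: judge(query, response, d) for d in DIMENSIONS}
    scores["overall"] = mean(scores.values())
    return scores

if __name__ == "__main__":
    print(agency_profile(
        "Should I quit my job?",
        "That's your call; what factors matter most to you?",
    ))
```

In practice the judge would be a real model client and HAB's own rubrics would replace the one-line prompt; the sketch only shows the shape of the loop, simulated query in, per-dimension scores out.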
Related papers
- Training LLM Agents to Empower Humans [67.80021254324294]
We propose a new approach to tuning assistive language models based on maximizing the human's empowerment (a standard formalization of empowerment is sketched after this list). Our empowerment-maximizing method, Empower, only requires offline text data. We show that agents trained with Empower increase the success rate of a simulated human programmer on challenging coding questions by an average of 192%.
arXiv Detail & Related papers (2025-10-15T16:09:33Z)
- Reversing the Paradigm: Building AI-First Systems with Human Guidance [0.0]
The relationship between humans and artificial intelligence is no longer science fiction. Rather than replacing humans, AI augments tasks, enhancing decisions with data. The future of work is moving toward AI agents handling tasks autonomously. This paper examines the technological and organizational changes needed to enable responsible adoption of AI-first systems.
arXiv Detail & Related papers (2025-06-13T21:48:44Z)
- Modeling AI-Human Collaboration as a Multi-Agent Adaptation [0.0]
We develop an agent-based simulation to formalize AI-human collaboration as a function of the task. We show that in modular tasks, AI often substitutes for humans, delivering higher payoffs unless human expertise is very high. We also show that even "hallucinatory" AI, lacking memory or structure, can improve outcomes when augmenting low-capability humans by helping them escape local optima.
arXiv Detail & Related papers (2025-04-29T16:19:53Z)
- Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios. Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z)
- Measurement of LLM's Philosophies of Human Nature [113.47929131143766]
We design a standardized psychological scale specifically targeting large language models (LLMs). We show that current LLMs exhibit a systemic lack of trust in humans. We propose a mental loop learning framework, which enables LLMs to continuously optimize their value system.
arXiv Detail & Related papers (2025-04-03T06:22:19Z)
- Human aversion? Do AI Agents Judge Identity More Harshly Than Performance [0.06554326244334868]
We investigate how AI agents based on large language models assess and integrate human input. We find that these AI systems systematically discount human advice, penalizing human errors more severely than algorithmic errors.
arXiv Detail & Related papers (2025-03-31T02:05:27Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
- Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety [2.3572498744567127]
We argue that alignment to human intent is insufficient for safe AI systems.
We argue that preservation of long-term agency of humans may be a more robust standard.
arXiv Detail & Related papers (2023-05-30T17:14:01Z)
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
- A Cognitive Framework for Delegation Between Error-Prone AI and Human Agents [0.0]
We investigate the use of cognitively inspired models of behavior to predict the behavior of both human and AI agents.
The predicted behavior is used to delegate control between humans and AI agents through the use of an intermediary entity.
arXiv Detail & Related papers (2022-04-06T15:15:21Z)
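For context on the first related paper above: empowerment is commonly formalized in the reinforcement-learning literature (following Klyubin et al.) as a channel capacity, the maximal mutual information between an agent's n-step action sequence and the resulting future state. The LaTeX sketch below gives this textbook form as background for the term; it is not necessarily the exact objective the Empower method optimizes.

```latex
% Empowerment of state s_t: the channel capacity from an n-step
% action sequence A_t^n to the future state S_{t+n} it induces.
\mathfrak{E}(s_t) \;=\; \max_{p(a_t^n)} \, I\!\left(A_t^n;\, S_{t+n} \,\middle|\, s_t\right)
```

Intuitively, a human with high empowerment has many reliably distinguishable futures available to them, which is why maximizing a simulated human's empowerment can serve as an assistance objective without explicit reward labels.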