Training LLM Agents to Empower Humans
- URL: http://arxiv.org/abs/2510.13709v2
- Date: Thu, 16 Oct 2025 03:39:31 GMT
- Title: Training LLM Agents to Empower Humans
- Authors: Evan Ellis, Vivek Myers, Jens Tuyls, Sergey Levine, Anca Dragan, Benjamin Eysenbach,
- Abstract summary: We propose a new approach to tuning assistive language models based on maximizing the human's empowerment.<n>Our empowerment-maximizing method, Empower, only requires offline text data.<n>We show that agents trained with Empower increase the success rate of a simulated human programmer on challenging coding questions by an average of 192%.
- Score: 67.80021254324294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assistive agents should not only take actions on behalf of a human, but also step out of the way and cede control when there are important decisions to be made. However, current methods for building assistive agents, whether via mimicking expert humans or via RL finetuning on an inferred reward, often encourage agents to complete tasks on their own rather than truly assisting the human attain their objectives. Additionally, these methods often require costly explicit human feedback to provide a training signal. We propose a new approach to tuning assistive language models based on maximizing the human's empowerment, their ability to effect desired changes in the environment. Our empowerment-maximizing method, Empower, only requires offline text data, providing a self-supervised method for fine-tuning language models to better assist humans. To study the efficacy of our approach, we conducted an 18-person user study comparing our empowerment assistant with a strong baseline. Participants preferred our assistant 78% of the time (p=0.015), with a 31% higher acceptance rate and 38% fewer suggestions. Additionally, we introduce a new environment for evaluating multi-turn code assistance using simulated humans. Using this environment, we show that agents trained with Empower increase the success rate of a simulated human programmer on challenging coding questions by an average of 192% over an SFT baseline. With this empowerment objective, we provide a framework for useful aligned AI agents at scale using only offline data without the need for any additional human feedback or verifiable rewards.
Related papers
- HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants [5.4831302830611195]
We develop the idea of human agency by integrating philosophical and scientific theories of agency with AI-assisted evaluation methods.<n>We develop HumanBench (HAB), a scalable and adaptive benchmark with six dimensions of human agency based on typical AI use cases.
arXiv Detail & Related papers (2025-09-10T11:10:10Z) - GUIDE: Real-Time Human-Shaped Agents [4.676987516944155]
We introduce GUIDE, a framework for real-time human-guided reinforcement learning.
With only 10 minutes of human feedback, our algorithm achieves up to 30% increase in success rate compared to its RL baseline.
arXiv Detail & Related papers (2024-10-19T18:59:39Z) - Enabling Language Models to Implicitly Learn Self-Improvement [49.16868302881804]
Large Language Models (LLMs) have demonstrated remarkable capabilities in open-ended text generation tasks.
We propose an ImPlicit Self-ImprovemenT (PIT) framework that implicitly learns the improvement goal from human preference data.
arXiv Detail & Related papers (2023-10-02T04:29:40Z) - Principle-Driven Self-Alignment of Language Models from Scratch with
Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z) - Improving Multimodal Interactive Agents with Reinforcement Learning from
Human Feedback [16.268581985382433]
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback.
Here we demonstrate how to use reinforcement learning from human feedback to improve upon simulated, embodied agents.
arXiv Detail & Related papers (2022-11-21T16:00:31Z) - Human Decision Makings on Curriculum Reinforcement Learning with
Difficulty Adjustment [52.07473934146584]
We guide the curriculum reinforcement learning results towards a preferred performance level that is neither too hard nor too easy via learning from the human decision process.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
It shows reinforcement learning performance can successfully adjust in sync with the human desired difficulty level.
arXiv Detail & Related papers (2022-08-04T23:53:51Z) - Learning to Assist Agents by Observing Them [41.74498230885008]
We introduce methods where the capability to create a representation of the behavior is first pre-trained with offline data.
We test the setting in a gridworld where the helper agent has the capability to manipulate the environment of the assisted artificial agents.
arXiv Detail & Related papers (2021-10-04T10:38:59Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - AvE: Assistance via Empowerment [77.08882807208461]
We propose a new paradigm for assistance by instead increasing the human's ability to control their environment.
This task-agnostic objective preserves the person's autonomy and ability to achieve any eventual state.
arXiv Detail & Related papers (2020-06-26T04:40:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.