RL agents Implicitly Learning Human Preferences
- URL: http://arxiv.org/abs/2002.06137v1
- Date: Fri, 14 Feb 2020 17:42:50 GMT
- Title: RL agents Implicitly Learning Human Preferences
- Authors: Nevan Wichers
- Abstract summary: We show that RL agents implicitly learn the preferences of humans in their environment.
Training a classifier to predict whether a simulated human's preferences are fulfilled, based on the activations of an RL agent's neural network, achieves 0.93 AUC.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the real world, RL agents should be rewarded for fulfilling human
preferences. We show that RL agents implicitly learn the preferences of humans
in their environment. Training a classifier to predict whether a simulated human's
preferences are fulfilled, based on the activations of an RL agent's neural
network, achieves 0.93 AUC. Training a classifier on the raw environment state
achieves only 0.8 AUC. Training the classifier on the RL agent's activations also
does much better than training on activations from an autoencoder. The human
preference classifier can be used as the reward function of an RL agent to make
the agent more beneficial for humans.
Related papers
- Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization [20.732922711530527]
We introduce Macro Action Quantization (MAQ), a human-like reinforcement learning framework that distills human demonstrations into macro actions.
Experiments on the D4RL Adroit benchmarks show that MAQ significantly improves human-likeness, increasing trajectory-similarity scores and achieving the highest human-likeness rankings among all RL agents.
Our results also demonstrate that MAQ can be easily integrated into various off-the-shelf RL algorithms, opening a promising direction for learning human-like RL agents.
arXiv Detail & Related papers (2025-11-19T02:59:47Z)
- LAMeTA: Intent-Aware Agentic Network Optimization via a Large AI Model-Empowered Two-Stage Approach [68.198383438396]
We present LAMeTA, a Large AI Model (LAM)-empowered Two-stage Approach for intent-aware agentic network optimization.
First, we propose Intent-oriented Knowledge Distillation (IoKD), which efficiently distills intent-understanding capabilities.
Second, we develop Symbiotic Reinforcement Learning (SRL), integrating E-LAMs with a policy-based DRL framework.
arXiv Detail & Related papers (2025-05-18T05:59:16Z)
- RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning [125.65034908728828]
Training large language models (LLMs) as interactive agents presents unique challenges.
While reinforcement learning has enabled progress in static tasks, multi-turn agent RL training remains underexplored.
We propose StarPO, a general framework for trajectory-level agent RL, and introduce RAGEN, a modular system for training and evaluating LLM agents.
arXiv Detail & Related papers (2025-04-24T17:57:08Z)
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z)
- Ego-Foresight: Agent Visuomotor Prediction as Regularization for RL [34.6883445484835]
Ego-Foresight is a self-supervised method for disentangling agent and environment based on motion and prediction.
We show that visuomotor prediction of the agent regularizes the RL algorithm by encouraging actions to stay within predictable bounds.
We integrate Ego-Foresight with a model-free RL algorithm to solve simulated robotic manipulation tasks, showing an average improvement of 23% in efficiency and 8% in performance.
arXiv Detail & Related papers (2024-05-27T13:32:43Z)
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle is more an explanation of deep neural networks (DNNs) than an interpretation of RL agents.
We propose instead to treat rewards, the essential objective of RL agents, as the basis for interpreting them.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- DIP-RL: Demonstration-Inferred Preference Learning in Minecraft [0.5669790037378094]
In reinforcement learning, an algorithmic agent learns to interact with an environment while receiving feedback in the form of a reward signal.
We present Demonstration-Inferred Preference Reinforcement Learning (DIP-RL), an algorithm that leverages human demonstrations in three distinct ways.
We evaluate DIP-RL in a tree-chopping task in Minecraft.
arXiv Detail & Related papers (2023-07-22T20:05:31Z)
- Int-HRL: Towards Intention-based Hierarchical Reinforcement Learning [23.062590084580542]
Int-HRL is a hierarchical RL approach with intention-based sub-goals inferred from human eye gaze.
Our evaluations show that replacing hand-crafted sub-goals with automatically extracted intentions yields an HRL agent that is significantly more sample-efficient than previous methods.
arXiv Detail & Related papers (2023-06-20T12:12:16Z)
- Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories [88.08381083207449]
We show the prevalence of generalization failure on controllable states from stranger agents.
We propose a novel method called Self-Trajectory Augmentation (STA), which resets the environment to the agent's old states, selected according to the Q function, during training (sketched below).
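One possible shape of that reset logic, as a hedged sketch: it assumes the environment exposes a state-restoring `set_state` hook and the agent a `q_value` estimate, neither of which is specified by the summary, and the greedy max over stored states is only one plausible reading of "according to the Q function".

```python
import random

def sta_reset(env, agent, visited_states, p_old=0.5):
    """Self-Trajectory-Augmentation-style reset (sketch): with probability
    p_old, restart the episode from a previously visited state instead of
    the usual initial state."""
    if visited_states and random.random() < p_old:
        # Favor stored states the agent currently values highly; the
        # paper's exact selection criterion may differ.
        state = max(visited_states, key=agent.q_value)  # hypothetical API
        env.set_state(state)                            # hypothetical hook
        return state
    return env.reset()
```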
arXiv Detail & Related papers (2023-04-26T10:12:12Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Beyond Tabula Rasa: Reincarnating Reinforcement Learning [37.201451908129386]
Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
arXiv Detail & Related papers (2022-06-03T15:11:10Z)
- DDPG car-following model with real-world human driving experience in CARLA [0.0]
We propose a two-stage Deep Reinforcement Learning (DRL) method that learns from real-world human driving to achieve performance superior to that of a pure DRL agent.
For evaluation, we designed different real-world driving scenarios to compare the proposed two-stage DRL agent with a pure DRL agent.
arXiv Detail & Related papers (2021-12-29T15:22:31Z)
- Learning to Prune Deep Neural Networks via Reinforcement Learning [64.85939668308966]
PuRL is a deep reinforcement learning-based algorithm for pruning neural networks.
It achieves sparsity and accuracy comparable to current state-of-the-art methods.
arXiv Detail & Related papers (2020-07-09T13:06:07Z)
- Distributed Reinforcement Learning for Cooperative Multi-Robot Object Manipulation [53.262360083572005]
We consider solving a cooperative multi-robot object manipulation task using reinforcement learning (RL).
We propose two distributed multi-agent RL approaches: distributed approximate RL (DA-RL) and game-theoretic RL (GT-RL).
Although we focus on a small system of two agents in this paper, both DA-RL and GT-RL apply to general multi-agent systems, and are expected to scale well to large systems.
arXiv Detail & Related papers (2020-03-21T00:43:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.