A New Framework for Query Efficient Active Imitation Learning
- URL: http://arxiv.org/abs/1912.13037v1
- Date: Mon, 30 Dec 2019 18:12:27 GMT
- Title: A New Framework for Query Efficient Active Imitation Learning
- Authors: Daniel Hsu
- Abstract summary: A human expert knows the rewards and unsafe states based on their preferences and objectives, but querying that expert is expensive.
We propose a new imitation learning (IL) framework that actively and interactively learns a model of the user's reward function with efficient queries.
We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games.
- Score: 5.167794607251493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We seek to align agent policy with human expert behavior in a reinforcement
learning (RL) setting, without any prior knowledge about the dynamics, reward
function, or unsafe states. A human expert knows the rewards and unsafe states
based on their preferences and objectives, but querying that expert is expensive.
To address this challenge, we propose a new imitation learning (IL) framework
that actively and interactively learns a model of the user's reward function with
efficient queries. We build an adversarial generative model of states and a
successor representation (SR) model trained on transition experience collected by
the learning policy. Our method uses these models to select state-action pairs,
asks the user to comment on their optimality or safety, and trains an adversarial
neural network to predict the rewards. Unlike previous work, which is almost
entirely based on uncertainty sampling, the key idea is to actively and
efficiently select state-action pairs from both on-policy and off-policy
experience by discriminating between queried (expert) and unqueried (generated)
data and maximizing the efficiency of value function learning. We call this
method adversarial reward query with successor representation. We evaluate the
proposed method with a simulated human on a state-based 2D navigation task,
robotic control tasks, and image-based video games, which have high-dimensional
observations and complex state dynamics. The results show that the proposed
method significantly outperforms uncertainty-based methods at learning reward
models and achieves better query efficiency: the adversarial discriminator helps
the agent learn human behavior more efficiently, and the SR selects states that
have a stronger impact on the value function. Moreover, the proposed method also
learns to avoid unsafe states while training the reward model.
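Although the abstract gives no implementation details, the query-selection idea it describes (train a discriminator to separate queried, expert-labeled pairs from unqueried, generated pairs, and use the successor representation to favor states with a strong impact on the value function) can be sketched roughly as follows. This is a minimal illustrative sketch under assumptions, not the authors' implementation: the network architectures, the acquisition score that adds discriminator confusion to an SR-weighted value-impact term, and all class and function names are hypothetical.

```python
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    # Small utility network; the architecture is a placeholder, not the paper's.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))


class AdversarialQuerySelector:
    """Pick state-action pairs to send to the human for optimality/safety labels.

    Assumed components (hypothetical names):
      - disc(s, a): discriminator trained to separate already-queried (expert-labeled)
        pairs from unqueried (generated, on- and off-policy) pairs.
      - sr(s): successor-representation features used to score how strongly a state
        influences the value function, so high-impact states are queried first.
    """

    def __init__(self, obs_dim, act_dim, sr_dim):
        self.disc = mlp(obs_dim + act_dim, 1)
        self.opt = torch.optim.Adam(self.disc.parameters(), lr=1e-3)
        self.bce = nn.BCEWithLogitsLoss()
        self.sr = mlp(obs_dim, sr_dim)           # successor-representation head (assumed)
        self.w = torch.randn(sr_dim)             # reward weights so that V(s) ~ sr(s) @ w

    def train_discriminator_step(self, queried_sa, unqueried_sa):
        # Label queried pairs 1 and unqueried pairs 0; one adversarial update.
        logits_q = self.disc(queried_sa)
        logits_u = self.disc(unqueried_sa)
        loss = self.bce(logits_q, torch.ones_like(logits_q)) + \
               self.bce(logits_u, torch.zeros_like(logits_u))
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()

    @torch.no_grad()
    def acquisition_score(self, states, actions):
        # Candidates the discriminator cannot tell apart from expert data are informative;
        # the SR term approximates how much the state matters for the value function.
        sa = torch.cat([states, actions], dim=-1)
        confusion = -torch.abs(self.disc(sa).squeeze(-1))   # near the decision boundary
        value_impact = (self.sr(states) @ self.w).abs()     # |sr(s) @ w|
        return confusion + value_impact

    def select_queries(self, states, actions, k=8):
        scores = self.acquisition_score(states, actions)
        return torch.topk(scores, k).indices                # indices to ask the human about
```

In an active-IL loop of this kind, the selected pairs would be labeled by the (simulated) human, added to the queried set, and used both to retrain the discriminator and to fit the reward model that the policy is then trained against.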
Related papers
- Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
We propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model.
We show that the proposed algorithms converge to the stationary solutions of the IRL problem.
Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
arXiv Detail & Related papers (2024-05-28T07:11:05Z)
- Learning Reward for Robot Skills Using Large Language Models via Self-Alignment [11.639973274337274]
Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions.
We propose a method to learn rewards more efficiently in the absence of humans.
arXiv Detail & Related papers (2024-05-12T04:57:43Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with the user's intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert. A sketch of this intervention-as-reward idea follows this entry.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
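Based only on the summary above, the core RLIF idea (treating human interventions themselves as the reward signal for off-policy RL) might look roughly like the sketch below. The reward convention (0 when the agent acts unopposed, -1 when the expert intervenes) and all names are assumptions, not the paper's actual design.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Transition:
    obs: Any            # observation at this step
    action: Any         # action the agent took
    next_obs: Any       # resulting observation
    intervened: bool    # did the human expert take over at this step?


def intervention_reward(t: Transition) -> float:
    # No task reward is needed: the agent is simply penalized whenever
    # the human chooses to intervene (assumed -1/0 convention).
    return -1.0 if t.intervened else 0.0


def relabel_for_off_policy_rl(trajectory: List[Transition]) -> List[Tuple[Any, Any, float, Any]]:
    # Convert an interactive-imitation trajectory into (s, a, r, s') tuples that
    # any off-policy RL learner (e.g. a SAC- or DQN-style agent) can consume.
    return [(t.obs, t.action, intervention_reward(t), t.next_obs) for t in trajectory]
```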
- Automatic Evaluation of Excavator Operators using Learned Reward Functions [5.372817906484557]
We propose a novel strategy for the automatic evaluation of excavator operators.
We take into account the internal dynamics of the excavator and the safety criterion at every time step to evaluate the performance.
Our results demonstrate that policies learned with these external reward-prediction models find safer solutions.
arXiv Detail & Related papers (2022-11-15T06:58:00Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning an imagined future state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values.
Our method achieves new state-of-the-art performance among search-free RL algorithms. A sketch of this value-consistency idea is given after this entry.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
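A rough reading of the value-consistency idea above, sketched under assumptions: a latent dynamics model imagines the next state, a shared $Q$-value head is applied to both the imagined and the real next state, and the two resulting action-value distributions are pulled together. The encoder/dynamics architecture, the softmax-plus-KL formulation, and all names below are illustrative guesses rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VCRSketch(nn.Module):
    """Minimal value-consistent representation learning sketch (assumed details)."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.dynamics = nn.Linear(hidden + act_dim, hidden)    # imagines the next latent state
        self.q_head = nn.Linear(hidden, act_dim)               # shared Q-value head

    def value_consistency_loss(self, obs, action_onehot, next_obs):
        z = self.encoder(obs)
        imagined_next = self.dynamics(torch.cat([z, action_onehot], dim=-1))
        real_next = self.encoder(next_obs)

        # Q-value head applied to both states -> two distributions over action values.
        q_imagined = self.q_head(imagined_next)
        q_real = self.q_head(real_next).detach()               # stop gradient on the real branch

        # Align the two action-value distributions (softmax + KL is one possible choice).
        return F.kl_div(F.log_softmax(q_imagined, dim=-1),
                        F.softmax(q_real, dim=-1),
                        reduction="batchmean")
```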
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms (see the sketch after this entry).
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
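One common way to turn "uncertainty in the learned reward" into an exploration bonus is disagreement across an ensemble of reward models; the sketch below uses the ensemble standard deviation as the intrinsic reward. The ensemble formulation, the mixing coefficient beta, and all names are assumptions and not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn


class RewardEnsemble(nn.Module):
    """Ensemble of learned reward models r_i(s, a); their disagreement drives exploration."""

    def __init__(self, obs_dim, act_dim, n_models=5, hidden=64):
        super().__init__()
        self.models = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_models)
        )

    def forward(self, obs, act):
        sa = torch.cat([obs, act], dim=-1)
        preds = torch.stack([m(sa).squeeze(-1) for m in self.models], dim=0)  # (n_models, batch)
        return preds.mean(dim=0), preds.std(dim=0)


def training_reward(ensemble, obs, act, beta=0.1):
    # Extrinsic part: mean reward predicted from human preference labels.
    # Intrinsic part: ensemble disagreement (std) marks novel, uncertain regions.
    mean_r, std_r = ensemble(obs, act)
    return mean_r + beta * std_r
```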
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data [18.750834997334664]
We argue that humans are boundedly rational and have different intelligence levels when reasoning about others' decision-making processes.
We propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning.
arXiv Detail & Related papers (2021-03-07T07:48:31Z)
- Generative Inverse Deep Reinforcement Learning for Online Recommendation [62.09946317831129]
We propose a novel inverse reinforcement learning approach, namely InvRec, for online recommendation.
InvRec automatically extracts the reward function from users' behaviors for online recommendation.
arXiv Detail & Related papers (2020-11-04T12:12:25Z)
- REMAX: Relational Representation for Multi-Agent Exploration [13.363887960136102]
We propose a learning-based exploration strategy to generate the initial states of a game.
We demonstrate that our method improves the training and performance of the MARL model more than existing exploration methods.
arXiv Detail & Related papers (2020-08-12T10:23:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.