FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using
Human Feedback
- URL: http://arxiv.org/abs/2001.06781v1
- Date: Sun, 19 Jan 2020 06:07:20 GMT
- Title: FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using
Human Feedback
- Authors: Baicen Xiao, Qifan Lu, Bhaskar Ramasubramanian, Andrew Clark, Linda
Bushnell, Radha Poovendran
- Abstract summary: Reinforcement learning has been successful in training autonomous agents to accomplish goals in complex environments.
Human players often find it easier to obtain higher rewards in some environments than reinforcement learning algorithms.
This is especially true of high-dimensional state spaces where the reward obtained by the agent is sparse or extremely delayed.
- Score: 9.548547582558662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning has been successful in training autonomous agents to
accomplish goals in complex environments. Although it has been applied in
multiple settings, including robotics and computer games, human players often
find it easier to obtain higher rewards in some environments than reinforcement
learning algorithms. This is especially true of high-dimensional state spaces
where the reward obtained by the agent is sparse or extremely delayed. In this
paper, we seek to effectively integrate feedback signals supplied by a human
operator with deep reinforcement learning algorithms in high-dimensional state
spaces. We call this FRESH (Feedback-based REward SHaping). During training, a
human operator is presented with trajectories from a replay buffer and then
provides feedback on states and actions in the trajectory. In order to
generalize feedback signals provided by the human operator to previously unseen
states and actions at test time, we use a feedback neural network. An
ensemble of neural networks with a shared architecture represents model
uncertainty and the confidence of the network in its output. The output of the
feedback neural network is converted to a shaping reward that is added to the
reward provided by the environment. We evaluate our approach on the Bowling
and Skiing Atari games in the Arcade Learning Environment.
Although human experts have been able to achieve high scores in these
environments, state-of-the-art deep learning algorithms perform poorly. We
observe that FRESH is able to achieve much higher scores than state-of-the-art
deep learning algorithms in both environments. FRESH also achieves a 21.4%
higher score than a human expert in Bowling and does as well as a human expert
in Skiing.
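The abstract describes a concrete mechanism: an ensemble of feedback networks scores state-action pairs, and a confidence-gated shaping reward is added to the environment reward. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; all names (FeedbackNet, FeedbackEnsemble, shaping_reward) and the specific choices (tanh outputs, disagreement threshold, shaping scale) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeedbackNet(nn.Module):
    """One ensemble member: predicts human feedback for a state, per action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one feedback logit per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.body(state)

class FeedbackEnsemble:
    """Ensemble with a shared architecture; disagreement across members
    serves as a proxy for model uncertainty, as described in the abstract."""
    def __init__(self, state_dim: int, n_actions: int, n_members: int = 5):
        self.members = [FeedbackNet(state_dim, n_actions) for _ in range(n_members)]

    def shaping_reward(self, state: torch.Tensor, action: int,
                       scale: float = 1.0, max_disagreement: float = 0.1) -> float:
        # Each member maps the state to a feedback value in [-1, 1] for the
        # chosen action; the mean is the shaping signal, the spread its confidence.
        with torch.no_grad():
            preds = torch.stack([torch.tanh(m(state)[action]) for m in self.members])
        if preds.std() > max_disagreement:
            return 0.0  # low confidence: leave the environment reward unshaped
        return scale * preds.mean().item()

# During training, the shaped reward augments the environment reward:
#   r_total = r_env + ensemble.shaping_reward(state, action)
# The ensemble itself would be fit to the human operator's labels on
# (state, action) pairs drawn from replay-buffer trajectories.
```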
Related papers
- Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback [27.223725464754853]
GEAR enables robots to be placed in real-world environments and left to train autonomously without interruption.
The system streams robot experience to a web interface, requiring only occasional asynchronous feedback from remote, crowdsourced, non-expert humans.
arXiv Detail & Related papers (2023-10-31T16:43:56Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Reinforcement Learning in an Adaptable Chess Environment for Detecting Human-understandable Concepts [0.0]
We show a method for probing which concepts self-learning agents internalise in the course of their training.
For demonstration, we use a chess-playing agent in a fast, lightweight environment developed specifically for research groups.
arXiv Detail & Related papers (2022-11-10T11:48:10Z)
- Learning from humans: combining imitation and deep reinforcement learning to accomplish human-level performance on a virtual foraging task [6.263481844384228]
We develop a method to learn bio-inspired foraging policies using human data.
We conduct an experiment in which humans, virtually immersed in an open-field foraging environment, are trained to collect as much reward as possible.
arXiv Detail & Related papers (2022-03-11T20:52:30Z)
- ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning [91.58711082348293]
Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem.
This approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse.
We propose a hierarchical solution that learns efficiently from sparse user feedback.
arXiv Detail & Related papers (2022-02-05T02:01:19Z)
- Backprop-Free Reinforcement Learning with Active Neural Generative Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments.
We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference.
The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.
arXiv Detail & Related papers (2021-07-10T19:02:27Z)
- Learning What To Do by Simulating the Past [76.86449554580291]
We show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done.
The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
arXiv Detail & Related papers (2021-04-08T17:43:29Z)
- Truly Sparse Neural Networks at Scale [2.2860412844991655]
We train the largest neural network ever trained in terms of representational power, reaching the size of a bat's brain.
Our approach achieves state-of-the-art performance while opening a path toward an environmentally friendly era of artificial intelligence.
arXiv Detail & Related papers (2021-02-02T20:06:47Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Learning Intrinsic Symbolic Rewards in Reinforcement Learning [7.101885582663675]
We present a method that discovers dense rewards in the form of low-dimensional symbolic trees.
We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks.
arXiv Detail & Related papers (2020-10-08T00:02:46Z)
- Learning Affordance Landscapes for Interaction Exploration in 3D Environments [101.90004767771897]
Embodied agents must be able to master how their environment works.
We introduce a reinforcement learning approach for interaction exploration.
We demonstrate our idea with AI2-iTHOR.
arXiv Detail & Related papers (2020-08-21T00:29:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.