FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using
Human Feedback
- URL: http://arxiv.org/abs/2001.06781v1
- Date: Sun, 19 Jan 2020 06:07:20 GMT
- Title: FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using
Human Feedback
- Authors: Baicen Xiao, Qifan Lu, Bhaskar Ramasubramanian, Andrew Clark, Linda
Bushnell, Radha Poovendran
- Abstract summary: Reinforcement learning has been successful in training autonomous agents to accomplish goals in complex environments.
Human players often find it easier to obtain higher rewards in some environments than reinforcement learning algorithms.
This is especially true of high-dimensional state spaces where the reward obtained by the agent is sparse or extremely delayed.
- Score: 9.548547582558662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning has been successful in training autonomous agents to
accomplish goals in complex environments. Although it has been applied in
multiple settings, including robotics and computer games, human players often
find it easier to obtain higher rewards in some environments than reinforcement
learning algorithms. This is especially true of high-dimensional state spaces
where the reward obtained by the agent is sparse or extremely delayed. In this
paper, we seek to effectively integrate feedback signals supplied by a human
operator with deep reinforcement learning algorithms in high-dimensional state
spaces. We call this FRESH (Feedback-based REward SHaping). During training, a
human operator is presented with trajectories from a replay buffer and then
provides feedback on states and actions in the trajectory. In order to
generalize feedback signals provided by the human operator to previously unseen
states and actions at test time, we use a feedback neural network. An
ensemble of neural networks with a shared architecture represents model
uncertainty and the confidence of the network in its output. The output of the
feedback neural network is converted to a shaping reward that is added to the
reward provided by the environment. We evaluate our approach on the Bowling
and Skiing Atari games in the Arcade Learning Environment.
Although human experts have been able to achieve high scores in these
environments, state-of-the-art deep learning algorithms perform poorly. We
observe that FRESH is able to achieve much higher scores than state-of-the-art
deep learning algorithms in both environments. FRESH also achieves a 21.4%
higher score than a human expert in Bowling and does as well as a human expert
in Skiing.
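The abstract describes a concrete mechanism: an ensemble of feedback networks scores state-action pairs, and a confidence-gated shaping reward is added to the environment reward. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; all names (FeedbackNet, FeedbackEnsemble, shaping_reward) and the specific choices (tanh outputs, disagreement threshold, shaping scale) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeedbackNet(nn.Module):
    """One ensemble member: predicts human feedback for a state, per action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one feedback logit per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.body(state)

class FeedbackEnsemble:
    """Ensemble with a shared architecture; disagreement across members
    serves as a proxy for model uncertainty, as described in the abstract."""
    def __init__(self, state_dim: int, n_actions: int, n_members: int = 5):
        self.members = [FeedbackNet(state_dim, n_actions) for _ in range(n_members)]

    def shaping_reward(self, state: torch.Tensor, action: int,
                       scale: float = 1.0, max_disagreement: float = 0.1) -> float:
        # Each member maps the state to a feedback value in [-1, 1] for the
        # chosen action; the mean is the shaping signal, the spread its confidence.
        with torch.no_grad():
            preds = torch.stack([torch.tanh(m(state)[action]) for m in self.members])
        if preds.std() > max_disagreement:
            return 0.0  # low confidence: leave the environment reward unshaped
        return scale * preds.mean().item()

# During training, the shaped reward augments the environment reward:
#   r_total = r_env + ensemble.shaping_reward(state, action)
# The ensemble itself would be fit to the human operator's labels on
# (state, action) pairs drawn from replay-buffer trajectories.
```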
Related papers
- Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback [27.223725464754853]
GEAR enables robots to be placed in real-world environments and left to train autonomously without interruption.
The system streams robot experience to a web interface, requiring only occasional asynchronous feedback from remote, crowdsourced, non-expert humans.
arXiv Detail & Related papers (2023-10-31T16:43:56Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Reinforcement Learning in an Adaptable Chess Environment for Detecting Human-understandable Concepts [0.0]
We show a method for probing which concepts self-learning agents internalise in the course of their training.
For demonstration, we use a chess-playing agent in a fast, lightweight environment developed specifically for research groups.
arXiv Detail & Related papers (2022-11-10T11:48:10Z)
- Learning from humans: combining imitation and deep reinforcement learning to accomplish human-level performance on a virtual foraging task [6.263481844384228]
We develop a method to learn bio-inspired foraging policies using human data.
We conduct an experiment in which humans, virtually immersed in an open-field foraging environment, are trained to collect as much reward as possible.
arXiv Detail & Related papers (2022-03-11T20:52:30Z)
- ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning [91.58711082348293]
Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem.
This approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse.
We propose a hierarchical solution that learns efficiently from sparse user feedback.
arXiv Detail & Related papers (2022-02-05T02:01:19Z)
- Backprop-Free Reinforcement Learning with Active Neural Generative Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments.
We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference.
The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.
arXiv Detail & Related papers (2021-07-10T19:02:27Z)
- Learning What To Do by Simulating the Past [76.86449554580291]
We show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done.
The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
arXiv Detail & Related papers (2021-04-08T17:43:29Z)
- Truly Sparse Neural Networks at Scale [2.2860412844991655]
We train the largest neural network ever trained in terms of representational power, reaching the size of a bat's brain.
Our approach achieves state-of-the-art performance while opening a path toward an environmentally friendly era of artificial intelligence.
arXiv Detail & Related papers (2021-02-02T20:06:47Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Learning Intrinsic Symbolic Rewards in Reinforcement Learning [7.101885582663675]
We present a method that discovers dense rewards in the form of low-dimensional symbolic trees.
We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks.
arXiv Detail & Related papers (2020-10-08T00:02:46Z)
- Learning Affordance Landscapes for Interaction Exploration in 3D Environments [101.90004767771897]
Embodied agents must be able to master how their environment works.
We introduce a reinforcement learning approach for interaction exploration.
We demonstrate our idea with AI2-iTHOR.
arXiv Detail & Related papers (2020-08-21T00:29:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.