Human AI interaction loop training: New approach for interactive
reinforcement learning
- URL: http://arxiv.org/abs/2003.04203v1
- Date: Mon, 9 Mar 2020 15:27:48 GMT
- Title: Human AI interaction loop training: New approach for interactive
reinforcement learning
- Authors: Neda Navidi
- Abstract summary: Reinforcement Learning (RL) in various decision-making tasks of machine learning provides effective results with an agent learning from a stand-alone reward function.
RL presents unique challenges with large numbers of environment states, large action spaces, and difficulty in determining rewards.
Imitation Learning (IL) offers a promising solution for those challenges using a teacher.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) in various decision-making tasks of machine
learning provides effective results with an agent learning from a stand-alone
reward function. However, it presents unique challenges with large numbers of
environment states, large action spaces, and difficulty in determining rewards.
This complexity, stemming from the high dimensionality and continuity of the
environments considered herein, calls for a large number of learning trials to
learn about the environment through Reinforcement Learning. Imitation
Learning (IL) offers a promising solution for those challenges using a teacher.
In IL, the learning process can take advantage of human-sourced assistance
and/or control over the agent and environment. A human teacher and an agent
learner are considered in this study. The teacher takes part in the agent
training towards dealing with the environment, tackling a specific objective,
and achieving a predefined goal. Within that paradigm, however, existing IL
approaches have the drawback of expecting extensive demonstration information
in long-horizon problems. This paper proposes a novel approach combining IL
with different types of RL methods, namely state action reward state action
(SARSA) and asynchronous advantage actor-critic (A3C) agents, to overcome the
problems of both stand-alone systems. We address how to effectively leverage
the teacher feedback, whether direct binary or indirect detailed, for the agent
learner to learn sequential decision-making policies. The results of this study
on various OpenAI Gym environments show that this algorithmic method can be
incorporated in different combinations and significantly decreases both human
effort and the tedious exploration process.
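As a concrete illustration of the approach, the sketch below shows a tabular SARSA learner whose environment reward is shaped by binary teacher feedback, in the spirit of the human-in-the-loop training described above. This is a minimal sketch, assuming a Gym-style discrete environment; the feedback oracle, environment choice, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
import gym  # assumes gym >= 0.26, where reset()/step() return the newer tuples

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # illustrative hyperparameters
FEEDBACK_WEIGHT = 0.5                   # how strongly teacher feedback shapes the reward


def teacher_feedback(state, action):
    """Stand-in for the human teacher: +1 (approve), -1 (reject), or 0 (silent).
    A real human-in-the-loop system would query a person here."""
    return 0  # placeholder: a silent teacher


def epsilon_greedy(q_row, rng):
    """Explore with probability EPSILON, otherwise act greedily."""
    if rng.random() < EPSILON:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def train(episodes=500, seed=0):
    env = gym.make("FrozenLake-v1")  # small discrete task, chosen for illustration
    rng = np.random.default_rng(seed)
    q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):
        state, _ = env.reset()
        action = epsilon_greedy(q[state], rng)
        done = False
        while not done:
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Blend the environment reward with binary teacher feedback.
            shaped = reward + FEEDBACK_WEIGHT * teacher_feedback(state, action)
            next_action = epsilon_greedy(q[next_state], rng)
            # On-policy SARSA update on the shaped reward.
            target = shaped + GAMMA * q[next_state, next_action] * (not done)
            q[state, action] += ALPHA * (target - q[state, action])
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    train()
```

The same shaped-reward signal could in principle feed an A3C learner instead of SARSA; only the update rule changes, not the feedback channel.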
Related papers
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Social Interpretable Reinforcement Learning [4.242435932138821]
Social Interpretable RL (SIRL) is inspired by social learning principles to improve learning efficiency.
Our results on six well-known benchmarks show that SIRL reaches state-of-the-art performance.
arXiv Detail & Related papers (2024-01-27T19:05:21Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions similar to, and potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum [22.32327908453603]
We propose a demonstration-free reinforcement learning algorithm via Implicit and Bi-directional Curriculum (IBC).
With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods.
arXiv Detail & Related papers (2023-05-17T04:31:36Z)
- Reinforcement Learning in Education: A Multi-Armed Bandit Approach [12.358921226358133]
Reinforcement learning solves problems without explicit supervision, with agents moving through a state-action-reward loop to maximize the overall reward.
The aim of this study was to contextualise and simulate the cumulative reward within an environment for an intervention recommendation problem in the education context (a minimal bandit sketch appears after this list).
arXiv Detail & Related papers (2022-11-01T22:47:17Z)
- Collaborative Training of Heterogeneous Reinforcement Learning Agents in Environments with Sparse Rewards: What and When to Share? [7.489793155793319]
This work focuses on combining information obtained through intrinsic motivation, with the aim of achieving more efficient exploration and faster learning.
Our results reveal different ways in which a collaborative framework with little additional computational cost can outperform an independent learning process without knowledge sharing.
arXiv Detail & Related papers (2022-02-24T16:15:51Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
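As referenced in the multi-armed bandit entry above, the following is a hypothetical sketch of the bandit framing for intervention recommendation: each arm is a candidate educational intervention, and a simulated Bernoulli outcome stands in for a measured student result. The arm count, success probabilities, and epsilon value are invented for illustration and are not taken from the paper.

```python
import numpy as np


class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit with incremental mean estimates."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.counts = np.zeros(n_arms)  # pulls per arm
        self.values = np.zeros(n_arms)  # running mean reward per arm

    def select(self):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.values)))  # explore
        return int(np.argmax(self.values))                   # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean: no need to store the full reward history.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Simulated study: three interventions with unknown mean effectiveness.
true_means = [0.3, 0.5, 0.7]  # assumed ground truth, used only by the simulator
bandit = EpsilonGreedyBandit(n_arms=3)
rng = np.random.default_rng(1)
total = 0.0
for _ in range(1000):
    arm = bandit.select()
    reward = float(rng.random() < true_means[arm])  # Bernoulli student outcome
    bandit.update(arm, reward)
    total += reward
print(f"cumulative reward: {total:.0f}; value estimates: {bandit.values.round(2)}")
```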
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.