Human AI interaction loop training: New approach for interactive
reinforcement learning
- URL: http://arxiv.org/abs/2003.04203v1
- Date: Mon, 9 Mar 2020 15:27:48 GMT
- Title: Human AI interaction loop training: New approach for interactive
reinforcement learning
- Authors: Neda Navidi
- Abstract summary: Reinforcement Learning (RL) provides effective results in various machine-learning decision-making tasks, with an agent learning from a stand-alone reward function.
RL presents unique challenges when environments have large state and action spaces, and when rewards are hard to determine.
Imitation Learning (IL) offers a promising solution to those challenges by using a teacher.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) provides effective results in various
machine-learning decision-making tasks, with an agent learning from a
stand-alone reward function. However, it presents unique challenges when
environments have large state and action spaces, and when rewards are hard to
determine. This complexity, stemming from the high dimensionality and
continuity of the environments considered herein, calls for a large number of
learning trials to learn about the environment through RL. Imitation
Learning (IL) offers a promising solution to those challenges by using a teacher.
In IL, the learning process can take advantage of human-sourced assistance
and/or control over the agent and the environment. This study considers a
human teacher and an agent learner. The teacher takes part in training the
agent to deal with the environment, tackle a specific objective, and achieve a
predefined goal. Within that paradigm, however, existing IL approaches have
the drawback of requiring extensive demonstration data in long-horizon
problems. This paper proposes a novel approach that combines IL with different
types of RL methods, namely state-action-reward-state-action (SARSA) and
asynchronous advantage actor-critic (A3C) agents, to overcome the problems of
both stand-alone systems. It addresses how to effectively leverage teacher
feedback, whether direct binary or indirect detailed feedback, for the agent
learner to learn sequential decision-making policies. The results of this
study on various OpenAI Gym environments show that the algorithmic method can
be incorporated in different combinations and significantly decreases both
human effort and the tedious exploration process.
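To make the abstract's central mechanism concrete, below is a minimal sketch of one of the combinations it describes: tabular SARSA whose environment reward is shaped by a binary teacher signal. The corridor environment, the hand-coded teacher_feedback rule, and all hyperparameters are illustrative assumptions for this sketch, not the paper's actual setup.

```python
# Minimal sketch: tabular SARSA with reward shaping from binary teacher feedback.
# Everything below (environment, teacher rule, hyperparameters) is assumed for
# illustration and is not taken from the paper.
import random

N_STATES = 10          # simple 1-D corridor: start at state 0, goal at state 9
ACTIONS = [-1, +1]     # move left / move right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
FEEDBACK_WEIGHT = 0.5  # how strongly the teacher's signal shapes the reward

# Tabular Q-values for every (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def teacher_feedback(state, action):
    """Hypothetical binary teacher: +1 if the action moves toward the goal, else -1."""
    return 1.0 if action == +1 else -1.0

def step(state, action):
    """Environment transition: clamp to the corridor, reward 1 only at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def eps_greedy(state):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(200):
    s, a = 0, eps_greedy(0)
    done = False
    while not done:
        s2, r, done = step(s, a)
        # Shape the sparse environment reward with the teacher's binary feedback.
        r += FEEDBACK_WEIGHT * teacher_feedback(s, a)
        a2 = eps_greedy(s2)
        # On-policy SARSA update: bootstrap from the action actually chosen next.
        target = r + (0.0 if done else GAMMA * Q[(s2, a2)])
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s, a = s2, a2

# Greedy policy after training: should point toward the goal everywhere.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```

With the shaping term, the teacher's binary approval steers exploration toward the goal from the first episodes; setting FEEDBACK_WEIGHT to 0 recovers plain SARSA, which needs far more random exploration to find the sparse goal reward.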
Related papers
- A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback [1.0359008237358598]
This paper focuses on two-fold problems: firstly, we focus on human or LLM assistance, investigating the ways in which these entities may collaborate with the RL agent to foster optimal behavior and expedite learning; secondly, we delve into research papers dedicated to addressing the intricacies of environments characterized by large observation spaces.
arXiv Detail & Related papers (2024-11-20T15:52:03Z) - RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert (a minimal sketch of this intervention-as-reward idea appears after this list).
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Demonstration-free Autonomous Reinforcement Learning via Implicit and
Bidirectional Curriculum [22.32327908453603]
We propose a demonstration-free reinforcement learning algorithm via Implicit and Bi-directional Curriculum (IBC)
With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods.
arXiv Detail & Related papers (2023-05-17T04:31:36Z) - Reinforcement Learning in Education: A Multi-Armed Bandit Approach [12.358921226358133]
Reinforcement learning solves unsupervised problems where agents move through a state-action-reward loop to maximize the overall reward for the agent.
The aim of this study was to contextualise and simulate the cumulative reward within an environment for an intervention recommendation problem in the education context.
arXiv Detail & Related papers (2022-11-01T22:47:17Z) - Collaborative Training of Heterogeneous Reinforcement Learning Agents in
Environments with Sparse Rewards: What and When to Share? [7.489793155793319]
This work focuses on combining information obtained through intrinsic motivation with the aim of achieving more efficient exploration and faster learning.
Our results reveal different ways in which a collaborative framework with little additional computational cost can outperform an independent learning process without knowledge sharing.
arXiv Detail & Related papers (2022-02-24T16:15:51Z) - Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions in the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
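Following up on the forward reference in the RLIF entry above, here is a minimal sketch of the intervention-as-reward idea: the only reward is -1 whenever the expert intervenes, and an off-policy learner (plain tabular Q-learning here, standing in for the deep off-policy method an approach like RLIF would actually use) learns to avoid triggering interventions. The corridor environment and the rule-based expert are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: interventions themselves are the (negative) reward signal.
# The environment and the rule-based "expert" are assumptions for illustration.
import random

N_STATES, GOAL = 8, 7
ACTIONS = [-1, +1]
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def expert_intervenes(state, action):
    """Stand-in expert: steps in whenever the agent tries to backtrack."""
    return action == -1

for episode in range(300):
    s = 0
    while s != GOAL:
        # Epsilon-greedy choice by the learner.
        a = random.choice(ACTIONS) if random.random() < EPS else \
            max(ACTIONS, key=lambda x: Q[(s, x)])
        if expert_intervenes(s, a):
            r, executed = -1.0, +1   # the intervention itself is the only reward
        else:
            r, executed = 0.0, a
        s2 = min(max(s + executed, 0), GOAL)
        # Off-policy Q-learning update on the action the learner *chose*,
        # even when the expert overrode it.
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Greedy policy after training: avoids actions that would trigger intervention.
print({s: max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N_STATES)})
```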