Human AI interaction loop training: New approach for interactive
reinforcement learning
- URL: http://arxiv.org/abs/2003.04203v1
- Date: Mon, 9 Mar 2020 15:27:48 GMT
- Title: Human AI interaction loop training: New approach for interactive
reinforcement learning
- Authors: Neda Navidi
- Abstract summary: Reinforcement Learning (RL) provides effective results in various machine-learning decision-making tasks, with an agent learning from a stand-alone reward function.
RL presents unique challenges when environments have large state and action spaces, and when rewards are hard to determine.
Imitation Learning (IL) offers a promising solution to those challenges by using a teacher.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) provides effective results in various
machine-learning decision-making tasks, with an agent learning from a
stand-alone reward function. However, it presents unique challenges when
environments have large state and action spaces, and when rewards are hard to
determine. This complexity, stemming from the high dimensionality and
continuity of the environments considered herein, calls for a large number of
learning trials to learn about the environment through RL. Imitation
Learning (IL) offers a promising solution to those challenges by using a teacher.
In IL, the learning process can take advantage of human-sourced assistance
and/or control over the agent and the environment. This study considers a
human teacher and an agent learner. The teacher takes part in training the
agent to deal with the environment, tackle a specific objective, and achieve a
predefined goal. Within that paradigm, however, existing IL approaches have
the drawback of requiring extensive demonstration data in long-horizon
problems. This paper proposes a novel approach that combines IL with different
types of RL methods, namely state-action-reward-state-action (SARSA) and
asynchronous advantage actor-critic (A3C) agents, to overcome the problems of
both stand-alone systems. It addresses how to effectively leverage teacher
feedback, whether direct binary or indirect detailed feedback, for the agent
learner to learn sequential decision-making policies. The results of this
study on various OpenAI Gym environments show that the algorithmic method can
be incorporated in different combinations and significantly decreases both
human effort and the tedious exploration process.
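To make the abstract's central mechanism concrete, below is a minimal sketch of one of the combinations it describes: tabular SARSA whose environment reward is shaped by a binary teacher signal. The corridor environment, the hand-coded teacher_feedback rule, and all hyperparameters are illustrative assumptions for this sketch, not the paper's actual setup.

```python
# Minimal sketch: tabular SARSA with reward shaping from binary teacher feedback.
# Everything below (environment, teacher rule, hyperparameters) is assumed for
# illustration and is not taken from the paper.
import random

N_STATES = 10          # simple 1-D corridor: start at state 0, goal at state 9
ACTIONS = [-1, +1]     # move left / move right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
FEEDBACK_WEIGHT = 0.5  # how strongly the teacher's signal shapes the reward

# Tabular Q-values for every (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def teacher_feedback(state, action):
    """Hypothetical binary teacher: +1 if the action moves toward the goal, else -1."""
    return 1.0 if action == +1 else -1.0

def step(state, action):
    """Environment transition: clamp to the corridor, reward 1 only at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def eps_greedy(state):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(200):
    s, a = 0, eps_greedy(0)
    done = False
    while not done:
        s2, r, done = step(s, a)
        # Shape the sparse environment reward with the teacher's binary feedback.
        r += FEEDBACK_WEIGHT * teacher_feedback(s, a)
        a2 = eps_greedy(s2)
        # On-policy SARSA update: bootstrap from the action actually chosen next.
        target = r + (0.0 if done else GAMMA * Q[(s2, a2)])
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s, a = s2, a2

# Greedy policy after training: should point toward the goal everywhere.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```

With the shaping term, the teacher's binary approval steers exploration toward the goal from the first episodes; setting FEEDBACK_WEIGHT to 0 recovers plain SARSA, which needs far more random exploration to find the sparse goal reward.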
Related papers
- A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback [1.0359008237358598]
This paper focuses on two-fold problems: firstly, we focus on human or LLM assistance, investigating the ways in which these entities may collaborate with the RL agent to foster optimal behavior and expedite learning; secondly, we delve into research papers dedicated to addressing the intricacies of environments characterized by large observation spaces.
arXiv Detail & Related papers (2024-11-20T15:52:03Z) - RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert (a minimal sketch of this intervention-as-reward idea appears after this list).
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Demonstration-free Autonomous Reinforcement Learning via Implicit and
Bidirectional Curriculum [22.32327908453603]
We propose a demonstration-free reinforcement learning algorithm via Implicit and Bi-directional Curriculum (IBC)
With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods.
arXiv Detail & Related papers (2023-05-17T04:31:36Z) - Reinforcement Learning in Education: A Multi-Armed Bandit Approach [12.358921226358133]
Reinforcement learning solves unsupervised problems where agents move through a state-action-reward loop to maximize the overall reward for the agent.
The aim of this study was to contextualise and simulate the cumulative reward within an environment for an intervention recommendation problem in the education context.
arXiv Detail & Related papers (2022-11-01T22:47:17Z) - Collaborative Training of Heterogeneous Reinforcement Learning Agents in
Environments with Sparse Rewards: What and When to Share? [7.489793155793319]
This work focuses on combining information obtained through intrinsic motivation with the aim of achieving more efficient exploration and faster learning.
Our results reveal different ways in which a collaborative framework with little additional computational cost can outperform an independent learning process without knowledge sharing.
arXiv Detail & Related papers (2022-02-24T16:15:51Z) - Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all other solutions in the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
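Following up on the forward reference in the RLIF entry above, here is a minimal sketch of the intervention-as-reward idea: the only reward is -1 whenever the expert intervenes, and an off-policy learner (plain tabular Q-learning here, standing in for the deep off-policy method an approach like RLIF would actually use) learns to avoid triggering interventions. The corridor environment and the rule-based expert are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: interventions themselves are the (negative) reward signal.
# The environment and the rule-based "expert" are assumptions for illustration.
import random

N_STATES, GOAL = 8, 7
ACTIONS = [-1, +1]
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def expert_intervenes(state, action):
    """Stand-in expert: steps in whenever the agent tries to backtrack."""
    return action == -1

for episode in range(300):
    s = 0
    while s != GOAL:
        # Epsilon-greedy choice by the learner.
        a = random.choice(ACTIONS) if random.random() < EPS else \
            max(ACTIONS, key=lambda x: Q[(s, x)])
        if expert_intervenes(s, a):
            r, executed = -1.0, +1   # the intervention itself is the only reward
        else:
            r, executed = 0.0, a
        s2 = min(max(s + executed, 0), GOAL)
        # Off-policy Q-learning update on the action the learner *chose*,
        # even when the expert overrode it.
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Greedy policy after training: avoids actions that would trigger intervention.
print({s: max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N_STATES)})
```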