Towards Improving Exploration in Self-Imitation Learning using Intrinsic
Motivation
- URL: http://arxiv.org/abs/2211.16838v1
- Date: Wed, 30 Nov 2022 09:18:59 GMT
- Title: Towards Improving Exploration in Self-Imitation Learning using Intrinsic
Motivation
- Authors: Alain Andres, Esther Villar-Rodriguez and Javier Del Ser
- Abstract summary: Reinforcement Learning has emerged as a strong alternative for solving optimization tasks efficiently.
The performance of these algorithms depends heavily on the feedback signals provided by the environment, which inform the agent about how good (or bad) its decisions are.
In this work, intrinsic motivation is used to encourage the agent to explore the environment driven by its curiosity, whereas imitation learning allows the most promising experiences to be repeated to accelerate the learning process.
- Score: 7.489793155793319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning has emerged as a strong alternative for solving
optimization tasks efficiently. The performance of these algorithms depends
heavily on the feedback signals provided by the environment, which inform the
agent about how good (or bad) its decisions are. Unfortunately, in a broad
range of problems the design of a good reward function is not trivial, so in
such cases sparse reward signals are adopted instead. The lack of a dense
reward function poses new challenges, mostly related to exploration. Imitation
Learning has addressed those problems by leveraging demonstrations from
experts. In the absence of an expert (and hence of demonstrations), an option
is to prioritize well-suited exploration experiences collected by the agent
itself in order to bootstrap its learning process with good exploration
behaviors. However, this solution depends heavily on the ability of the agent
to discover such trajectories in the early stages of its learning process. To
tackle this issue, we propose to combine imitation learning with intrinsic
motivation, two of the most widely adopted techniques to address problems with
sparse rewards. In this work, intrinsic motivation is used to encourage the
agent to explore the environment driven by its curiosity, whereas imitation
learning allows the most promising experiences to be repeated to accelerate the
learning process. This combination is shown to yield improved performance and
better generalization in procedurally-generated environments, outperforming
previously reported self-imitation learning methods and achieving equal or
better sample efficiency with respect to intrinsic motivation in isolation.
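To make the combination concrete, the sketch below pairs a simple count-based curiosity bonus with a top-K self-imitation buffer. This is a minimal illustration of the idea in the abstract, not the authors' implementation; the count-based bonus, buffer size, and all names are assumptions.

```python
# Minimal sketch (not the authors' code) of curiosity-driven exploration
# combined with self-imitation. The count-based bonus and top-k buffer
# are illustrative assumptions.
import numpy as np
from collections import defaultdict

visit_counts = defaultdict(int)   # state key -> number of visits
sil_buffer = []                   # self-imitation buffer: (return, episode)

def intrinsic_bonus(state_key, beta=0.1):
    """Curiosity proxy: rarely visited states earn a larger bonus."""
    visit_counts[state_key] += 1
    return beta / np.sqrt(visit_counts[state_key])

def store_if_promising(episode, episode_return, k=10):
    """Self-imitation: keep only the k highest-return episodes so the
    agent can later re-learn (clone) its own best behavior."""
    sil_buffer.append((episode_return, episode))
    sil_buffer.sort(key=lambda item: item[0], reverse=True)
    del sil_buffer[k:]

# Per step the agent is trained on r_total = r_extrinsic + intrinsic_bonus(s);
# periodically it also takes a behavioral-cloning step on sil_buffer samples.
```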
Related papers
- Efficient Diversity-based Experience Replay for Deep Reinforcement Learning [14.96744975805832]
This paper proposes a novel approach, diversity-based experience replay (DBER), which leverages determinantal point processes to prioritize diverse samples in state realizations.
We conducted extensive experiments on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat.
arXiv Detail & Related papers (2024-10-27T15:51:27Z)
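A determinantal point process prefers sets whose similarity-kernel determinant is large, i.e., diverse sets. The toy scoring function below illustrates only that property; the RBF kernel and batch scoring are assumptions, not the DBER implementation.

```python
# Toy determinantal diversity score (an illustration, not DBER itself).
import numpy as np

def rbf_kernel(states, sigma=1.0):
    """Pairwise similarity of a batch of state vectors."""
    d2 = ((states[:, None, :] - states[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def diversity_score(states):
    """log-det of the kernel matrix; near-duplicate states shrink it."""
    k = rbf_kernel(states) + 1e-6 * np.eye(len(states))  # numerical jitter
    return np.linalg.slogdet(k)[1]

rng = np.random.default_rng(0)
spread = rng.normal(size=(8, 4))                       # dissimilar states
clumped = np.repeat(rng.normal(size=(1, 4)), 8, axis=0)  # duplicates
assert diversity_score(spread) > diversity_score(clumped)
```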
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by a factor of two in simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
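RILe's trainer is summarized as learning a dynamic reward from student behavior and its alignment with expert demonstrations. The sketch below is only a rough analogue under simplifying assumptions (a logistic discriminator as the learned reward), not the RILe trainer itself.

```python
# Rough analogue of a learned, dynamic reward (not RILe's trainer agent).
import numpy as np

class LearnedReward:
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)   # linear "discriminator" weights
        self.lr = lr

    def reward(self, x):
        """P(expert | transition features x) used as the student's reward."""
        return 1.0 / (1.0 + np.exp(-x @ self.w))

    def update(self, expert_x, student_x):
        """One logistic-regression step: expert labeled 1, student 0.
        Run alongside student training, the reward keeps changing."""
        for x, y in [(expert_x, 1.0), (student_x, 0.0)]:
            p = self.reward(x)
            self.w += self.lr * (y - p) * x   # log-likelihood gradient
```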
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
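The reward construction described above is simple to state: the intervention itself is the (negative) reward signal. A minimal sketch, with the env/policy/expert interfaces assumed for illustration:

```python
# Minimal sketch of interventions-as-rewards (interfaces are assumed).
def rlif_step(env, policy, expert, state):
    action = policy(state)
    if expert.wants_to_intervene(state, action):   # intervention signal
        action = expert(state)                     # expert takes over
        reward = -1.0                              # intervention = penalty
    else:
        reward = 0.0                               # no task reward needed
    next_state = env.step(action)
    return state, action, reward, next_state       # fed to off-policy RL
```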
- Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function.
In sparse-reward settings, a solution may be to equip the agent with an intrinsic motivation that provides informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
arXiv Detail & Related papers (2023-02-22T18:58:09Z)
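Distillation-error novelty can be illustrated with a linear toy model: a predictor is regressed onto a frozen random target, and the residual error, which is large for unfamiliar observations and decays with repeated visits, serves as the intrinsic reward. A sketch under these simplifying assumptions, not the SND code:

```python
# Linear toy version of distillation-based novelty (not the SND code).
import numpy as np

class DistillationNovelty:
    def __init__(self, obs_dim, feat_dim=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.target = rng.normal(size=(obs_dim, feat_dim))  # frozen net
        self.pred = np.zeros((obs_dim, feat_dim))           # distilled net
        self.lr = lr

    def bonus(self, obs):
        """Prediction error = novelty; it shrinks as obs becomes familiar."""
        err = obs @ self.pred - obs @ self.target
        self.pred -= self.lr * np.outer(obs, err)  # one distillation step
        return float((err ** 2).mean())
```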
- Curiosity-driven Exploration in Sparse-reward Multi-agent Reinforcement Learning [0.6526824510982799]
In this article, we discuss the limitations of the intrinsic curiosity module in sparse-reward multi-agent reinforcement learning.
We propose a method called I-Go-Explore that combines the intrinsic curiosity module with the Go-Explore framework to alleviate the detachment problem.
arXiv Detail & Related papers (2023-02-21T17:00:05Z)
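The Go-Explore half of I-Go-Explore can be pictured as an archive: coarse state "cells" index the best trajectory that reached them, and exploration resumes from a chosen cell (with curiosity guiding the exploration phase in I-Go-Explore). The discretization and uniform cell choice below are assumptions:

```python
# Toy Go-Explore-style archive (an illustration, not I-Go-Explore itself).
import random
import numpy as np

archive = {}   # cell -> (best_return, trajectory_that_reached_it)

def cell(obs, resolution=4):
    """Coarse discretization so similar states share an archive entry."""
    return tuple(np.floor(np.asarray(obs) * resolution).astype(int))

def maybe_add(obs, episode_return, trajectory):
    c = cell(obs)
    if c not in archive or episode_return > archive[c][0]:
        archive[c] = (episode_return, list(trajectory))  # new/improved cell

def pick_restart():
    """Return-then-explore: replay the stored trajectory to a known cell,
    then continue exploring from there, countering detachment."""
    return archive[random.choice(list(archive))]
```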
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
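The exploration bonus described above can be sketched with a tiny ensemble: disagreement (standard deviation) among independently trained reward models is high exactly where preference feedback has been uninformative. The linear models below are an assumption; the std-dev bonus is the point:

```python
# Ensemble-disagreement exploration bonus (linear models are assumed).
import numpy as np

class RewardEnsemble:
    def __init__(self, dim, n_models=5, seed=0):
        rng = np.random.default_rng(seed)
        self.ws = rng.normal(size=(n_models, dim))  # independently trained

    def task_reward(self, x):
        return float((self.ws @ x).mean())    # ensemble mean as the reward

    def exploration_bonus(self, x):
        preds = self.ws @ x                   # one reward guess per model
        return float(preds.std())             # high where models disagree

# Training signal: r = task_reward(x) + beta * exploration_bonus(x),
# with beta typically decayed over the course of training.
```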
- Collaborative Training of Heterogeneous Reinforcement Learning Agents in Environments with Sparse Rewards: What and When to Share? [7.489793155793319]
This work focuses on combining information obtained through intrinsic motivation with the aim of achieving more efficient exploration and faster learning.
Our results reveal different ways in which a collaborative framework with little additional computational cost can outperform an independent learning process without knowledge sharing.
arXiv Detail & Related papers (2022-02-24T16:15:51Z)
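One possible sharing rule for the "what and when to share?" question is sketched below: each agent offers its most novel transitions, ranked by intrinsic reward, to a bounded common pool all agents may sample from. The ranking rule and pool size are illustrative assumptions, not the paper's design.

```python
# Hedged sketch of one possible experience-sharing rule (assumed design).
import heapq
import itertools

shared_pool = []                 # min-heap of (novelty, tiebreak, transition)
_tiebreak = itertools.count()    # keeps heap comparisons well-defined

def offer(novelty, transition, capacity=1000):
    """Share an experience only if it is novel enough to displace the
    least novel item currently in the pool."""
    entry = (novelty, next(_tiebreak), transition)
    if len(shared_pool) < capacity:
        heapq.heappush(shared_pool, entry)
    elif novelty > shared_pool[0][0]:
        heapq.heapreplace(shared_pool, entry)
```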
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
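The "scheduled hierarchical" ingredient can be pictured as a task scheduler that splits training time between the main task and the auxiliary tasks whose demonstrations are available. A toy scheduler under assumed task names, not the LfGP algorithm:

```python
# Toy task scheduler (assumed task names; not the LfGP scheduler).
import random

TASKS = ["main", "reach", "grasp", "lift"]   # hypothetical tasks

def next_task(main_prob=0.5):
    """Train on the main task with probability main_prob; otherwise on a
    random auxiliary task, so auxiliary expert data keeps being reused."""
    if random.random() < main_prob:
        return "main"
    return random.choice(TASKS[1:])
```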
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
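PEBBLE's relabeling step is easy to sketch: whenever the learned reward model is updated from new preference feedback, the rewards stored in the off-policy replay buffer are recomputed so old data stays consistent with the new reward. The buffer layout and the reward_model callable below are assumptions:

```python
# Minimal relabeling sketch (buffer layout and reward_model are assumed).
def relabel(replay_buffer, reward_model):
    """Recompute stored rewards under the newest learned reward model."""
    for i, (s, a, _stale_r, s_next, done) in enumerate(replay_buffer):
        replay_buffer[i] = (s, a, reward_model(s, a), s_next, done)
```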
- Generative Inverse Deep Reinforcement Learning for Online Recommendation [62.09946317831129]
We propose a novel inverse reinforcement learning approach, namely InvRec, for online recommendation.
InvRec automatically extracts the reward function from users' behaviors for online recommendation.
arXiv Detail & Related papers (2020-11-04T12:12:25Z)
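Reward extraction from logged behavior is the classic inverse-RL step. The sketch below shows a generic max-margin-style update, purely as an illustration of the principle and not the InvRec architecture: reward weights are nudged until observed user actions outscore alternatives.

```python
# Generic max-margin-style IRL update (an illustration, not InvRec).
import numpy as np

def irl_update(w, phi_taken, phi_alternatives, lr=0.01):
    """phi_* are feature vectors of (state, action) pairs; the update
    raises the reward of observed behavior over its best-scoring rival."""
    rival = max(phi_alternatives, key=lambda phi: phi @ w)
    if rival @ w >= phi_taken @ w:          # margin violated
        w = w + lr * (phi_taken - rival)    # perceptron-style step
    return w
```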
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
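The dynamic weighting ADVISOR is summarized with can be written in one line; the gating weight below is assumed to come from some estimate of how useful the expert's advice is in the current state (the paper's gating mechanism is more involved):

```python
# One-line view of adaptively blended losses (the gating is simplified).
def blended_loss(imitation_loss, rl_loss, w):
    """w in [0, 1]: w near 1 follows the expert, w near 0 explores via RL."""
    return w * imitation_loss + (1.0 - w) * rl_loss
```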