Decoupling Exploration and Exploitation in Reinforcement Learning
- URL: http://arxiv.org/abs/2107.08966v1
- Date: Mon, 19 Jul 2021 15:31:02 GMT
- Title: Decoupling Exploration and Exploitation in Reinforcement Learning
- Authors: Lukas Sch\"afer, Filippos Christianos, Josiah Hanna, Stefano V.
Albrecht
- Abstract summary: We propose Decoupled RL (DeRL) which trains separate policies for exploration and exploitation.
We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards.
- Score: 8.946655323517092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intrinsic rewards are commonly applied to improve exploration in
reinforcement learning. However, these approaches suffer from instability
caused by non-stationary reward shaping and strong dependency on
hyperparameters. In this work, we propose Decoupled RL (DeRL) which trains
separate policies for exploration and exploitation. DeRL can be applied with
on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two
sparse-reward environments with multiple types of intrinsic rewards. We show
that DeRL is more robust to scaling and speed of decay of intrinsic rewards and
converges to the same evaluation returns than intrinsically motivated baselines
in fewer interactions.
Related papers
- Hybrid Inverse Reinforcement Learning [34.793570631021005]
inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
arXiv Detail & Related papers (2024-02-13T23:29:09Z) - More Benefits of Being Distributional: Second-Order Bounds for
Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z) - The Virtues of Pessimism in Inverse Reinforcement Learning [38.98656220917943]
Inverse Reinforcement Learning is a powerful framework for learning complex behaviors from expert demonstrations.
It is desirable to reduce the exploration burden by leveraging expert demonstrations in the inner-loop RL.
We consider an alternative approach to speeding up the RL in IRL: emphpessimism, i.e., staying close to the expert's data distribution, instantiated via the use of offline RL algorithms.
arXiv Detail & Related papers (2024-02-04T21:22:29Z) - Leveraging Reward Consistency for Interpretable Feature Discovery in
Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents.
We propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z) - RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL.
We show that RL$3$ earns greater cumulative reward in the long term, compared to RL$2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring the novelty based on learned reward.
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z) - Active Finite Reward Automaton Inference and Reinforcement Learning
Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
arXiv Detail & Related papers (2020-06-28T21:13:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.