Munchausen Reinforcement Learning
- URL: http://arxiv.org/abs/2007.14430v3
- Date: Wed, 4 Nov 2020 16:46:15 GMT
- Title: Munchausen Reinforcement Learning
- Authors: Nino Vieillard, Olivier Pietquin, Matthieu Geist
- Abstract summary: Bootstrapping is a core mechanism in Reinforcement Learning (RL).
We show that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games.
We provide strong theoretical insights on what happens under the hood -- implicit Kullback-Leibler regularization and increase of the action-gap.
- Score: 50.396037940989146
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most
algorithms, based on temporal differences, replace the true value of a
transiting state by their current estimate of this value. Yet, another estimate
could be leveraged to bootstrap RL: the current policy. Our core contribution
stands in a very simple idea: adding the scaled log-policy to the immediate
reward. We show that slightly modifying Deep Q-Network (DQN) in that way
provides an agent that is competitive with distributional methods on Atari
games, without making use of distributional RL, n-step returns or prioritized
replay. To demonstrate the versatility of this idea, we also use it together
with an Implicit Quantile Network (IQN). The resulting agent outperforms
Rainbow on Atari, establishing a new state of the art with very few
modifications to the original algorithm. To add to this empirical study, we
provide strong theoretical insights on what happens under the hood -- implicit
Kullback-Leibler regularization and increase of the action-gap.
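To make the idea concrete, here is a minimal NumPy sketch of the modified regression target the abstract describes: the scaled log-policy is added to the immediate reward, on top of a soft (entropy-regularized) bootstrap. The softmax policy derived from the target Q-values, the clipping of the log-policy term, the helper names, and the specific coefficient values are illustrative assumptions here, not details stated in the abstract.

```python
import numpy as np

# Illustrative hyper-parameters; the specific values are assumptions.
ALPHA = 0.9   # scale of the Munchausen (log-policy) bonus
TAU = 0.03    # softmax temperature
GAMMA = 0.99  # discount factor
L0 = -1.0     # lower clip on the log-policy term, for numerical stability


def log_softmax_policy(q_values, tau=TAU):
    """Log-probabilities of a softmax policy derived from Q-values."""
    logits = q_values / tau
    logits = logits - logits.max(axis=-1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))


def munchausen_dqn_target(reward, action, q_target_s, q_target_next, done):
    """Regression target: immediate reward plus the scaled log-policy of the
    taken action, plus a soft bootstrap on the next state."""
    log_pi_s = log_softmax_policy(q_target_s)        # (batch, num_actions)
    log_pi_next = log_softmax_policy(q_target_next)  # (batch, num_actions)

    # Munchausen bonus: scaled log-policy of the action actually taken, clipped.
    batch_idx = np.arange(len(action))
    munchausen = np.clip(TAU * log_pi_s[batch_idx, action], L0, 0.0)

    # Soft bootstrap: expectation under pi of (Q - tau * log pi) at the next state.
    pi_next = np.exp(log_pi_next)
    soft_value = (pi_next * (q_target_next - TAU * log_pi_next)).sum(axis=-1)

    return reward + ALPHA * munchausen + GAMMA * (1.0 - done) * soft_value
```

The rest of DQN would stay unchanged in this sketch: the online network is regressed toward this target with the usual loss, which is what makes the modification so lightweight.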
Related papers
- Walking the Values in Bayesian Inverse Reinforcement Learning [66.68997022043075]
A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood.
We propose ValueWalk - a new Markov chain Monte Carlo method based on this insight.
arXiv Detail & Related papers (2024-07-15T17:59:52Z)
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle explains the deep neural networks (DNNs) themselves more than it interprets the RL agents.
We propose to consider rewards, the essential objective of RL agents, as the basis for interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- IGN : Implicit Generative Networks [5.394800220750409]
We build on recent advances in distributional reinforcement learning to give a state-of-the-art distributional variant of the model based on IQN.
We demonstrate improved performance on our baseline dataset: the 57 Atari 2600 games in the ALE.
We also use our algorithm to show state-of-the-art training performance of risk-sensitive policies in Atari games with policy optimization and evaluation.
arXiv Detail & Related papers (2022-06-13T00:02:23Z)
- Beyond Tabula Rasa: Reincarnating Reinforcement Learning [37.201451908129386]
Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
arXiv Detail & Related papers (2022-06-03T15:11:10Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction [63.595545216327245]
We tackle two major challenges with Tree Search (TS).
We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent.
We introduce Batch-BFS: a GPU breadth-first search that advances all nodes in each depth of the tree simultaneously.
arXiv Detail & Related papers (2021-07-04T19:32:24Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, and we also verify previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
- The Value-Improvement Path: Towards Better Representations for Reinforcement Learning [46.70945548475075]
We argue that the value prediction problems faced by an RL agent should not be addressed in isolation, but as a single, holistic prediction problem.
An RL algorithm generates a sequence of policies that, at least approximately, improve towards the optimal policy.
We demonstrate that a representation that spans the past value-improvement path will also provide an accurate value approximation for future policy improvements.
arXiv Detail & Related papers (2020-06-03T12:51:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.