Making Sense of Reinforcement Learning and Probabilistic Inference
- URL: http://arxiv.org/abs/2001.00805v3
- Date: Wed, 4 Nov 2020 18:12:05 GMT
- Title: Making Sense of Reinforcement Learning and Probabilistic Inference
- Authors: Brendan O'Donoghue, Ian Osband, Catalin Ionescu
- Abstract summary: Reinforcement learning (RL) combines a control problem with statistical estimation.
We show that the popular `RL as inference' approximation can perform poorly in even very basic problems.
We show that with a small modification the framework does yield algorithms that can provably perform well.
- Score: 15.987913388420667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) combines a control problem with statistical
estimation: The system dynamics are not known to the agent, but can be learned
through experience. A recent line of research casts `RL as inference' and
suggests a particular framework to generalize the RL problem as probabilistic
inference. Our paper surfaces a key shortcoming in that approach, and clarifies
the sense in which RL can be coherently cast as an inference problem. In
particular, an RL agent must consider the effects of its actions upon future
rewards and observations: The exploration-exploitation tradeoff. In all but the
most simple settings, the resulting inference is computationally intractable so
that practical RL algorithms must resort to approximation. We demonstrate that
the popular `RL as inference' approximation can perform poorly in even very
basic problems. However, we show that with a small modification the framework
does yield algorithms that can provably perform well, and we show that the
resulting algorithm is equivalent to the recently proposed K-learning, which we
further connect with Thompson sampling.
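To make the failure mode concrete, here is a minimal bandit sketch (ours, not the paper's experiment): a Thompson-sampling agent samples one plausible world from its posterior and acts greedily in it, while a Boltzmann (`RL as inference'-style) agent explores in proportion to estimated reward and ignores its own uncertainty. The arm probabilities, horizon, and temperature are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.5, 0.55])   # arm 1 is better, but only slightly
T = 5000

def thompson(T):
    # Beta(1,1) priors over each arm's success probability
    a, b = np.ones(2), np.ones(2)
    total = 0.0
    for _ in range(T):
        theta = rng.beta(a, b)          # sample one plausible world
        arm = int(np.argmax(theta))     # act greedily in that world
        r = float(rng.random() < true_p[arm])
        a[arm] += r
        b[arm] += 1 - r
        total += r
    return total

def soft_inference(T, temp=0.1):
    # Boltzmann policy over running means: a common `RL as inference'
    # style approximation that ignores posterior uncertainty.
    n, s = np.ones(2), np.zeros(2)      # visit counts, reward sums
    total = 0.0
    for _ in range(T):
        q = s / n
        p = np.exp(q / temp)
        p /= p.sum()
        arm = int(rng.choice(2, p=p))
        r = float(rng.random() < true_p[arm])
        n[arm] += 1
        s[arm] += r
        total += r
    return total

print("Thompson reward:", thompson(T))
print("Soft-inference reward:", soft_inference(T))
```
The comparison is only qualitative, but it captures the paper's point: exploration should be driven by what the agent does not yet know, not by reward magnitudes alone.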
Related papers
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust reinforcement learning (RRL) problems within the framework of positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
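As a rough sketch of the shared-Q idea (not the authors' deep Isaacs DQN; the tabular sizes and step size below are placeholder assumptions), a single Q-table over both players' actions can be backed up with a minimax Bellman target:
```python
import numpy as np

nS, nU, nV = 4, 3, 3        # toy sizes: states, maximizer and minimizer actions
Q = np.zeros((nS, nU, nV))  # one shared Q-table over both players' actions
gamma, alpha = 0.95, 0.1

def minimax_value(q_s):
    # Value of the matrix game Q[s]: max over u of min over v. Under
    # Isaacs's condition this coincides with min over v of max over u,
    # which is why one Q-function can approximate both Bellman equations.
    return np.max(np.min(q_s, axis=1))

def update(s, u, v, r, s_next):
    # Tabular minimax Q-learning backup for one observed transition.
    target = r + gamma * minimax_value(Q[s_next])
    Q[s, u, v] += alpha * (target - Q[s, u, v])

update(s=0, u=1, v=2, r=1.0, s_next=3)
print(Q[0, 1, 2])
```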
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
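A standard ingredient in PbRL pipelines, including reward-agnostic ones, is fitting a reward model to pairwise preferences with a Bradley-Terry likelihood. The sketch below shows that generic loss only; it is not this paper's exploration scheme, and the function names are ours.
```python
import numpy as np

def preference_loss(r_hat_a, r_hat_b, pref):
    """Bradley-Terry negative log-likelihood for one trajectory pair.

    r_hat_a, r_hat_b: predicted per-step rewards along two trajectories;
    pref = 1.0 if trajectory a was preferred by the annotator, else 0.0.
    """
    ra, rb = r_hat_a.sum(), r_hat_b.sum()
    p_a = 1.0 / (1.0 + np.exp(rb - ra))   # P(a preferred) from returns
    eps = 1e-8                            # guards the log at saturation
    return -(pref * np.log(p_a + eps) + (1.0 - pref) * np.log(1.0 - p_a + eps))

# The reward model slightly favours trajectory a, which was indeed preferred:
print(preference_loss(np.array([0.2, 0.3]), np.array([0.1, 0.1]), pref=1.0))
```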
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
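For flavor, here is a generic tabular one-step distributional TD update using quantile regression; it is not the OS-DistrRL operator itself, and the atom count, learning rate, and dictionary-backed table are illustrative assumptions.
```python
import numpy as np

N = 11                               # number of quantile atoms per (s, a)
taus = (np.arange(N) + 0.5) / N      # quantile midpoints
gamma, lr = 0.99, 0.05
Z = {}                               # (state, action) -> N quantile values

def quantiles(s, a):
    return Z.setdefault((s, a), np.zeros(N))

def td_update(s, a, r, s_next, a_next):
    z = quantiles(s, a)
    target = r + gamma * quantiles(s_next, a_next)  # one-step sample targets
    for i in range(N):
        u = target - z[i]
        # Quantile-regression step: atom i descends the pinball loss
        # against all target samples.
        z[i] += lr * np.mean(np.where(u > 0, taus[i], taus[i] - 1.0))

td_update("s0", 0, 1.0, "s1", 0)
print(quantiles("s0", 0))
```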
arXiv Detail & Related papers (2023-04-27T06:57:00Z)
- RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning [2.0341936392563063]
We propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents.
We use a tree search to find the most suitable counterfactuals based on the defined properties.
We evaluate RACCER in two tasks as well as conduct a user study to show that RL-specific counterfactuals help users better understand agents' behavior.
arXiv Detail & Related papers (2023-03-08T09:47:00Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
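The low-rank features are typically learned with an InfoNCE-style contrastive loss that scores the true next state against sampled negatives. A minimal numpy sketch of such a loss follows; the UCB bonus built from the learned features is omitted, and nothing here is the paper's exact objective.
```python
import numpy as np

def info_nce(phi_sa, psi_next, psi_negatives):
    """InfoNCE loss: the embedding of the observed next state should
    outscore sampled negatives under an inner product with phi(s, a).

    phi_sa: (d,) state-action embedding; psi_next: (d,) true next-state
    embedding; psi_negatives: (k, d) negative next-state embeddings.
    """
    logits = np.concatenate([[phi_sa @ psi_next], psi_negatives @ phi_sa])
    logits -= logits.max()           # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(1)
d, k = 8, 16
print(info_nce(rng.normal(size=d), rng.normal(size=d), rng.normal(size=(k, d))))
```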
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
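The two-policy scheme is easy to sketch: the guide policy controls the first h steps of each episode, then the learner's exploration policy takes over, and a curriculum anneals h toward zero as performance improves. The environment and policy interfaces below are placeholder assumptions, included only to make the sketch runnable.
```python
import random

class ToyChain:
    """A trivial chain environment, included only so the sketch runs."""
    def __init__(self, n=10):
        self.n, self.pos = n, 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action 1 moves right, anything else left
        self.pos = max(0, min(self.n - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.n - 1
        return self.pos, float(done), done

def jsrl_rollout(env, guide_policy, explore_policy, h, max_steps=200):
    """Guide policy controls the first h steps, then the learner's
    exploration policy takes over for the rest of the episode.
    """
    obs, trajectory = env.reset(), []
    for t in range(max_steps):
        policy = guide_policy if t < h else explore_policy
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return trajectory

guide = lambda obs: 1                        # a competent prior policy
explore = lambda obs: random.randint(0, 1)   # the learner's current policy
print(len(jsrl_rollout(ToyChain(), guide, explore, h=5)))
```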
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
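A caricature of the discovery objective (the paper's exact estimators and networks are not reproduced; k and the bookkeeping are our assumptions): reward the agent with the inverse model's current prediction error plus how much that error has dropped over the last k updates.
```python
from collections import deque

class DiscoveryBonus:
    """Intrinsic reward in the spirit of the XSRL objective: the inverse
    model's current prediction error plus a k-step learning-progress
    term (the error k updates ago minus the error now).
    """
    def __init__(self, k=10):
        self.errors = deque(maxlen=k + 1)

    def __call__(self, prediction_error):
        self.errors.append(prediction_error)
        progress = self.errors[0] - self.errors[-1] if len(self.errors) > 1 else 0.0
        return prediction_error + max(progress, 0.0)

bonus = DiscoveryBonus(k=3)
for err in [1.0, 0.8, 0.5, 0.4]:   # falling error yields extra bonus
    print(bonus(err))
```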
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Reinforcement Learning with Algorithms from Probabilistic Structure Estimation [9.37335587960084]
Reinforcement learning algorithms aim to learn optimal decisions in unknown environments.
Since it is unknown from the outset whether the agent's actions will impact the environment, it is often not possible to determine in advance which RL algorithm is most fitting.
arXiv Detail & Related papers (2021-03-15T09:51:34Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
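In a tabular caricature (the temperature, discount, and Delta recursion details below are our assumptions, not DisCor's exact estimator), transitions are downweighted when the bootstrap target's own accumulated error is large:
```python
import numpy as np

gamma, tau = 0.99, 1.0   # discount and a temperature (placeholder values)
Delta = {}               # (state, action) -> estimated accumulated error

def discor_weight(s_next, a_next):
    # Downweight transitions whose bootstrap target is itself unreliable.
    return np.exp(-gamma * Delta.get((s_next, a_next), 0.0) / tau)

def update_delta(s, a, bellman_error, s_next, a_next):
    # Recursion: the error at (s, a) is the current Bellman error plus
    # the discounted error already carried by the target's (s', a').
    Delta[(s, a)] = abs(bellman_error) + gamma * Delta.get((s_next, a_next), 0.0)

update_delta("s0", 0, bellman_error=0.5, s_next="s1", a_next=0)
print(discor_weight("s0", 0))
```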
arXiv Detail & Related papers (2020-03-16T16:18:52Z)