BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs
- URL: http://arxiv.org/abs/2202.08884v1
- Date: Thu, 17 Feb 2022 19:48:35 GMT
- Title: BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs
- Authors: Sammie Katt, Hai Nguyen, Frans A. Oliehoek, Christopher Amato
- Abstract summary: We present a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella.
We also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks.
- Score: 22.78390558602203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reinforcement learning (RL) has made great advances in scalability,
exploration and partial observability are still active research topics. In
contrast, Bayesian RL (BRL) provides a principled answer to both state
estimation and the exploration-exploitation trade-off, but struggles to scale.
To tackle this challenge, BRL frameworks with various prior assumptions have
been proposed, with varied success. This work presents a
representation-agnostic formulation of BRL under partial observability,
unifying the previous models under one theoretical umbrella. To demonstrate its
practical significance, we also propose a novel derivation, Bayes-Adaptive Deep
Dropout RL (BADDr), based on dropout networks. Under this parameterization, in
contrast to previous work, the belief over the state and dynamics is a more
scalable inference problem. We choose actions through Monte-Carlo tree search
and empirically show that our method is competitive with state-of-the-art BRL
methods on small domains while being able to solve much larger ones.
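For intuition, here is a minimal PyTorch-style sketch of the core idea rather than the authors' implementation: a dropout network over the dynamics serves as an approximate posterior (each forward pass with dropout active behaves like a sampled model), and plain Monte-Carlo rollouts stand in for the Monte-Carlo tree search used in the paper. All class and function names, network sizes, and hyperparameters below are hypothetical.

```python
# Illustrative sketch only (not the authors' implementation): a dropout network
# over the POMDP dynamics acts as an approximate posterior, and plain
# Monte-Carlo rollouts stand in for the tree search used in the paper.
# All names, sizes, and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class DropoutDynamics(nn.Module):
    """Predicts (next state, observation, reward) from (state, action).
    Keeping dropout active at inference time makes each forward pass behave
    like a sample from an approximate posterior over dynamics models."""
    def __init__(self, state_dim, action_dim, obs_dim, hidden=64, p_drop=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
        )
        self.next_state = nn.Linear(hidden, state_dim)
        self.obs = nn.Linear(hidden, obs_dim)
        self.reward = nn.Linear(hidden, 1)

    def forward(self, state, action):
        h = self.body(torch.cat([state, action], dim=-1))
        return self.next_state(h), self.obs(h), self.reward(h)

def rollout_return(model, state, policy, horizon=10, gamma=0.95):
    """Monte-Carlo return under one sampled model (dropout stays on)."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, _, reward = model(state, action)
        total += discount * reward.item()
        discount *= gamma
    return total

def choose_action(model, state, candidate_actions, policy, n_samples=20, gamma=0.95):
    """Greedy one-step lookahead averaged over sampled models; a crude
    stand-in for the Monte-Carlo tree search used in the paper."""
    model.train()  # keep dropout active so every rollout uses a sampled model
    scores = []
    with torch.no_grad():
        for a in candidate_actions:
            returns = []
            for _ in range(n_samples):
                next_state, _, reward = model(state, a)
                returns.append(reward.item() + gamma * rollout_return(model, next_state, policy))
            scores.append(sum(returns) / len(returns))
    return candidate_actions[max(range(len(scores)), key=scores.__getitem__)]
```

In the paper the belief is maintained over both the hidden state and the dynamics and is updated from observations; the sketch only shows the dynamics side to stay short.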
Related papers
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents.
We instead propose to take rewards, the essential objective of RL agents, as the basis for interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- Bag of Policies for Distributional Deep Exploration [7.522221438479138]
Bag of Policies (BoP) is built on top of any return distribution estimator by maintaining a population of its copies.
During training, each episode is controlled by only one of the heads and the collected state-action pairs are used to update all heads off-policy.
BoP results in greater robustness and speed during learning as demonstrated by our experimental results on ALE Atari games.
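A rough sketch of that loop follows, assuming a Gymnasium-style environment API; the names, hyperparameters, and the crude scalar loss standing in for a proper distributional loss are all assumptions, not the paper's code.

```python
# Illustrative sketch of the Bag of Policies idea: keep several copies ("heads")
# of a return-distribution estimator, let one randomly chosen head act for an
# episode, and update every head off-policy from the collected transitions.
# The environment API (Gymnasium-style) and hyperparameters are assumptions.
import random
import torch
import torch.nn as nn

class QuantileHead(nn.Module):
    """One copy of a return-distribution estimator (fixed quantiles per action)."""
    def __init__(self, obs_dim, n_actions, n_quantiles=8):
        super().__init__()
        self.n_actions, self.n_quantiles = n_actions, n_quantiles
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions * n_quantiles))

    def forward(self, obs):
        return self.net(obs).view(-1, self.n_actions, self.n_quantiles)

    def act(self, obs):
        return self.forward(obs).mean(dim=-1).argmax(dim=-1).item()

def run_episode(env, heads, optimizers, gamma=0.99):
    """One episode controlled by a single head; every head is updated from the data."""
    behaviour = random.choice(heads)
    obs, _ = env.reset()
    done = False
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        action = behaviour.act(obs_t)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        next_t = torch.as_tensor(next_obs, dtype=torch.float32).unsqueeze(0)
        for head, opt in zip(heads, optimizers):  # off-policy update of all heads
            with torch.no_grad():
                target = reward + (0.0 if done else gamma * head(next_t).mean(-1).max().item())
            pred = head(obs_t)[0, action].mean()  # scalar surrogate for the distributional loss
            loss = (pred - target) ** 2
            opt.zero_grad(); loss.backward(); opt.step()
        obs = next_obs
```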
arXiv Detail & Related papers (2023-08-03T13:43:03Z)
- ContraBAR: Contrastive Bayes-Adaptive Deep RL [22.649531458557206]
In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal policy -- the optimal policy when facing an unknown task.
We investigate whether contrastive methods can be used for learning Bayes-optimal behavior.
We propose a simple meta RL algorithm that uses contrastive predictive coding (CPC) in lieu of variational belief inference.
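In spirit, such a contrastive objective can be written as an InfoNCE loss that trains a history encoding to score the true future above mismatched ones, yielding a belief-like representation without variational inference. The sketch below is a generic version with hypothetical names and shapes, not the paper's architecture.

```python
# Illustrative InfoNCE-style loss: a history encoding is trained to rank the
# encoding of the true future above negatives. Names and shapes are assumptions.
import torch
import torch.nn.functional as F

def cpc_infonce_loss(history_code, future_code, negatives):
    """history_code: (B, D) summary of the trajectory so far
    future_code:  (B, D) encoding of the true next observation(s)
    negatives:    (B, K, D) encodings of mismatched futures"""
    pos = (history_code * future_code).sum(-1, keepdim=True)      # (B, 1)
    neg = torch.einsum('bd,bkd->bk', history_code, negatives)     # (B, K)
    logits = torch.cat([pos, neg], dim=1)                         # (B, 1+K)
    labels = torch.zeros(logits.size(0), dtype=torch.long)        # positive is index 0
    return F.cross_entropy(logits, labels)
```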
arXiv Detail & Related papers (2023-06-04T17:50:20Z)
- Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning [93.99377042564919]
This paper tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies.
We introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces.
arXiv Detail & Related papers (2023-05-24T15:45:35Z)
- A Survey on Model-based Reinforcement Learning [21.85904195671014]
Reinforcement learning (RL) solves sequential decision-making problems via a trial-and-error process interacting with the environment.
Model-based reinforcement learning (MBRL) is believed to be a promising direction, which builds environment models in which the trial-and-errors can take place without real costs.
arXiv Detail & Related papers (2022-06-19T05:28:03Z)
- Generalization in Deep RL for TSP Problems via Equivariance and Local Search [21.07325126324399]
We propose a simple deep learning architecture that learns with novel RL training techniques.
We empirically evaluate our proposition on random and realistic TSP problems against relevant state-of-the-art deep RL methods.
arXiv Detail & Related papers (2021-10-07T16:20:37Z)
- Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z)
- Principled Exploration via Optimistic Bootstrapping and Backward Induction [84.78836146128238]
We propose a principled exploration method for Deep Reinforcement Learning (DRL) through Optimistic Bootstrapping and Backward Induction (OB2I).
OB2I constructs a general-purpose UCB-bonus through non-parametric bootstrap in DRL.
We build theoretical connections between the proposed UCB-bonus and the LSVI-UCB in a linear setting.
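As a hedged illustration of the general recipe (an optimism bonus derived from a non-parametric bootstrap over an ensemble of Q-estimates), consider the sketch below; the function, its hyperparameters, and the ensemble interface are assumptions, and the paper's backward-induction propagation of the bonus is not reproduced.

```python
# Illustrative sketch: a UCB-style exploration bonus from a non-parametric
# bootstrap over an ensemble of Q-value estimates. Names are hypothetical.
import torch

def bootstrap_ucb_bonus(q_ensemble, obs, n_resamples=50, beta=1.0):
    """q_ensemble: list of Q-networks, each mapping obs -> (A,) action values.
    Resamples ensemble members with replacement and uses the spread of the
    resampled means as a per-action optimism bonus."""
    with torch.no_grad():
        qs = torch.stack([q(obs) for q in q_ensemble])      # (M, A)
    resampled_means = []
    for _ in range(n_resamples):
        idx = torch.randint(0, qs.size(0), (qs.size(0),))   # bootstrap resample of members
        resampled_means.append(qs[idx].mean(dim=0))         # (A,)
    spread = torch.stack(resampled_means).std(dim=0)        # (A,) bootstrap std estimate
    return qs.mean(dim=0) + beta * spread                   # optimistic action values
```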
arXiv Detail & Related papers (2021-05-13T01:15:44Z)
- Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)
- Provably Good Batch Reinforcement Learning Without Great Exploration [51.51462608429621]
Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks.
Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes.
We show that a small modification to Bellman optimality and evaluation back-up to take a more conservative update can have much stronger guarantees.
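As a hedged illustration of what such a conservative back-up can look like, the sketch below penalizes the bootstrap target with an ensemble-disagreement proxy for poor batch coverage; the proxy, names, and shapes are assumptions rather than the paper's exact operator.

```python
# Illustrative conservative Bellman backup for batch RL: the target is
# penalized where the batch gives little support, instead of backing up a raw
# max. Ensemble disagreement as the uncertainty proxy is an assumption.
import torch

def conservative_backup(reward, next_q_ensemble, done, gamma=0.99, kappa=1.0):
    """reward: (B,), next_q_ensemble: (M, B, A) Q-values from M estimators,
    done: (B,) in {0, 1}. Returns a pessimistic TD target of shape (B,)."""
    mean_q = next_q_ensemble.mean(dim=0)             # (B, A)
    disagreement = next_q_ensemble.std(dim=0)        # (B, A) proxy for low data coverage
    pessimistic_q = mean_q - kappa * disagreement    # penalize poorly supported actions
    next_value = pessimistic_q.max(dim=-1).values    # (B,) conservative greedy value
    return reward + gamma * (1.0 - done) * next_value
```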
arXiv Detail & Related papers (2020-07-16T09:25:54Z)
- Making Sense of Reinforcement Learning and Probabilistic Inference [15.987913388420667]
Reinforcement learning (RL) combines a control problem with statistical estimation.
We show that the popular 'RL as inference' approximation can perform poorly in even very basic problems.
We show that with a small modification the framework does yield algorithms that can provably perform well.
arXiv Detail & Related papers (2020-01-03T12:50:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.