CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in
Confounded Environments
- URL: http://arxiv.org/abs/2304.06848v3
- Date: Thu, 27 Jul 2023 07:00:27 GMT
- Title: CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in
Confounded Environments
- Authors: Ricardo Cannizzaro, Lars Kunze
- Abstract summary: A major challenge for making accurate and robust action predictions is the problem of confounding.
The partially observable Markov decision process (POMDP) is a widely-used framework to model these stochastic and partially-observable decision-making problems.
This paper presents a novel causally-informed extension of "anytime regularized determinized sparse partially observable tree" (AR-DESPOT) to eliminate errors caused by unmeasured confounder variables.
- Score: 5.979296454783688
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots operating in real-world environments must reason about possible
outcomes of stochastic actions and make decisions based on partial observations
of the true world state. A major challenge for making accurate and robust
action predictions is the problem of confounding, which if left untreated can
lead to prediction errors. The partially observable Markov decision process
(POMDP) is a widely-used framework to model these stochastic and
partially-observable decision-making problems. However, due to a lack of
explicit causal semantics, POMDP planning methods are prone to confounding bias
and thus in the presence of unobserved confounders may produce underperforming
policies. This paper presents a novel causally-informed extension of "anytime
regularized determinized sparse partially observable tree" (AR-DESPOT), a
modern anytime online POMDP planner, using causal modelling and inference to
eliminate errors caused by unmeasured confounder variables. We further propose
a method to learn offline the partial parameterisation of the causal model for
planning, from ground truth model data. We evaluate our methods on a toy
problem with an unobserved confounder and show that the learned causal model is
highly accurate, while our planning method is more robust to confounding and
produces overall higher performing policies than AR-DESPOT.
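The abstract describes the approach only at a high level, so the following is just a rough sketch of how a back-door-adjusted transition model could be estimated offline from logged data and then queried by an online planner; the discrete state/action/covariate setting, the tuple format (s, z, a, s'), the function name, and the Laplace smoothing are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (not the authors' code): estimate an interventional
# transition distribution P(s' | s, do(a)) via back-door adjustment,
#     P(s' | s, do(a)) = sum_z P(s' | s, a, z) * P(z | s),
# assuming Z is a measured covariate satisfying the back-door criterion.
from collections import Counter, defaultdict
import itertools

def backdoor_adjusted_transitions(transitions, states, actions, covariates, alpha=1.0):
    """transitions: iterable of observational tuples (s, z, a, s_next)."""
    n_szas, n_sza, n_sz, n_s = Counter(), Counter(), Counter(), Counter()
    for s, z, a, s_next in transitions:
        n_szas[(s, z, a, s_next)] += 1
        n_sza[(s, z, a)] += 1
        n_sz[(s, z)] += 1
        n_s[s] += 1

    adjusted = defaultdict(dict)  # adjusted[(s, a)][s_next] = P(s_next | s, do(a))
    for s, a in itertools.product(states, actions):
        for s_next in states:
            p = 0.0
            for z in covariates:
                # P(s_next | s, a, z), Laplace-smoothed with alpha
                p_next = (n_szas[(s, z, a, s_next)] + alpha) / (n_sza[(s, z, a)] + alpha * len(states))
                # P(z | s), Laplace-smoothed with alpha
                p_z = (n_sz[(s, z)] + alpha) / (n_s[s] + alpha * len(covariates))
                p += p_next * p_z
            adjusted[(s, a)][s_next] = p
    return adjusted
```

A DESPOT-style online planner could then sample next states for its scenario tree from the adjusted distribution rather than from the confounded observational estimate of P(s' | s, a).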
Related papers
- Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real-environment exploration.
COPlanner is a planning-driven framework for model-based methods that addresses the problem of inaccurately learned dynamics models.
arXiv Detail & Related papers (2023-10-11T06:10:07Z)
- Fine-Tuning Language Models with Advantage-Induced Policy Alignment [80.96507425217472]
We propose a novel algorithm for aligning large language models to human preferences.
We show that it consistently outperforms PPO in language tasks by a large margin.
We also provide a theoretical justification supporting the design of our loss function.
arXiv Detail & Related papers (2023-06-04T01:59:40Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA): instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision-making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Revisiting Design Choices in Model-Based Offline Reinforcement Learning [39.01805509055988]
Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies.
This paper compares design choices and proposes novel protocols to investigate their interaction with other hyperparameters, such as the number of models or the imaginary rollout horizon.
arXiv Detail & Related papers (2021-10-08T13:51:34Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- Bootstrapped model learning and error correction for planning with uncertainty in model-based RL [1.370633147306388]
A natural aim is to learn a model that reflects accurately the dynamics of the environment.
This paper explores the problem of model misspecification through uncertainty-aware reinforcement learning agents.
We propose a bootstrapped multi-headed neural network that learns the distribution of future states and rewards.
arXiv Detail & Related papers (2020-04-15T15:41:21Z)
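The last entry above describes a bootstrapped multi-headed network that models the distribution of future states and rewards. The sketch below shows one common way such a model is structured (a shared trunk with several heads, each trained on its own bootstrap resample, so that disagreement between heads signals model uncertainty); the class name, layer sizes, deterministic per-head outputs, and the Bernoulli bootstrap masks are assumptions for illustration, not details from that paper.

```python
# Minimal sketch of a bootstrapped multi-headed dynamics model in PyTorch.
import torch
import torch.nn as nn

class BootstrappedDynamicsModel(nn.Module):
    def __init__(self, state_dim, action_dim, n_heads=5, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Each head predicts the next state and a scalar reward.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, state_dim + 1) for _ in range(n_heads)]
        )

    def forward(self, state, action):
        h = self.trunk(torch.cat([state, action], dim=-1))
        preds = torch.stack([head(h) for head in self.heads], dim=0)  # (K, B, state_dim + 1)
        next_states, rewards = preds[..., :-1], preds[..., -1]
        return next_states, rewards  # per-head predictions

# Training idea: draw a Bernoulli bootstrap mask per head and per transition,
# and weight each head's regression loss by its mask, so the heads see
# different resamples of the replay buffer and disagree where data is scarce.
```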
This list is automatically generated from the titles and abstracts of the papers on this site.