Online Planning for Constrained POMDPs with Continuous Spaces through Dual Ascent
- URL: http://arxiv.org/abs/2212.12154v1
- Date: Fri, 23 Dec 2022 05:22:39 GMT
- Title: Online Planning for Constrained POMDPs with Continuous Spaces through Dual Ascent
- Authors: Arec Jamgochian, Anthony Corso, Mykel J. Kochenderfer
- Abstract summary: We propose algorithms for online CPOMDP planning for continuous state, action, and observation spaces.
We empirically compare the effectiveness of our proposed algorithms on continuous CPOMDPs that model both toy and real-world safety-critical problems.
- Score: 37.61747231296097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rather than augmenting rewards with penalties for undesired behavior,
Constrained Partially Observable Markov Decision Processes (CPOMDPs) plan
safely by imposing inviolable hard constraint value budgets. Previous work
performing online planning for CPOMDPs has only been applied to discrete action
and observation spaces. In this work, we propose algorithms for online CPOMDP
planning for continuous state, action, and observation spaces by combining dual
ascent with progressive widening. We empirically compare the effectiveness of
our proposed algorithms on continuous CPOMDPs that model both toy and
real-world safety-critical problems. Additionally, we compare against the use
of online solvers for continuous unconstrained POMDPs that scalarize cost
constraints into rewards, and investigate the effect of optimistic cost
propagation.
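As a rough illustration of the approach described in the abstract, the sketch below pairs a dual ascent update on a Lagrange multiplier with a scalarized tree search. It is a minimal, hypothetical example rather than the authors' implementation; the callback `run_scalarized_search`, the `should_widen` helper, and all parameter names are assumptions made for illustration.

```python
def should_widen(num_children, num_visits, k=4.0, alpha=0.5):
    """Progressive widening test: a search node over a continuous action or
    observation space only adds a new child while the child count stays
    below k * N**alpha, where N is the node's visit count."""
    return num_children < k * num_visits ** alpha


def dual_ascent_plan(belief, run_scalarized_search, cost_budget,
                     n_iters=50, step_size=0.1):
    """Minimal dual ascent loop for online constrained planning.

    `run_scalarized_search(belief, lam)` is assumed to perform an
    unconstrained online POMDP search (e.g. MCTS with progressive widening)
    on the scalarized objective R - lam * C and to return the recommended
    action together with estimates of its expected reward and expected
    cumulative cost.
    """
    lam = 0.0            # Lagrange multiplier on the cost constraint
    action = None
    for _ in range(n_iters):
        action, _reward_est, cost_est = run_scalarized_search(belief, lam)
        # Dual ascent: raise lam when the cost estimate exceeds the budget,
        # lower it when there is slack, and keep it nonnegative.
        lam = max(0.0, lam + step_size * (cost_est - cost_budget))
    return action, lam
```

In this scalarized view the multiplier acts as an adaptive penalty weight: rather than fixing a penalty for undesired behavior in advance, dual ascent tunes it online so that the search respects the cost budget.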
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this is the first work to propose a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z)
- Recursively-Constrained Partially Observable Markov Decision Processes [13.8724466775267]
We show that C-POMDPs violate the optimal substructure property over successive decision steps.
Online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation.
We introduce the Recursively-Constrained POMDP, which imposes additional history-dependent cost constraints on the C-POMDP.
arXiv Detail & Related papers (2023-10-15T00:25:07Z)
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach that distills model-based planning into the policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
- Provably Efficient Model-Free Algorithms for Non-stationary CMDPs [10.930095238723327]
We study model-free reinforcement learning algorithms in episodic non-stationary constrained Markov Decision Processes.
In the non-stationary environment, reward, utility functions, and transition kernels can vary arbitrarily over time as long as the cumulative variations do not exceed certain variation budgets.
We propose the first model-free, simulator-free RL algorithms with sublinear regret and zero constraint violation for non-stationary CMDPs.
arXiv Detail & Related papers (2023-03-10T06:33:38Z)
- Dynamic Regret of Online Markov Decision Processes [84.20723936192945]
We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions.
We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any sequence of feasible changing policies (a standard formalization is recalled after this list).
We consider three foundational models of online MDPs: episodic loop-free Stochastic Shortest Path (SSP), episodic SSP, and infinite-horizon MDPs.
arXiv Detail & Related papers (2022-08-26T07:42:53Z)
- Risk-Averse Decision Making Under Uncertainty [18.467950783426947]
A large class of decision making under uncertainty problems can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs).
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
arXiv Detail & Related papers (2021-09-09T07:52:35Z)
- Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks [59.419152768018506]
We show that any optimal policy necessarily satisfies the k-SP constraint.
We propose a novel cost function that penalizes the policy for violating the SP constraint, instead of completely excluding it.
Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO).
arXiv Detail & Related papers (2021-07-13T21:39:21Z)
- Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning in Continuous Spaces [5.732271870257913]
Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP).
Finding exact solutions of POMDPs is generally intractable, but the solution can be approximated by sampling-based approaches.
We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
arXiv Detail & Related papers (2021-06-08T09:31:48Z)
- An On-Line POMDP Solver for Continuous Observation Spaces [5.482532589225552]
We propose a new on-line POMDP solver, called Lazy Belief Extraction for Continuous POMDPs (LABECOP).
It combines methods from Monte Carlo tree search and particle filtering to construct a policy representation that does not require discretised observation spaces.
Experiments on three different problems involving continuous observation spaces indicate that LABECOP performs similarly to or better than state-of-the-art POMDP solvers.
arXiv Detail & Related papers (2020-11-04T00:16:08Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
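For reference, the dynamic regret measure used as the performance criterion in the online MDP entry above can be written as follows; the notation (per-round losses $f_t$, learner policies $\pi_t$, and a comparator sequence $\pi_1^{c}, \dots, \pi_T^{c}$) is a standard convention assumed here rather than taken verbatim from that paper.

$$
\mathrm{D\text{-}Regret}_T \;=\; \sum_{t=1}^{T} f_t(\pi_t) \;-\; \sum_{t=1}^{T} f_t(\pi_t^{c}),
$$

where $\{\pi_t^{c}\}_{t=1}^{T}$ is any sequence of feasible, possibly changing, comparator policies; the usual static regret is recovered when the comparators are required to be identical across rounds.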