POMDPs in Continuous Time and Discrete Spaces
- URL: http://arxiv.org/abs/2010.01014v3
- Date: Mon, 26 Oct 2020 12:57:03 GMT
- Title: POMDPs in Continuous Time and Discrete Spaces
- Authors: Bastian Alt, Matthias Schultheis, Heinz Koeppl
- Abstract summary: We consider the problem of optimal decision making in such discrete state and action space systems under partial observability.
We give a mathematical description of a continuous-time partially observable Markov decision process (POMDP).
We present (i) an approach solving the decision problem offline by learning an approximation of the value function and (ii) an online algorithm which provides a solution in belief space using deep reinforcement learning.
- Score: 28.463792234064805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many processes, such as discrete event systems in engineering or population
dynamics in biology, evolve in discrete space and continuous time. We consider
the problem of optimal decision making in such discrete state and action space
systems under partial observability. This places our work at the intersection
of optimal filtering and optimal control. At the current state of research, a
mathematical description for simultaneous decision making and filtering in
continuous time with finite state and action spaces is still missing. In this
paper, we give a mathematical description of a continuous-time partially
observable Markov decision process (POMDP). By leveraging optimal filtering
theory we derive a Hamilton-Jacobi-Bellman (HJB) type equation that
characterizes the optimal solution. Using techniques from deep learning we
approximately solve the resulting partial integro-differential equation. We
present (i) an approach solving the decision problem offline by learning an
approximation of the value function and (ii) an online algorithm which provides
a solution in belief space using deep reinforcement learning. We show the
applicability on a set of toy examples which pave the way for future methods
providing solutions for high dimensional problems.
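The filtering half of this setup can be made concrete in a few lines: between observations, the belief over the finite state space evolves under the master equation of the continuous-time Markov chain, and each noisy observation triggers a Bayes correction. The sketch below is illustrative only; the rate matrix and observation likelihoods are made-up toy values, not taken from the paper.

```python
import numpy as np

def predict(belief, Q, dt):
    """One Euler step of the belief dynamics d(pi)/dt = pi @ Q,
    the master equation of a continuous-time Markov chain with
    generator (rate) matrix Q."""
    belief = belief + dt * (belief @ Q)
    return belief / belief.sum()  # renormalize against numerical drift

def correct(belief, likelihood):
    """Bayes correction when a noisy observation y arrives:
    p(s | y) is proportional to p(y | s) * p(s)."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Illustrative two-state chain: rate 1.0 for 0 -> 1, rate 0.5 for 1 -> 0.
Q = np.array([[-1.0,  1.0],
              [ 0.5, -0.5]])
pi = np.array([1.0, 0.0])            # initially certain of state 0
for _ in range(100):                 # predict forward over one time unit
    pi = predict(pi, Q, dt=0.01)
# Observation that is much more likely under state 0.
pi = correct(pi, likelihood=np.array([0.9, 0.2]))
```

An actual controlled filter would additionally condition the rate matrix on the chosen action; the structure of the predict/correct loop stays the same.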
Related papers
- Deep Learning Algorithms for Mean Field Optimal Stopping in Finite Space and Discrete Time [3.350071725971209]
This work studies the mean field optimal stopping (MFOS) problem, obtained as the number of agents approaches infinity.
We propose two deep learning methods: one simulates full trajectories to learn optimal decisions, whereas the other leverages DPP with backward induction.
We demonstrate the effectiveness of these approaches through numerical experiments on 6 different problems in spatial dimension up to 300.
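The backward-induction (DPP) idea used by the second method can be illustrated for a single agent on a finite state space. The payoff matrix and transition kernel below are arbitrary toy inputs, not from the paper.

```python
import numpy as np

def optimal_stopping(payoff, P):
    """Backward induction for a finite-horizon optimal stopping problem.
    payoff: (T+1, S) array, reward for stopping at time t in state s.
    P:      (S, S) one-step transition matrix of the underlying chain.
    Returns the time-0 value function over states."""
    T = payoff.shape[0] - 1
    V = payoff[T].copy()             # at the horizon, stopping is forced
    for t in range(T - 1, -1, -1):
        continuation = P @ V         # expected value of not stopping now
        V = np.maximum(payoff[t], continuation)
    return V

# Toy example: stopping pays 1 in state 0 and 0 in state 1, horizon T = 2.
payoff = np.array([[1.0, 0.0]] * 3)
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
V0 = optimal_stopping(payoff, P)
```

In the mean field setting the transition kernel additionally depends on the population distribution, which is what the deep learning methods are needed to handle at scale.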
arXiv Detail & Related papers (2024-10-11T14:27:17Z)
- A Simulation-Free Deep Learning Approach to Stochastic Optimal Control [12.699529713351287]
We propose a simulation-free algorithm for the solution of generic problems in stochastic optimal control (SOC).
Unlike existing methods, our approach does not require the solution of an adjoint problem.
arXiv Detail & Related papers (2024-10-07T16:16:53Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Measurement Simplification in ρ-POMDP with Performance Guarantees [6.129902017281406]
Decision making under uncertainty is at the heart of any autonomous system acting with imperfect information.
This paper introduces a novel approach to efficient decision-making, by partitioning the high-dimensional observation space.
We show that the bounds are adaptive, computationally efficient, and that they converge to the original solution.
arXiv Detail & Related papers (2023-09-19T15:40:42Z)
- Semi-supervised Learning of Partial Differential Operators and Dynamical Flows [68.77595310155365]
We present a novel method that combines a hyper-network solver with a Fourier Neural Operator architecture.
We test our method on various time evolution PDEs, including nonlinear fluid flows in one, two, and three spatial dimensions.
The results show that the new method improves the learning accuracy at the supervised time points, and is able to interpolate the solutions to any intermediate time.
arXiv Detail & Related papers (2022-07-28T19:59:14Z)
- Deep Graphic FBSDEs for Opinion Dynamics Stochastic Control [27.38625075499457]
We present a scalable deep learning approach to solve opinion dynamics optimal control problems with mean field term coupling in the dynamics and cost function.
The proposed framework opens up the possibility for future applications on extremely large-scale problems.
arXiv Detail & Related papers (2022-04-05T22:07:32Z)
- An application of the splitting-up method for the computation of a neural network representation for the solution for the filtering equations [68.8204255655161]
Filtering equations play a central role in many real-life applications, including numerical weather prediction, finance and engineering.
One of the classical approaches to approximate the solution of the filtering equations is to use a PDE inspired method, called the splitting-up method.
We combine this method with a neural network representation to produce an approximation of the unnormalised conditional distribution of the signal process.
arXiv Detail & Related papers (2022-01-10T11:01:36Z)
- Probabilistic Numerical Method of Lines for Time-Dependent Partial Differential Equations [20.86460521113266]
Current state-of-the-art PDE solvers treat the space- and time-dimensions separately, serially, and with black-box algorithms.
We introduce a probabilistic version of a technique called method of lines to fix this issue.
Joint quantification of space- and time-uncertainty becomes possible without losing the performance benefits of well-tuned ODE solvers.
arXiv Detail & Related papers (2021-10-22T15:26:05Z)
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
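The structure these algorithms exploit admits a closed-form greedy action: with control-affine dynamics x' = f(x) + B(x) u and a quadratic action penalty 0.5 uᵀ R u, maximizing the HJB right-hand side over u gives u* = R⁻¹ Bᵀ ∇V. A minimal sketch, where B, R, and the value gradient are arbitrary illustrative values rather than quantities from the paper:

```python
import numpy as np

def greedy_action(grad_V, B, R):
    """Closed-form argmax over u of grad_V @ (f(x) + B @ u) - 0.5 u^T R u.
    Setting the gradient in u to zero gives B^T grad_V - R u = 0,
    hence u* = R^{-1} B^T grad_V (no numerical optimization needed)."""
    return np.linalg.solve(R, B.T @ grad_V)

# Illustrative 2-D example with identity input matrix and action cost.
u_star = greedy_action(grad_V=np.array([1.0, 2.0]),
                       B=np.eye(2), R=np.eye(2))
```

This closed form is what makes value-function-only iteration viable in continuous action spaces: the policy never has to be represented separately.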
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
arXiv Detail & Related papers (2020-02-20T11:11:32Z)
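The perturbation idea can be sketched for the simplest discrete optimizer, an argmax over scores: averaging the one-hot argmax of Gaussian-perturbed scores yields a smoothed output that is differentiable in expectation rather than piecewise constant. The score vector and noise scale below are arbitrary, and this Monte Carlo estimator is only one way to realize the construction.

```python
import numpy as np

def perturbed_argmax(theta, sigma=1.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[one_hot(argmax(theta + sigma * Z))]
    with Z standard Gaussian: a smooth relaxation of the
    piecewise-constant argmax over the score vector theta."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_samples, theta.size))
    winners = np.argmax(theta + sigma * Z, axis=1)
    return np.bincount(winners, minlength=theta.size) / n_samples

# The highest score wins most, but not all, of the probability mass.
probs = perturbed_argmax(np.array([2.0, 0.0, 0.0]))
```

Shrinking sigma recovers the hard argmax; growing it flattens the output toward uniform, which is the trade-off the surrounding losses are designed around.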
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.