Distributional Offline Continuous-Time Reinforcement Learning with
Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)
- URL: http://arxiv.org/abs/2104.01040v1
- Date: Fri, 2 Apr 2021 13:22:14 GMT
- Title: Distributional Offline Continuous-Time Reinforcement Learning with
Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)
- Authors: Igor Halperin
- Abstract summary: This paper addresses distributional offline continuous-time reinforcement learning (DOCTR-L) with policies for high-dimensional optimal control.
A data-driven solution of the soft HJB equation uses methods of Neural PDEs and Physics-Informed Neural Networks developed in the field of Scientific Machine Learning (SciML).
Our algorithm called Deep DOCTR-L converts offline high-dimensional data into an optimal policy in one step by reducing it to supervised learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper addresses distributional offline continuous-time reinforcement
learning (DOCTR-L) with stochastic policies for high-dimensional optimal
control. A soft distributional version of the classical Hamilton-Jacobi-Bellman
(HJB) equation is given by a semilinear partial differential equation (PDE).
This 'soft HJB equation' can be learned from offline data without assuming that
the latter correspond to a previous optimal or near-optimal policy. A
data-driven solution of the soft HJB equation uses methods of Neural PDEs and
Physics-Informed Neural Networks developed in the field of Scientific Machine
Learning (SciML). The suggested approach, dubbed 'SciPhy RL', thus reduces
DOCTR-L to solving neural PDEs from data. Our algorithm called Deep DOCTR-L
converts offline high-dimensional data into an optimal policy in one step by
reducing it to supervised learning, instead of relying on value iteration or
policy iteration methods. The method enables a computable approach to quality
control of the obtained policies, in terms of both their expected returns and
the uncertainties about their values.
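As a concrete illustration of this reduction: the sketch below fits a value network V(t, x) by minimizing the squared residual of a soft HJB-type semilinear PDE on offline samples, PINN-style. The concrete residual form (quadratic soft Hamiltonian, constant diffusion, an entropy weight `temperature`) and all names such as `ValueNet` and `hjb_residual` are illustrative assumptions, not the paper's exact equation.

```python
# Minimal sketch: learn V(t, x) by minimizing the residual of a soft
# HJB-type semilinear PDE on offline samples, PINN-style. The residual
# form below is an illustrative assumption, not the paper's equation.
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, t, x):
        return self.net(torch.cat([t, x], dim=-1))

def hjb_residual(V, t, x, reward, temperature=0.1, sigma=0.1):
    """Residual of V_t + r + (sigma^2/2) tr(Hess V) + (temperature/2) |grad V|^2."""
    t = t.requires_grad_(True)
    x = x.requires_grad_(True)
    v = V(t, x)
    ones = torch.ones_like(v)
    v_t = torch.autograd.grad(v, t, ones, create_graph=True)[0]
    v_x = torch.autograd.grad(v, x, ones, create_graph=True)[0]
    lap = torch.zeros_like(v)
    for i in range(x.shape[-1]):  # Laplacian, one coordinate at a time
        v_xi = v_x[:, i:i + 1]
        lap = lap + torch.autograd.grad(
            v_xi, x, torch.ones_like(v_xi), create_graph=True)[0][:, i:i + 1]
    soft_h = 0.5 * temperature * (v_x ** 2).sum(-1, keepdim=True)
    return v_t + reward + 0.5 * sigma ** 2 * lap + soft_h

# One supervised-style step on an offline batch of (t, x, r) samples:
V = ValueNet(state_dim=4)
opt = torch.optim.Adam(V.parameters(), lr=1e-3)
t, x, r = torch.rand(128, 1), torch.randn(128, 4), torch.randn(128, 1)
loss = hjb_residual(V, t, x, r).pow(2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```

Because the residual is evaluated pointwise on logged samples, no environment interaction or value/policy iteration loop is needed; training is ordinary stochastic gradient descent on a regression-style loss.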
Related papers
- Partial-differential-algebraic equations of nonlinear dynamics by Physics-Informed Neural-Network: (I) Operator splitting and framework assessment [51.3422222472898]
Several forms for constructing novel physics-informed neural networks (PINNs) for the solution of partial-differential-algebraic equations are proposed.
Among these are PDE forms that evolve from a lower-level form with fewer unknown dependent variables to a higher-level form with more dependent variables.
arXiv Detail & Related papers (2024-07-13T22:48:17Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks, named Two-Stage Deep Decision Rules (TS-DDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies [0.5919433278490629]
Optimal control of parametric partial differential equations (PDEs) is crucial in many applications in engineering and science.
Deep reinforcement learning (DRL) has the potential to solve high-dimensional and complex control problems.
In this work, we leverage dictionary learning and differentiable L0 regularization to learn sparse, robust, and interpretable control policies for PDEs (see the sketch of the L0-gating idea below).
arXiv Detail & Related papers (2024-03-22T15:06:31Z)
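The summary above does not spell out the regularizer; a standard differentiable surrogate for the L0 norm is the hard-concrete gate of Louizos et al. (2018). The sketch below applies that idea to a polynomial dictionary policy; the dictionary, gate hyperparameters, and the class name are illustrative assumptions rather than the paper's exact construction.

```python
# Sketch: polynomial dictionary policy with hard-concrete gates as a
# differentiable L0 surrogate. Hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class L0PolynomialPolicy(nn.Module):
    def __init__(self, state_dim, degree=2, beta=2 / 3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.degree, self.beta, self.gamma, self.zeta = degree, beta, gamma, zeta
        n_terms = 1 + state_dim * degree          # [1, x, x^2, ...] features
        self.coef = nn.Parameter(0.01 * torch.randn(n_terms))
        self.log_alpha = nn.Parameter(torch.zeros(n_terms))  # gate logits

    def dictionary(self, x):
        feats = [torch.ones_like(x[:, :1])]
        for d in range(1, self.degree + 1):
            feats.append(x ** d)
        return torch.cat(feats, dim=-1)

    def gates(self):
        # Stretched hard-concrete sample, clamped to [0, 1].
        u = torch.rand_like(self.log_alpha)
        s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        return torch.clamp(s * (self.zeta - self.gamma) + self.gamma, 0.0, 1.0)

    def l0_penalty(self):
        # Expected number of non-zero gates.
        shift = self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        return torch.sigmoid(self.log_alpha - shift).sum()

    def forward(self, x):
        return self.dictionary(x) @ (self.coef * self.gates())
```

A control objective would then add a term like `lam * policy.l0_penalty()`, driving most dictionary coefficients exactly to zero at test time while keeping the policy differentiable during training.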
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective at solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process (see the sketch of the implicit update below).
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
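For intuition: an implicit (proximal) SGD step replaces the explicit update with the solution of theta_{k+1} = argmin_theta L(theta) + ||theta - theta_k||^2 / (2 * lr). The minimal sketch below approximates that solve with a few inner gradient steps; the inner-loop schedule is an assumption, and the paper's exact ISGD scheme may differ in detail.

```python
# Sketch: one implicit (proximal) SGD step, approximated by a few inner
# gradient steps on  L(theta) + ||theta - theta_k||^2 / (2 * lr).
import torch

def implicit_sgd_step(model, loss_fn, lr=1e-2, inner_steps=5, inner_lr=1e-2):
    anchor = [p.detach().clone() for p in model.parameters()]
    inner_opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        prox = sum(((p - a) ** 2).sum()
                   for p, a in zip(model.parameters(), anchor))
        loss = loss_fn(model) + prox / (2.0 * lr)  # implicit/backward-Euler step
        loss.backward()
        inner_opt.step()
    return float(loss.detach())
```

Here `loss_fn(model)` would evaluate the PINN residual loss on a minibatch of collocation points; the proximal term damps the update and is what stabilizes training on stiff, multi-scale targets.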
- Bi-level Physics-Informed Neural Networks for PDE Constrained Optimization using Broyden's Hypergradients [29.487375792661005]
We present a novel bi-level optimization framework to solve PDE constrained optimization problems.
For the inner loop optimization, we adopt PINNs to solve the PDE constraints only.
For the outer loop, we design a novel method that computes hypergradients via Broyden's method, based on the Implicit Function Theorem.
arXiv Detail & Related papers (2022-09-15T06:21:24Z)
- Data-driven initialization of deep learning solvers for Hamilton-Jacobi-Bellman PDEs [3.249853429482705]
A state-dependent Riccati equation control law is first used to generate a gradient-augmented synthetic dataset for supervised learning.
The resulting model becomes a warm start for the minimization of a loss function based on the residual of the HJB PDE (see the two-stage sketch below).
arXiv Detail & Related papers (2022-07-19T14:34:07Z)
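A hedged sketch of the two stages described above: a supervised, gradient-augmented warm start followed by HJB-residual fine-tuning. The tensors stand in for the Riccati-generated dataset, and `hjb_residual` can be a routine like the one sketched earlier for the soft HJB equation; all names are assumptions.

```python
# Sketch of the two stages. Random tensors stand in for the Riccati-generated,
# gradient-augmented dataset; `hjb_residual` can be a routine like the one
# sketched earlier.
import torch

def warm_start(V, t, x, v_target, grad_target, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(V.parameters(), lr=lr)
    for _ in range(epochs):
        t_req = t.clone().requires_grad_(True)
        x_req = x.clone().requires_grad_(True)
        v = V(t_req, x_req)
        v_x = torch.autograd.grad(v, x_req, torch.ones_like(v),
                                  create_graph=True)[0]
        # Gradient-augmented regression: match values AND gradients.
        loss = (v - v_target).pow(2).mean() + (v_x - grad_target).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return V

def fine_tune(V, t, x, reward, hjb_residual, epochs=200, lr=1e-4):
    opt = torch.optim.Adam(V.parameters(), lr=lr)
    for _ in range(epochs):
        loss = hjb_residual(V, t, x, reward).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return V
```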
- Physics-Informed Neural Operator for Learning Partial Differential Equations [55.406540167010014]
PINO is the first hybrid approach incorporating data and PDE constraints at different resolutions to learn the operator.
The resulting PINO model can accurately approximate the ground-truth solution operator for many popular PDE families (see the sketch of the hybrid loss below).
arXiv Detail & Related papers (2021-11-06T03:41:34Z)
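Schematically, the hybrid objective combines a data-fitting term on coarse observations with a PDE-residual term evaluated at a finer resolution; `pde_residual`, the weighting, and the resolutions below are illustrative assumptions rather than PINO's exact formulation.

```python
# Sketch of a PINO-style hybrid objective: fit coarse data, enforce the PDE
# at a finer resolution. `pde_residual` and the weighting are assumptions.
import torch

def hybrid_loss(model, coarse_x, coarse_y, fine_x, pde_residual, w_pde=1.0):
    data_loss = (model(coarse_x) - coarse_y).pow(2).mean()  # coarse observations
    pde_loss = pde_residual(model, fine_x).pow(2).mean()    # fine-grid physics
    return data_loss + w_pde * pde_loss
```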
- Machine Learning For Elliptic PDEs: Fast Rate Generalization Bound, Neural Scaling Law and Minimax Optimality [11.508011337440646]
We study the statistical limits of deep learning techniques for solving elliptic partial differential equations (PDEs) from random samples.
To simplify the problem, we focus on a prototype elliptic PDE: the Schrödinger equation on a hypercube with zero Dirichlet boundary condition.
We establish upper and lower bounds for both methods studied, improving upon concurrently developed upper bounds for this problem.
arXiv Detail & Related papers (2021-10-13T17:26:31Z)
- PhyCRNet: Physics-informed Convolutional-Recurrent Network for Solving Spatiotemporal PDEs [8.220908558735884]
Partial differential equations (PDEs) play a fundamental role in modeling and simulating problems across a wide range of disciplines.
Recent advances in deep learning have shown the great potential of physics-informed neural networks (PINNs) to solve PDEs as a basis for data-driven inverse analysis.
We propose novel physics-informed convolutional-recurrent learning architectures (PhyCRNet and PhyCRNet-s) for solving PDEs without any labeled data.
arXiv Detail & Related papers (2021-06-26T22:22:19Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by penalizing rewards with the uncertainty of the learned dynamics (see the sketch below).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
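A minimal sketch of that modification, with ensemble disagreement as a stand-in for the dynamics uncertainty; MOPO's actual penalty is based on the learned models' predicted variance, so the ensemble proxy and the penalty weight here are assumptions.

```python
# Sketch of MOPO's core reward modification: subtract a dynamics-uncertainty
# penalty. Ensemble disagreement stands in for the paper's exact penalty.
import torch

def penalized_reward(reward, next_state_preds, lam=1.0):
    """reward: (batch,); next_state_preds: (n_models, batch, state_dim)."""
    uncertainty = next_state_preds.std(dim=0).norm(dim=-1)  # per-sample proxy
    return reward - lam * uncertainty
```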
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.