A multilevel reinforcement learning framework for PDE based control
- URL: http://arxiv.org/abs/2210.08400v1
- Date: Sat, 15 Oct 2022 23:52:48 GMT
- Title: A multilevel reinforcement learning framework for PDE based control
- Authors: Atish Dixit, Ahmed Elsheikh
- Abstract summary: Reinforcement learning (RL) is a promising method to solve control problems.
Model-free RL algorithms are sample inefficient and require thousands if not millions of samples to learn optimal control policies.
We propose a multilevel RL framework in order to ease this cost by exploiting sublevel models that correspond to coarser scale discretization.
- Score: 0.2538209532048867
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement learning (RL) is a promising method to solve control problems.
However, model-free RL algorithms are sample inefficient and require thousands
if not millions of samples to learn optimal control policies. A major source of
computational cost in RL corresponds to the transition function, which is
dictated by the model dynamics. This is especially problematic when model
dynamics is represented with coupled PDEs. In such cases, the transition
function often involves solving a large-scale discretization of the said PDEs.
We propose a multilevel RL framework in order to ease this cost by exploiting
sublevel models that correspond to coarser scale discretization (i.e.
multilevel models). This is done by formulating an approximate multilevel Monte
Carlo estimate of the objective function of the policy and / or value network
instead of Monte Carlo estimates, as done in the classical framework. As a
demonstration of this framework, we present a multilevel version of the
proximal policy optimization (PPO) algorithm. Here, the level refers to the
grid fidelity of the chosen simulation-based environment. We provide two
examples of simulation-based environments that employ stochastic PDEs that are
solved using finite-volume discretization. For the case studies presented, we
observed substantial computational savings using multilevel PPO compared to its
classical counterpart.
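The multilevel Monte Carlo idea mentioned in the abstract can be illustrated with a minimal sketch. It shows the standard telescoping estimator, E[P_L] = E[P_0] + sum over l of E[P_l - P_{l-1}], not the approximate variant or the multilevel PPO algorithm proposed in the paper. The `simulate` function is a hypothetical stand-in for a grid-based solver (a midpoint-rule integral whose cell count doubles with each level), and the sample counts per level are arbitrary assumptions chosen only for illustration.

```python
import math
import random


def simulate(omega: float, level: int) -> float:
    """Hypothetical level-`level` model: midpoint-rule integral of exp(omega * x) on [0, 1].

    Finer levels use more cells (4 * 2**level), mimicking grid refinement.
    """
    n_cells = 4 * 2 ** level
    h = 1.0 / n_cells
    return h * sum(math.exp(omega * (i + 0.5) * h) for i in range(n_cells))


def mlmc_estimate(samples_per_level, rng):
    """Telescoping estimate of E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}].

    samples_per_level[l] is the number of samples drawn for level l;
    most samples are placed on the cheap coarse level.
    """
    total = 0.0
    for level, n_samples in enumerate(samples_per_level):
        acc = 0.0
        for _ in range(n_samples):
            # One random input is shared by the fine and coarse evaluations,
            # so their difference has low variance.
            omega = rng.gauss(0.0, 1.0)
            fine = simulate(omega, level)
            coarse = simulate(omega, level - 1) if level > 0 else 0.0
            acc += fine - coarse
        total += acc / n_samples
    return total


rng = random.Random(0)
estimate = mlmc_estimate([4000, 400, 40], rng)
print(f"MLMC estimate of the expected integral: {estimate:.3f}")
```

Because the correction terms use the same random input on both grids, they need far fewer samples than the coarse level, which is where the computational savings in a multilevel scheme come from.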
Related papers
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z) - Kolmogorov n-Widths for Multitask Physics-Informed Machine Learning (PIML) Methods: Towards Robust Metrics [8.90237460752114]
This topic encompasses a broad array of methods and models aimed at solving a single or a collection of PDE problems, called multitask learning.
PIML is characterized by the incorporation of physical laws into the training process of machine learning models in lieu of large data when solving PDE problems.
arXiv Detail & Related papers (2024-02-16T23:21:40Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant system parameters.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - Robust optimal well control using an adaptive multi-grid reinforcement learning framework [0.0]
Reinforcement learning is a promising tool to solve robust optimal well control problems.
The proposed framework is demonstrated using a state-of-the-art, model-free policy-based RL algorithm.
Prominent gains in computational efficiency are observed using the proposed framework, saving around 60-70% of the computational cost of its single fine-grid counterpart.
arXiv Detail & Related papers (2022-07-07T12:08:57Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Variational Model-based Policy Optimization [34.80171122943031]
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL.
We propose an objective function as a variational lower-bound of a log-likelihood to jointly learn and improve model and policy.
Our experiments on a number of continuous control tasks show that despite being more complex, our model-based (E-step) algorithm, called variational model-based policy optimization (VMBPO), is more sample-efficient and
arXiv Detail & Related papers (2020-06-09T18:30:15Z)