Related papers: Stochastic optimal well control in subsurface reservoirs using reinforcement learning

Stochastic optimal well control in subsurface reservoirs using reinforcement learning

URL: http://arxiv.org/abs/2207.03456v2
Date: Fri, 8 Jul 2022 20:54:09 GMT
Title: Stochastic optimal well control in subsurface reservoirs using reinforcement learning
Authors: Atish Dixit, Ahmed H. ElSheikh
Abstract summary: We present a case study of model-free reinforcement learning framework to solve optimal control for a predefined parameter uncertainty distribution. In principle, RL algorithms are capable of learning optimal action policies to maximize a numerical reward signal. We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C) on two subsurface flow test cases.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a case study of model-free reinforcement learning (RL) framework to solve stochastic optimal control for a predefined parameter uncertainty distribution and partially observable system. We focus on robust optimal well control problem which is a subject of intensive research activities in the field of subsurface reservoir management. For this problem, the system is partially observed since the data is only available at well locations. Furthermore, the model parameters are highly uncertain due to sparsity of available field data. In principle, RL algorithms are capable of learning optimal action policies -- a map from states to actions -- to maximize a numerical reward signal. In deep RL, this mapping from state to action is parameterized using a deep neural network. In the RL formulation of the robust optimal well control problem, the states are represented by saturation and pressure values at well locations while the actions represent the valve openings controlling the flow through wells. The numerical reward refers to the total sweep efficiency and the uncertain model parameter is the subsurface permeability field. The model parameter uncertainties are handled by introducing a domain randomisation scheme that exploits cluster analysis on its uncertainty distribution. We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C), on two subsurface flow test cases representing two distinct uncertainty distributions of permeability field. The results were benchmarked against optimisation results obtained using differential evolution algorithm. Furthermore, we demonstrate the robustness of the proposed use of RL by evaluating the learned control policy on unseen samples drawn from the parameter uncertainty distribution that were not used during the training process.

Related papers

Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy. We show the consistency of the OAS procedure, and we prove a regret guarantee of order $mathcalO(sqrtT log(T)$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions [22.765095010254118]
The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics. In order to mitigate over-conservative estimates while enhancing out-of-sample performance, we study DRO estimators informed by semi-parametric max-stable constraints in the space of point processes. Both approaches are validated using synthetically generated data, recovering prescribed characteristics, and verifying the efficacy of the proposed techniques.
arXiv Detail & Related papers (2024-07-31T19:45:27Z)
Sublinear Regret for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains. Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint. This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions. The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR)
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. In common practice, convergence (hyper)policies are learned only to deploy their deterministic version. We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
Deep reinforcement learning for optimal well control in subsurface systems with uncertain geology [0.0]
A general control policy framework based on deep reinforcement learning (DRL) is introduced for closed-loop decision making in subsurface flow settings. The DRL-based methodology is shown to result in an NPV increase of 15% (for the 2D cases) and 33% (3D cases) relative to robust optimization over prior models.
arXiv Detail & Related papers (2022-03-24T22:50:47Z)
Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation [12.415463205960156]
In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency. We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL. We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environmentity to better mitigate the negative impacts of noisy supervision.
arXiv Detail & Related papers (2022-01-05T15:46:06Z)
Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain. We show that robust value is more robust compared to deep reinforcement learning algorithm and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem. We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP)
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
Implicit Distributional Reinforcement Learning [61.166030238490634]
implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) Semi-implicit actor (SIA) powered by a flexible policy distribution. We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
Mixed Reinforcement Learning with Additive Stochastic Uncertainty [19.229447330293546]
Reinforcement learning (RL) methods often rely on massive exploration data to search optimal policies, and suffer from poor sampling efficiency. This paper presents a mixed RL algorithm by simultaneously using dual representations of environmental dynamics to search the optimal policy. The effectiveness of the mixed RL is demonstrated by a typical optimal control problem of non-affine nonlinear systems.
arXiv Detail & Related papers (2020-02-28T08:02:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.