Deep reinforcement learning for optimal well control in subsurface
systems with uncertain geology
- URL: http://arxiv.org/abs/2203.13375v1
- Date: Thu, 24 Mar 2022 22:50:47 GMT
- Title: Deep reinforcement learning for optimal well control in subsurface
systems with uncertain geology
- Authors: Yusuf Nasir and Louis J. Durlofsky
- Abstract summary: A general control policy framework based on deep reinforcement learning (DRL) is introduced for closed-loop decision making in subsurface flow settings.
The DRL-based methodology is shown to result in an NPV increase of 15% (for the 2D cases) and 33% (3D cases) relative to robust optimization over prior models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A general control policy framework based on deep reinforcement learning (DRL)
is introduced for closed-loop decision making in subsurface flow settings.
Traditional closed-loop modeling workflows in this context involve the repeated
application of data assimilation/history matching and robust optimization
steps. Data assimilation can be particularly challenging in cases where both
the geological style (scenario) and individual model realizations are
uncertain. The closed-loop reservoir management (CLRM) problem is formulated
here as a partially observable Markov decision process, with the associated
optimization problem solved using a proximal policy optimization algorithm.
This provides a control policy that instantaneously maps flow data observed at
wells (as are available in practice) to optimal well pressure settings. The
policy is represented by a temporal convolution and gated transformer blocks.
Training is performed in a preprocessing step with an ensemble of prior
geological models, which can be drawn from multiple geological scenarios.
Example cases involving the production of oil via water injection, with both 2D
and 3D geological models, are presented. The DRL-based methodology is shown to
result in an NPV increase of 15% (for the 2D cases) and 33% (3D cases) relative
to robust optimization over prior models, and to an average improvement of 4%
in NPV relative to traditional CLRM. The solutions from the control policy are
found to be comparable to those from deterministic optimization, in which the
geological model is assumed to be known, even when multiple geological
scenarios are considered. The control policy approach results in a 76% decrease
in computational cost relative to traditional CLRM with the algorithms and
parameter settings considered in this work.
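The NPV objective that the abstract reports improvements against can be illustrated with a minimal discounted cash-flow sketch. The prices, costs, discount rate, and time-step length below are hypothetical placeholders for illustration, not values from the paper.

```python
import numpy as np

def npv(oil_rates, water_inj_rates, water_prod_rates,
        oil_price=60.0, water_inj_cost=2.0, water_prod_cost=3.0,
        discount_rate=0.1, dt_years=0.25):
    """Discounted net cash flow over control steps of length dt_years.

    Rates are per-step averages; revenue from oil production is offset by
    water injection and produced-water handling costs, then discounted.
    """
    t = np.arange(len(oil_rates)) * dt_years
    cash_flow = (oil_price * np.asarray(oil_rates)
                 - water_inj_cost * np.asarray(water_inj_rates)
                 - water_prod_cost * np.asarray(water_prod_rates))
    discount = (1.0 + discount_rate) ** (-t)
    return float(np.sum(cash_flow * discount))

# Declining oil rate with rising water cut over four quarterly control steps
print(npv([100, 90, 80, 70], [120] * 4, [10, 20, 30, 40]))
```

In the well-control setting, the optimizer (robust optimization, CLRM, or the DRL policy) chooses well pressure settings that influence these rate profiles so as to maximize this quantity.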
Related papers
- Recursive Gaussian Process State Space Model [4.572915072234487]
We propose a new online GPSSM method with adaptive capabilities for both operating domains and GP hyperparameters.
Online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning.
Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method.
arXiv Detail & Related papers (2024-11-22T02:22:59Z) - Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - Neural ODEs as Feedback Policies for Nonlinear Optimal Control [1.8514606155611764]
We use Neural ordinary differential equations (Neural ODEs) to model continuous time dynamics as differential equations parametrized with neural networks.
We propose the use of a neural control policy posed as a Neural ODE to solve general nonlinear optimal control problems.
arXiv Detail & Related papers (2022-10-20T13:19:26Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Stochastic optimal well control in subsurface reservoirs using
reinforcement learning [0.0]
We present a case study of a model-free reinforcement learning framework for solving optimal control under a predefined parameter uncertainty distribution.
In principle, RL algorithms are capable of learning optimal action policies to maximize a numerical reward signal.
We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C) on two subsurface flow test cases.
arXiv Detail & Related papers (2022-07-07T17:34:23Z) - Queueing Network Controls via Deep Reinforcement Learning [0.0]
We develop a proximal policy optimization (PPO) algorithm for queueing networks.
The algorithm consistently generates control policies that outperform state-of-the-art methods in the literature.
A key to the successes of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function.
arXiv Detail & Related papers (2020-07-31T01:02:57Z) - Single-step deep reinforcement learning for open-loop control of laminar
and turbulent flows [0.0]
This research gauges the ability of deep reinforcement learning (DRL) techniques to assist the optimization and control of fluid mechanical systems.
It combines a novel, "degenerate" version of the proximal policy optimization (PPO) algorithm that trains a neural network to optimize the system only once per learning episode.
arXiv Detail & Related papers (2020-06-04T16:11:26Z) - Localized active learning of Gaussian process state space models [63.97366815968177]
A globally accurate model is not required to achieve good performance in many common control applications.
We propose an active learning strategy for Gaussian process state space models that aims to obtain an accurate model on a bounded subset of the state-action space.
By employing model predictive control, the proposed technique integrates information collected during exploration and adaptively improves its exploration strategy.
arXiv Detail & Related papers (2020-05-04T05:35:02Z)
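Several of the entries above, including the main paper, build on proximal policy optimization. As a point of reference, here is a minimal numpy sketch of PPO's clipped surrogate objective; the probability ratios and advantage estimates below are illustrative inputs, not values from any of the listed papers.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate L^CLIP = mean(min(r * A, clip(r, 1-eps, 1+eps) * A)).

    ratio: pi_new(a|s) / pi_old(a|s) per sample; advantage: estimated A(s, a).
    Clipping removes the incentive to move the ratio far outside [1-eps, 1+eps].
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))

# A ratio of 1.5 with positive advantage is clipped to 1.2 before averaging
print(ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))  # prints 0.8
```

In practice this objective is maximized by gradient ascent on the policy parameters, with the ratio recomputed at each update against the frozen pre-update policy.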
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.