Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty
- URL: http://arxiv.org/abs/2405.14973v1
- Date: Thu, 23 May 2024 18:19:47 GMT
- Title: Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty
- Authors: Andrew Rosemberg, Alexandre Street, Davi M. Valladão, Pascal Van Hentenryck,
- Abstract summary: Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR)
- Score: 55.06411438416805
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains. Some SDMU applications are naturally modeled as Multistage Stochastic Optimization Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint. Under assumptions of convexity and stage-wise independence of the uncertainty, the resulting optimization can be solved efficiently using Stochastic Dual Dynamic Programming (SDDP). Two-stage Linear Decision Rules (TS-LDRs) have been proposed to solve MSPs without the stage-wise independence assumption. TS-LDRs are computationally tractable, but using a policy that is a linear function of past observations is typically not suitable for non-convex environments arising, for example, in energy systems. This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions, making them suitable for non-convex environments. TS-GDR is a self-supervised learning algorithm that trains the nonlinear decision rules using stochastic gradient descent (SGD); its forward passes solve the policy implementation optimization problems, and the backward passes leverage duality theory to obtain closed-form gradients. The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-DDR). The method inherits the flexibility and computational performance of Deep Learning methodologies to solve SDMU problems generally tackled through large-scale optimization techniques. Applied to the Long-Term Hydrothermal Dispatch (LTHD) problem using actual power system data from Bolivia, the TS-DDR not only enhances solution quality but also significantly reduces computation times by several orders of magnitude.
Related papers
- A Simulation-Free Deep Learning Approach to Stochastic Optimal Control [12.699529713351287]
We propose a simulation-free algorithm for the solution of generic problems in optimal control (SOC)
Unlike existing methods, our approach does not require the solution of an adjoint problem.
arXiv Detail & Related papers (2024-10-07T16:16:53Z) - Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Optimal Sequential Decision-Making in Geosteering: A Reinforcement
Learning Approach [0.0]
Trajectory adjustment decisions throughout the drilling process, called geosteering, affect subsequent choices and information gathering.
We use the Deep Q-Network (DQN) method, a model-free reinforcement learning (RL) method that learns directly from the decision environment.
For two previously published synthetic geosteering scenarios, our results show that RL achieves high-quality outcomes comparable to the quasi-optimal ADP.
arXiv Detail & Related papers (2023-10-07T10:49:30Z) - Neural Stochastic Dual Dynamic Programming [99.80617899593526]
We introduce a trainable neural model that learns to map problem instances to a piece-wise linear value function.
$nu$-SDDP can significantly reduce problem solving cost without sacrificing solution quality.
arXiv Detail & Related papers (2021-12-01T22:55:23Z) - Solving Multistage Stochastic Linear Programming via Regularized Linear
Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaSO (adaptive least absolute shrinkage and selection operator)
Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z) - Optimal Operation of Power Systems with Energy Storage under
Uncertainty: A Scenario-based Method with Strategic Sampling [0.0]
Multi-period dynamics of energy storage (ES), intermittent renewable generation and uncontrollable power loads, make the optimization of power system operation (PSO) challenging.
A multi-period optimal PSO under uncertainty is formulated using the chance-constrained probability optimization (CCO) modeling paradigm.
This paper develops a novel solution method for this challenging CCO problem.
arXiv Detail & Related papers (2021-07-21T11:21:50Z) - Resource Allocation via Model-Free Deep Learning in Free Space Optical
Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.