Solving Multistage Stochastic Linear Programming via Regularized Linear
Decision Rules: An Application to Hydrothermal Dispatch Planning
- URL: http://arxiv.org/abs/2110.03146v1
- Date: Thu, 7 Oct 2021 02:36:14 GMT
- Title: Solving Multistage Stochastic Linear Programming via Regularized Linear
Decision Rules: An Application to Hydrothermal Dispatch Planning
- Authors: Felipe Nazare and Alexandre Street
- Abstract summary: We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaLASSO (adaptive least absolute shrinkage and selection operator).
Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
- Score: 77.34726150561087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The solution of multistage stochastic linear problems (MSLP) represents a
challenge for many applications. Long-term hydrothermal dispatch planning
(LHDP) materializes this challenge in a real-world problem that affects
electricity markets, economies, and natural resources worldwide. No closed-form
solutions are available for MSLP and the definition of non-anticipative
policies with high-quality out-of-sample performance is crucial. Linear
decision rules (LDR) provide an interesting simulation-based framework for
finding high-quality policies for MSLP through two-stage stochastic models. In
practical applications, however, the number of parameters to be estimated when
using an LDR may be close to or higher than the number of scenarios, thereby
generating in-sample overfit and poor performance in out-of-sample
simulations. In this paper, we propose a novel regularization scheme for LDR
based on the AdaLASSO (adaptive least absolute shrinkage and selection
operator). The goal is to use the parsimony principle, widely studied in
high-dimensional linear regression models, to obtain better out-of-sample
performance for an LDR applied to MSLP. Computational experiments show that the
overfit threat is non-negligible when using the classical non-regularized LDR
to solve MSLP. For the LHDP problem, our analysis highlights the following
benefits of the proposed framework in comparison to the non-regularized
benchmark: 1) significant reductions in the number of non-zero coefficients
(model parsimony), 2) substantial cost reductions in out-of-sample evaluations,
and 3) improved spot-price profiles.
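To make the two-pass AdaLASSO idea concrete, below is a minimal sketch (not the paper's implementation) of a regularized LDR fit on a toy single-reservoir, single-stage dispatch problem; the inflow features, cost values, penalty weight `lam`, and the use of cvxpy are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): fit a linear decision rule
# (LDR) over sampled scenarios, first without regularization and then with an
# AdaLASSO (adaptively weighted L1) penalty on the LDR coefficients.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
S, K = 50, 8                      # scenarios and number of LDR features (inflow lags)
demand, c_thermal = 1.0, 10.0     # stage demand and thermal generation cost

xi = rng.gamma(2.0, 0.3, size=(S, K))   # hypothetical inflow features per scenario
inflow_now = xi[:, -1]                  # current-stage inflow caps hydro generation

beta = cp.Variable(K)                   # LDR coefficients: hydro = xi @ beta
hydro = xi @ beta
thermal_cost = c_thermal * cp.sum(cp.pos(demand - hydro)) / S   # thermal fills the gap

def fit(penalty):
    prob = cp.Problem(cp.Minimize(thermal_cost + penalty),
                      [hydro >= 0, hydro <= inflow_now])
    prob.solve()
    return beta.value

# Pass 1: classical non-regularized LDR, used only to build the adaptive weights.
beta_plain = fit(0)
w = 1.0 / (np.abs(beta_plain) + 1e-6)   # AdaLASSO weights: small coefficients shrink harder

# Pass 2: AdaLASSO-regularized LDR (lam would be tuned by out-of-sample simulation).
lam = 0.05
beta_ada = fit(lam * cp.sum(cp.multiply(w, cp.abs(beta))))
print("non-zero LDR coefficients:", int(np.sum(np.abs(beta_ada) > 1e-4)), "of", K)
```

In the paper's setting the rule spans all stages and includes reservoir balance constraints; the sketch only shows the mechanics of building adaptive weights from a non-regularized first pass and re-solving with the weighted L1 penalty.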
Related papers
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
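Read schematically (this is an interpretation of the summary above, not the paper's exact objective), such a combined objective pairs a preference-optimization loss with an SFT log-likelihood term acting as the regularizer:

```latex
% Schematic only: L_pref is a preference-optimization loss and the supervised
% (SFT) log-likelihood acts as the implicit adversarial regularizer.
\min_{\theta}\; \mathcal{L}_{\text{pref}}(\theta)
\;-\; \beta \, \mathbb{E}_{(x,y)\sim \mathcal{D}_{\text{SFT}}}\big[\log \pi_{\theta}(y \mid x)\big]
```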
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using deep recurrent neural networks, named Two-Stage Deep Decision Rules (TS-DDR).
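As a hedged illustration of what generalizing the policy space beyond linear functions can look like (this is not the authors' TS-DDR implementation), a recurrent network can replace the linear map from the uncertainty history to the stage decision:

```python
# Hedged sketch of a recurrent decision rule (illustrative, not the TS-DDR code):
# the policy maps the observed uncertainty history to a decision at every stage,
# whereas an LDR uses a fixed linear map from the history to the decision.
import torch
import torch.nn as nn

class RecurrentDecisionRule(nn.Module):
    def __init__(self, n_uncertainty: int, n_decisions: int, hidden: int = 32):
        super().__init__()
        self.rnn = nn.LSTM(n_uncertainty, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_decisions)

    def forward(self, xi_history: torch.Tensor) -> torch.Tensor:
        # xi_history: (scenarios, stages, n_uncertainty) -> decisions per stage
        h, _ = self.rnn(xi_history)
        return self.head(h)

rule = RecurrentDecisionRule(n_uncertainty=1, n_decisions=2)
decisions = rule(torch.randn(16, 12, 1))     # 16 scenarios, 12 stages
```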
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- A Moreau Envelope Approach for LQR Meta-Policy Estimation [0.7311194870168775]
We study the problem of policy estimation for the Linear Quadratic Regulator (LQR) in discrete-time linear time-invariant uncertain dynamical systems.
We propose a surrogate LQR cost, built from a finite set of realizations of the uncertain system, to define a meta-policy efficiently adjustable to new realizations.
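For reference, the Moreau envelope of a per-realization LQR cost $J_i$ is defined below; one natural surrogate (the paper's exact construction may differ) averages these envelopes over the sampled realizations:

```latex
% Standard Moreau-envelope definition; the averaged surrogate is an assumption.
e_{\mu} J_i(K) \;=\; \min_{K'} \Big\{ J_i(K') + \tfrac{1}{2\mu}\,\lVert K' - K \rVert_F^2 \Big\},
\qquad
\widehat{J}(K) \;=\; \frac{1}{N}\sum_{i=1}^{N} e_{\mu} J_i(K).
```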
arXiv Detail & Related papers (2024-03-26T04:02:09Z)
- Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs [16.006893624836554]
We propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE).
VBMLE is computationally more efficient as it only requires solving one optimization problem in each time step.
In our regret analysis, we offer a generic convergence result of MLE in linear MDPs through a novel supermartingale construct.
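In the classical value-biased MLE scheme that this line of work builds on, the estimator biases the log-likelihood toward models with larger optimal value; the exact form and constants in the paper may differ:

```latex
% Schematic value-biased MLE: log-likelihood of observed transitions plus a
% bias term favoring parameters with higher optimal value.
\widehat{\theta}_t \;\in\; \arg\max_{\theta}\;
\Big\{ \textstyle\sum_{k=1}^{t-1} \log p_{\theta}\!\big(s_{k+1} \mid s_k, a_k\big)
\;+\; \alpha_t \, V^{*}_{\theta} \Big\}
```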
arXiv Detail & Related papers (2023-10-17T18:27:27Z)
- Provably Efficient Algorithm for Nonstationary Low-Rank MDPs [48.92657638730582]
We make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time.
We propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve it to a parameter-free version, Ada-PORTAL.
For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap.
arXiv Detail & Related papers (2023-08-10T09:52:44Z)
- PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addresses these concerns by explicitly parameterizing the distribution of the upper-level alignment objective (reward design) by the lower-level optimal variable.
Our empirical results substantiate that the proposed PARL can address alignment concerns in RL, showing significant improvements.
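Schematically, a bilevel policy-alignment formulation of this kind couples an upper-level reward-design objective with a lower-level policy optimization problem; the sketch below is generic and not the paper's exact statement:

```latex
% Generic bilevel alignment sketch: nu parameterizes the reward, pi*(nu) is the
% policy optimal for that reward, and f is the upper-level alignment objective.
\min_{\nu}\; f\big(\nu, \pi^{*}(\nu)\big)
\quad \text{s.t.} \quad
\pi^{*}(\nu) \;\in\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\big[r_{\nu}(s,a)\big]
```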
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation [24.577243536475233]
Offline reinforcement learning (RL) concerns pursuing an optimal policy for sequential decision-making from a pre-collected dataset.
Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators.
We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
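For context, the classical linear program over discounted occupancy measures that this framework builds on is the following (written generically for discount factor γ and initial distribution μ; the offline variant adds data-coverage and function-approximation considerations):

```latex
% Classical LP formulation of a discounted MDP over occupancy measures d(s,a).
\max_{d \ge 0}\; \sum_{s,a} d(s,a)\, r(s,a)
\quad \text{s.t.} \quad
\sum_{a} d(s,a) \;=\; (1-\gamma)\,\mu(s) \;+\; \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a')
\quad \forall s .
```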
arXiv Detail & Related papers (2022-12-28T15:28:12Z)
- A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
Yet a solid theoretical understanding of even the "vanilla" PG has remained elusive; this paper provides a general sample complexity analysis for it.
arXiv Detail & Related papers (2021-07-23T19:38:17Z)