DNN-based Policies for Stochastic AC OPF
- URL: http://arxiv.org/abs/2112.02441v1
- Date: Sat, 4 Dec 2021 22:26:27 GMT
- Title: DNN-based Policies for Stochastic AC OPF
- Authors: Sarthak Gupta, Sidhant Misra, Deepjyoti Deka, Vassilis Kekatos
- Abstract summary: Stochastic optimal power flow (SOPF) formulations provide a mechanism to handle uncertainties by computing dispatch decisions and control policies that maintain feasibility under uncertainty.
We put forth a deep neural network (DNN)-based policy that predicts the generator dispatch decisions in response to uncertainty.
The advantages of the DNN policy over simpler policies and its efficacy in enforcing safety limits and producing near-optimal solutions are demonstrated.
- Score: 7.551130027327462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A prominent challenge to the safe and optimal operation of the modern power
grid arises due to growing uncertainties in loads and renewables. Stochastic
optimal power flow (SOPF) formulations provide a mechanism to handle these
uncertainties by computing dispatch decisions and control policies that
maintain feasibility under uncertainty. Most SOPF formulations consider simple
control policies such as affine policies that are mathematically simple and
resemble many policies used in current practice. Motivated by the efficacy of
machine learning (ML) algorithms and the potential benefits of general control
policies for cost and constraint enforcement, we put forth a deep neural
network (DNN)-based policy that predicts the generator dispatch decisions in
real time in response to uncertainty. The weights of the DNN are learnt using
stochastic primal-dual updates that solve the SOPF without the need for prior
generation of training labels and can explicitly account for the feasibility
constraints in the SOPF. The advantages of the DNN policy over simpler policies
and its efficacy in enforcing safety limits and producing near-optimal
solutions are demonstrated in the context of a chance-constrained formulation
on a number of test cases.
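As a rough illustration of the training scheme sketched in the abstract, the snippet below implements stochastic primal-dual updates on a small PyTorch policy. The dimensions, cost function, and constraint function are placeholders invented for illustration, not the paper's actual SOPF formulation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_uncertain, n_gen = 10, 5            # hypothetical problem dimensions

# DNN policy: maps an uncertainty realization (load/renewable deviations)
# to generator dispatch decisions.
policy = nn.Sequential(
    nn.Linear(n_uncertain, 64), nn.ReLU(),
    nn.Linear(64, n_gen),
)

def cost_fn(p):
    # Placeholder quadratic generation cost standing in for the OPF objective.
    return (p ** 2).sum(dim=1).mean()

def violation_fn(p):
    # Placeholder per-constraint violations g(p) <= 0 (e.g. dispatch limits),
    # averaged over the sampled batch.
    return torch.relu(p.abs() - 1.0).mean(dim=0)

lam = torch.zeros(n_gen)              # dual variables, one per constraint
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
eta = 1e-2                            # dual ascent step size

for step in range(2000):
    xi = torch.randn(128, n_uncertain)        # sample an uncertainty batch
    p = policy(xi)                            # predicted dispatch
    g = violation_fn(p)
    lagrangian = cost_fn(p) + (lam * g).sum()

    opt.zero_grad()
    lagrangian.backward()
    opt.step()                                # primal descent on DNN weights

    with torch.no_grad():
        lam = torch.clamp(lam + eta * g, min=0.0)   # dual ascent
```

The dual variables grow whenever sampled constraints are violated, steering the policy toward feasibility without any precomputed training labels.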
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotive tasks; a toy sketch of the penalized objective follows this entry.
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
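A minimal, hypothetical sketch of a P3O-flavored objective: a PPO-style clipped surrogate for the reward combined with an exact-penalty term on the cost constraint. The function and argument names are illustrative, and the precise objective in the paper differs in detail.

```python
import torch

def p3o_style_loss(ratio, adv_reward, adv_cost, cost_est, cost_limit,
                   kappa=10.0, eps=0.2):
    """Penalized clipped surrogate (illustrative sketch, not the paper's
    exact loss)."""
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # PPO-style clipped surrogate for the reward term.
    reward_surr = torch.minimum(ratio * adv_reward, clipped * adv_reward).mean()
    # Pessimistic surrogate for the constraint cost.
    cost_surr = torch.maximum(ratio * adv_cost, clipped * adv_cost).mean()
    # Exact-penalty term: active only when the estimated constraint cost
    # exceeds its budget, replacing an explicit cost constraint.
    penalty = kappa * torch.relu(cost_est + cost_surr - cost_limit)
    return -reward_surr + penalty
```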
- Randomized Policy Optimization for Optimal Stopping [0.0]
We propose a new methodology for optimal stopping based on randomized linear policies.
We show that our approach can substantially outperform state-of-the-art methods.
arXiv Detail & Related papers (2022-03-25T04:33:15Z)
- Learning Stochastic Parametric Differentiable Predictive Control Policies [2.042924346801313]
We present a scalable alternative called stochastic parametric differentiable predictive control (SP-DPC) for unsupervised learning of neural control policies.
SP-DPC is formulated as a deterministic approximation to the stochastic parametric constrained optimal control problem.
We provide theoretical probabilistic guarantees for policies learned via the SP-DPC method on closed-loop chance constraint satisfaction; a toy scenario-based training sketch follows this entry.
arXiv Detail & Related papers (2022-03-02T22:46:32Z)
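Below is a toy, hypothetical sketch of scenario-based training in the spirit of SP-DPC: a neural policy is rolled through a simple simulated system under sampled disturbance scenarios, with the state constraint handled by a soft penalty. The system model, horizon, and weights are invented for illustration.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
T = 10                                    # prediction horizon

def scenario_loss(x0, w):
    # Roll the policy through a toy scalar system under one sampled
    # disturbance scenario w; penalize state/input effort and (softly)
    # the state constraint |x| <= 1.
    x, loss = x0, torch.zeros(())
    for t in range(T):
        u = policy(torch.cat([x, w[t]]))
        x = 0.9 * x + u + w[t]
        loss = loss + (x ** 2).sum() + 0.1 * (u ** 2).sum() \
                    + 10.0 * torch.relu(x.abs() - 1.0).sum()
    return loss

for step in range(1000):
    x0 = torch.randn(1)
    scenarios = 0.1 * torch.randn(8, T, 1)    # sampled disturbance scenarios
    loss = torch.stack([scenario_loss(x0, w) for w in scenarios]).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```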
- Neural-Progressive Hedging: Enforcing Constraints in Reinforcement Learning with Stochastic Programming [8.942831966541231]
We propose the neural-progressive hedging (NP) framework, which leverages stochastic programming during the online phase of executing a reinforcement learning (RL) policy.
The goal is to ensure feasibility with respect to constraints and risk-based objectives such as conditional value-at-risk (CVaR).
We show that the NP framework produces policies that are better than deep RL and other baseline approaches.
arXiv Detail & Related papers (2022-02-27T19:39:19Z)
- A Prescriptive Dirichlet Power Allocation Policy with Deep Reinforcement Learning [6.003234406806134]
In this work, we propose the Dirichlet policy for continuous allocation tasks and analyze the bias and variance of its policy gradients.
We demonstrate that the Dirichlet policy is bias-free and provides significantly faster convergence and better performance than the Gaussian-softmax policy.
The experimental results show the potential to prescribe optimal operation and to improve the efficiency and sustainability of multi-power source systems; a minimal Dirichlet policy head is sketched after this entry.
arXiv Detail & Related papers (2022-01-20T20:41:04Z)
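A minimal sketch of a Dirichlet policy head follows, assuming a PyTorch setup; the network sizes and softplus parameterization are illustrative choices, not the paper's exact architecture. Sampling from a Dirichlet guarantees nonnegative allocation fractions that sum to one.

```python
import torch
import torch.nn as nn

class DirichletPolicy(nn.Module):
    """Policy head whose actions are allocation fractions on the simplex."""
    def __init__(self, obs_dim, n_sources):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_sources))

    def forward(self, obs):
        # Softplus keeps the Dirichlet concentration parameters positive.
        alpha = nn.functional.softplus(self.net(obs)) + 1e-3
        return torch.distributions.Dirichlet(alpha)

policy = DirichletPolicy(obs_dim=8, n_sources=3)
dist = policy(torch.randn(8))
action = dist.sample()          # nonnegative and sums to 1 by construction
logp = dist.log_prob(action)    # used in a standard policy-gradient update
```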
- Certification of Iterative Predictions in Bayesian Neural Networks [79.15007746660211]
We compute lower bounds for the probability that trajectories of the BNN model reach a given set of states while avoiding a set of unsafe states.
We use the lower bounds in the context of control and reinforcement learning to provide safety certification for given control policies.
arXiv Detail & Related papers (2021-05-21T05:23:57Z)
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward and avoids violation of certain constraints.
This is the first analysis of SRL algorithms with convergence guarantees to globally optimal policies.
arXiv Detail & Related papers (2020-11-11T16:05:14Z)
- Chance-Constrained Control with Lexicographic Deep Reinforcement Learning [77.34726150561087]
This paper proposes a lexicographic Deep Reinforcement Learning (DeepRL)-based approach to chance-constrained Markov Decision Processes.
A lexicographic version of the well-known DeepRL algorithm DQN is also proposed and validated via simulations; a toy sketch of lexicographic action selection follows this entry.
arXiv Detail & Related papers (2020-10-19T13:09:14Z)
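The toy sketch below illustrates one plausible form of lexicographic action selection with separate reward and constraint Q-values; it is an assumption-laden illustration, not the algorithm from the paper.

```python
import torch

def lexicographic_action(q_reward, q_cost, cost_threshold):
    # Hypothetical lexicographic selection for a chance-constrained DQN:
    # first keep actions whose estimated constraint-violation value is
    # acceptable, then maximize the reward Q-value among them.
    feasible = q_cost <= cost_threshold
    if feasible.any():
        masked = q_reward.masked_fill(~feasible, float("-inf"))
        return int(masked.argmax())
    return int(q_cost.argmin())   # otherwise pick the least-violating action

# Example with two Q-heads over 4 discrete actions:
q_r = torch.tensor([1.0, 2.0, 3.0, 0.5])
q_c = torch.tensor([0.2, 0.6, 0.9, 0.1])
print(lexicographic_action(q_r, q_c, cost_threshold=0.5))   # -> 0
```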
This list is automatically generated from the titles and abstracts of the papers on this site.