Action Set Based Policy Optimization for Safe Power Grid Management
- URL: http://arxiv.org/abs/2106.15200v1
- Date: Tue, 29 Jun 2021 09:36:36 GMT
- Title: Action Set Based Policy Optimization for Safe Power Grid Management
- Authors: Bo Zhou, Hongsheng Zeng, Yuecheng Liu, Kejiao Li, Fan Wang, Hao Tian
- Abstract summary: Reinforcement learning (RL) has been employed to provide sequential decision-making in power grid management.
We propose a novel method for this problem, which builds on top of the search-based planning algorithm.
In the NeurIPS 2020 Learning to Run Power Network (L2RPN) competition, our solution safely managed the power grid and ranked first in both tracks.
- Score: 8.156111849078439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maintaining the stability of the modern power grid is becoming increasingly
difficult due to fluctuating power consumption, unstable power supply coming
from renewable energies, and unpredictable accidents such as man-made and
natural disasters. As the operation on the power grid must consider its impact
on future stability, reinforcement learning (RL) has been employed to provide
sequential decision-making in power grid management. However, existing methods
have not considered the environment's operational constraints. As a result, the
learned policy risks selecting actions that violate the constraints in
emergencies, escalating the overloading of power lines and leading to
large-scale blackouts. In this work, we propose a novel method for this
problem, which builds on top of the search-based planning algorithm. At the
planning stage, the search space is limited to the action set produced by the
policy. The selected action strictly follows the constraints by testing its
outcome with the simulation function provided by the system. At the learning
stage, to address the problem that gradients cannot be propagated to the
policy, we introduce Evolutionary Strategies (ES) with black-box policy
optimization to improve the policy directly, maximizing long-run returns. In
the NeurIPS 2020 Learning to Run Power Network (L2RPN) competition, our
solution safely managed the power grid and ranked first in both tracks.
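The two stages described in the abstract can be sketched in a short program: the policy proposes a small top-k action set, the system's simulation function vets each candidate for constraint violations, and Evolutionary Strategies (ES) update the policy parameters through the non-differentiable selection step. Everything below (the linear policy, the toy `simulate` stand-in, the sizes and learning rates) is a hypothetical illustration under stated assumptions, not the authors' implementation; a real deployment would call the grid simulator's own forecast function instead.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 8   # size of a discrete topology-action space (assumption)
OBS_DIM = 4     # observation dimension (assumption)
TOP_K = 3       # number of candidate actions the policy proposes

def policy_logits(theta, obs):
    """Minimal linear policy: scores each discrete action from the observation."""
    return theta.reshape(N_ACTIONS, OBS_DIM) @ obs

def simulate(obs, action):
    """Toy stand-in for the system's simulation function. Returns
    (violates_constraints, predicted_reward); a real grid environment
    would forecast line loadings for the candidate action instead."""
    predicted = np.tanh(obs.sum()) - 0.1 * action
    violates = bool(action == 0 and obs[0] > 0.5)
    return violates, predicted

def select_safe_action(theta, obs):
    """Planning stage: limit the search space to the policy's top-k action
    set, simulate each candidate, and keep the best constraint-satisfying one."""
    candidates = np.argsort(policy_logits(theta, obs))[::-1][:TOP_K]
    best, best_reward = None, -np.inf
    for a in candidates:
        violates, reward = simulate(obs, a)
        if not violates and reward > best_reward:
            best, best_reward = int(a), reward
    return best  # None means no safe candidate; fall back to a no-op action

def episode_return(theta, steps=20):
    """Toy rollout used as the black-box objective for ES."""
    total = 0.0
    for _ in range(steps):
        obs = rng.normal(size=OBS_DIM)
        a = select_safe_action(theta, obs)
        if a is not None:
            total += simulate(obs, a)[1]
    return total

def es_update(theta, sigma=0.1, lr=0.02, population=16):
    """Learning stage: gradients cannot propagate through the simulate-and-
    select step, so estimate a search gradient from perturbed parameter copies."""
    eps = rng.normal(size=(population, theta.size))
    returns = np.array([episode_return(theta + sigma * e) for e in eps])
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
    return theta + lr / (population * sigma) * eps.T @ advantages

theta = 0.1 * rng.normal(size=N_ACTIONS * OBS_DIM)
for _ in range(3):
    theta = es_update(theta)
```

The key design point the abstract argues for is visible here: safety is enforced at selection time by the simulator, not learned implicitly, so the ES learner only has to improve the quality of the candidate set it proposes.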
Related papers
- Optimizing Load Scheduling in Power Grids Using Reinforcement Learning and Markov Decision Processes [0.0]
This paper proposes a reinforcement learning (RL) approach to address the challenges of dynamic load scheduling.
Our results show that the RL-based method provides a robust and scalable solution for real-time load scheduling.
arXiv Detail & Related papers (2024-10-23T09:16:22Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Best of Both Worlds in Online Control: Competitive Ratio and Policy Regret [61.59646565655169]
We show that several recently proposed online control algorithms achieve the best of both worlds: sublinear regret vs. the best DAC policy selected in hindsight.
We conclude that sublinear regret vs. the optimal competitive policy is attainable when the linear dynamical system is unknown.
arXiv Detail & Related papers (2022-11-21T07:29:08Z)
- A Prescriptive Dirichlet Power Allocation Policy with Deep Reinforcement Learning [6.003234406806134]
In this work, we propose the Dirichlet policy for continuous allocation tasks and analyze the bias and variance of its policy gradients.
We demonstrate that the Dirichlet policy is bias-free and provides significantly faster convergence and better performance than the Gaussian-softmax policy.
The experimental results show the potential to prescribe optimal operation, improve the efficiency and sustainability of multi-power source systems.
arXiv Detail & Related papers (2022-01-20T20:41:04Z)
- DNN-based Policies for Stochastic AC OPF [7.551130027327462]
Stochastic optimal power flow (SOPF) formulations provide a mechanism to handle uncertainties by computing dispatch decisions and control policies that maintain feasibility under uncertainty.
We put forth a deep neural network (DNN)-based policy that predicts the generator dispatch decisions in response to uncertainty.
The advantages of the DNN policy over simpler policies and their efficacy in enforcing safety limits and producing near optimal solutions are demonstrated.
arXiv Detail & Related papers (2021-12-04T22:26:27Z)
- Enforcing Policy Feasibility Constraints through Differentiable Projection for Energy Optimization [57.88118988775461]
We propose PROjected Feasibility (PROF) to enforce convex operational constraints within neural policies.
We demonstrate PROF on two applications: energy-efficient building operation and inverter control.
arXiv Detail & Related papers (2021-05-19T01:58:10Z)
- Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)
- Delayed Q-update: A novel credit assignment technique for deriving an optimal operation policy for the Grid-Connected Microgrid [3.3754780158324564]
We propose an approach for deriving a desirable microgrid operation policy using the proposed novel credit assignment technique, delayed-Q update.
The technique introduces features for handling the delayed-effect property of microgrid operation.
It supports the search for a near-optimal operation policy under a sophisticatedly controlled microgrid environment.
arXiv Detail & Related papers (2020-06-30T10:30:15Z)
- Off-policy Learning for Remote Electrical Tilt Optimization [68.8204255655161]
We address the problem of Remote Electrical Tilt (RET) optimization using off-policy Contextual Multi-Armed-Bandit (CMAB) techniques.
We propose CMAB learning algorithms to extract optimal tilt update policies from the data.
Our policies show consistent improvements over the rule-based logging policy used to collect the data.
arXiv Detail & Related papers (2020-05-21T11:30:31Z)
- Accelerating Deep Reinforcement Learning With the Aid of Partial Model: Energy-Efficient Predictive Video Streaming [97.75330397207742]
Predictive power allocation is conceived for energy-efficient video streaming over mobile networks using deep reinforcement learning.
To handle the continuous state and action spaces, we resort to deep deterministic policy gradient (DDPG) algorithm.
Our simulation results show that the proposed policies converge to the optimal policy that is derived based on perfect large-scale channel prediction.
arXiv Detail & Related papers (2020-03-21T17:36:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.