Bi-level Off-policy Reinforcement Learning for Volt/VAR Control
Involving Continuous and Discrete Devices
- URL: http://arxiv.org/abs/2104.05902v1
- Date: Tue, 13 Apr 2021 02:22:43 GMT
- Title: Bi-level Off-policy Reinforcement Learning for Volt/VAR Control
Involving Continuous and Discrete Devices
- Authors: Haotian Liu, Wenchuan Wu
- Abstract summary: In Volt/Var control, both slow timescale discrete devices (STDDs) and fast timescale continuous devices (FTCDs) are involved.
Traditional optimization methods rely heavily on accurate models of the system, but are sometimes impractical because of the prohibitive modelling effort involved.
In this paper, a novel bi-level off-policy reinforcement learning (RL) algorithm is proposed to solve this problem in a model-free manner.
- Score: 2.079959811127612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Volt/Var control (VVC) of active distribution networks (ADNs), both slow
timescale discrete devices (STDDs) and fast timescale continuous devices
(FTCDs) are involved. The STDDs, such as on-load tap changers (OLTCs), and the
FTCDs, such as distributed generators, should be coordinated in time sequence.
Such VVC is formulated as a two-timescale optimization problem to jointly
optimize FTCDs and STDDs in ADNs. Traditional optimization methods rely heavily
on accurate models of the system, but are sometimes impractical because of the
prohibitive modelling effort involved. In this paper, a novel bi-level
off-policy reinforcement learning (RL) algorithm is proposed to solve this
problem in a model-free manner. A bi-level Markov decision process (BMDP) is
defined to describe the two-timescale VVC problem, and separate agents are set
up for the slow and fast timescale sub-problems. For the fast timescale
sub-problem, we adopt soft actor-critic, an off-policy RL method with high
sample efficiency. For the slow one, we develop an off-policy multi-discrete
soft actor-critic (MDSAC) algorithm to address the curse of dimensionality
arising from the various STDDs. To mitigate the non-stationarity issue in the
two agents' learning processes, we propose a multi-timescale off-policy
correction (MTOPC) method based on the importance sampling technique.
Comprehensive numerical studies not only demonstrate that the proposed method
achieves stable and satisfactory optimization of both STDDs and FTCDs without
any model information, but also show that it outperforms existing two-timescale
VVC methods.
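
The abstract describes a two-timescale structure: a slow agent that sets discrete device positions (e.g., OLTC taps) once every several fast steps, a fast SAC agent that adjusts continuous setpoints every step, and an importance-sampling correction (MTOPC) applied when replaying off-policy data. The sketch below illustrates that structure only; the paper's implementation is not given in this abstract, so every name, dimension, and policy form here (factored_discrete_probs, env_step, K, and so on) is a hypothetical stand-in, with linear toy policies in place of real actor-critic networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration; none of these values come from the paper.
STATE_DIM = 6    # e.g., selected bus voltages and power injections
N_STDD = 3       # slow discrete devices (OLTC taps, capacitor banks)
N_TAPS = 5       # positions per discrete device
FTCD_DIM = 2     # fast continuous devices (DG reactive power set-points)
K = 10           # fast control steps per slow control step

def factored_discrete_probs(state, theta):
    # One independent softmax per device: N_STDD * N_TAPS outputs instead
    # of N_TAPS ** N_STDD joint actions -- the factorization a
    # multi-discrete actor uses to tame dimensionality.
    logits = (state @ theta).reshape(N_STDD, N_TAPS)
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def sample_slow_action(probs):
    return np.array([rng.choice(N_TAPS, p=p) for p in probs])

def fast_action(state, w):
    # Stand-in for the fast SAC actor: a squashed linear policy.
    return np.tanh(state @ w)

def importance_weight(probs_now, probs_then, action):
    # Per-sample correction in the spirit of MTOPC: the ratio of the
    # current slow policy to the behaviour policy that generated the
    # replayed transition, taken across the factored devices.
    num = np.prod(probs_now[np.arange(N_STDD), action])
    den = np.prod(probs_then[np.arange(N_STDD), action])
    return num / max(den, 1e-8)

def env_step(state, slow_a, fast_a):
    # Toy stand-in for an ADN power-flow simulator: rewards keeping a
    # voltage-like signal close to the continuous set-points.
    next_state = 0.9 * state + 0.05 * rng.standard_normal(STATE_DIM)
    reward = -np.abs(next_state[:FTCD_DIM] - fast_a).sum()
    return next_state, reward

theta = 0.1 * rng.standard_normal((STATE_DIM, N_STDD * N_TAPS))
w_fast = 0.1 * rng.standard_normal((STATE_DIM, FTCD_DIM))
replay = []  # (state, slow action, behaviour-policy probabilities)

state = rng.standard_normal(STATE_DIM)
for t in range(100):
    if t % K == 0:                       # slow timescale: move STDDs
        probs = factored_discrete_probs(state, theta)
        slow_a = sample_slow_action(probs)
        replay.append((state.copy(), slow_a, probs.copy()))
    fast_a = fast_action(state, w_fast)  # fast timescale: every step
    state, reward = env_step(state, slow_a, fast_a)

# Off-policy replay: weight each stored sample before it enters an
# actor/critic update. Here theta is unchanged, so every weight is 1;
# in real training the slow policy drifts after each update, and these
# ratios keep the replayed data consistent with the current policy.
for s, a, behaviour_probs in replay:
    w = importance_weight(factored_discrete_probs(s, theta),
                          behaviour_probs, a)
```

The factored softmax is the key idea suggested by "multi-discrete": each STDD gets its own categorical head, so the parameter count grows linearly in the number of devices rather than exponentially in the size of the joint action space.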
Related papers
- Temporal Prototype-Aware Learning for Active Voltage Control on Power Distribution Networks [28.630650305620197]
Active Voltage Control (AVC) on the Power Distribution Networks (PDNs) aims to stabilize the voltage levels to ensure efficient and reliable operation of power systems.
We propose a novel temporal prototype-aware learning method, abbreviated as TPA, to learn time-adaptive dependencies under short-term training trajectories.
arXiv Detail & Related papers (2024-06-25T08:07:00Z)
- Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting [26.141054975797868]
We propose a novel Adaptive Multi-Scale Decomposition (AMD) framework for time series forecasting (TSF).
Our framework decomposes time series into distinct temporal patterns at multiple scales, leveraging the Multi-Scale Decomposable Mixing (MDM) block.
Our approach effectively models both temporal and channel dependencies and utilizes autocorrelation to refine multi-scale data integration.
arXiv Detail & Related papers (2024-06-06T05:27:33Z)
- When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL [37.58940726230092]
Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDPs).
We formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge.
We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the amount of interaction needed compared to their discrete-time counterparts.
arXiv Detail & Related papers (2024-06-03T09:57:18Z) - Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks, named Two-Stage Deep Decision Rules (TS-LDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Distribution-Aware Continual Test-Time Adaptation for Semantic Segmentation [33.75630514826721]
We propose a distribution-aware tuning (DAT) method to make semantic segmentation CTTA efficient and practical in real-world applications.
DAT adaptively selects and updates two small groups of trainable parameters based on data distribution during the continual adaptation process.
We conduct experiments on two widely-used semantic segmentation CTTA benchmarks, achieving promising performance compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2023-09-24T10:48:20Z)
- Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
- An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z)
- Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Online Multi-agent Reinforcement Learning for Decentralized Inverter-based Volt-VAR Control [3.260913246106564]
Distributed Volt/Var control (VVC) methods have been widely studied for active distribution networks (ADNs).
We propose an online multi-agent reinforcement learning and decentralized control framework (OLDC) for VVC.
arXiv Detail & Related papers (2020-06-23T09:03:46Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)