Bi-level Off-policy Reinforcement Learning for Volt/VAR Control
Involving Continuous and Discrete Devices
- URL: http://arxiv.org/abs/2104.05902v1
- Date: Tue, 13 Apr 2021 02:22:43 GMT
- Title: Bi-level Off-policy Reinforcement Learning for Volt/VAR Control
Involving Continuous and Discrete Devices
- Authors: Haotian Liu, Wenchuan Wu
- Abstract summary: In Volt/Var control, both slow timescale discrete devices (STDDs) and fast timescale continuous devices (FTCDs) are involved.
Traditional optimization methods rely heavily on accurate system models and are sometimes impractical because of the prohibitive modelling effort.
In this paper, a novel bi-level off-policy reinforcement learning (RL) algorithm is proposed to solve this problem in a model-free manner.
- Score: 2.079959811127612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Volt/Var control (VVC) of active distribution networks (ADNs), both slow
timescale discrete devices (STDDs) and fast timescale continuous devices
(FTCDs) are involved. The STDDs such as on-load tap changers (OLTCs) and FTCDs
such as distributed generators must be coordinated in time sequence. Such VVC
is formulated as a two-timescale optimization problem to jointly optimize FTCDs
and STDDs in ADNs. Traditional optimization methods rely heavily on accurate
system models and are sometimes impractical because of the prohibitive
modelling effort. In this paper, a novel bi-level off-policy
reinforcement learning (RL) algorithm is proposed to solve this problem in a
model-free manner. A Bi-level Markov decision process (BMDP) is defined to
describe the two-timescale VVC problem and separate agents are set up for the
slow and fast timescale sub-problems. For the fast timescale sub-problem, we
adopt soft actor-critic (SAC), an off-policy RL method with high sample
efficiency. For the slow one, we develop an off-policy multi-discrete soft
actor-critic (MDSAC) algorithm to address the curse of dimensionality caused by
the various STDDs. To mitigate the non-stationarity in the two agents' learning
processes, we propose a multi-timescale off-policy correction (MTOPC) method by
adopting an importance sampling technique. Comprehensive numerical studies not only
demonstrate that the proposed method can achieve stable and satisfactory
optimization of both STDDs and FTCDs without any model information, but also
support that the proposed method outperforms existing two-timescale VVC
methods.
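As a concrete illustration of this bi-level structure, the sketch below (not the authors' implementation) sets up a SAC-style Gaussian actor for the fast-timescale FTCDs, a factorized multi-discrete actor for the slow-timescale STDDs in the spirit of MDSAC, and a generic clipped importance-sampling ratio of the kind an MTOPC-style correction would apply to replayed transitions. The network sizes, device cardinalities, and clipping constant are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the bi-level agent structure described
# in the abstract: a fast-timescale continuous policy for FTCDs, a slow-timescale
# multi-discrete policy for STDDs, and a generic importance-sampling weight of the
# kind used for off-policy / non-stationarity correction. Sizes are assumptions.

import torch
import torch.nn as nn
from torch.distributions import Normal, Categorical


class FastContinuousActor(nn.Module):
    """Gaussian policy over FTCD set-points (e.g. DG reactive power)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        dist = Normal(self.mu(h), self.log_std(h).clamp(-5, 2).exp())
        action = torch.tanh(dist.rsample())      # squashed continuous action
        return action, dist


class SlowMultiDiscreteActor(nn.Module):
    """Factorized categorical policy: one head per STDD (OLTC taps, capacitor steps)."""

    def __init__(self, obs_dim: int, device_cardinalities, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in device_cardinalities)

    def forward(self, obs):
        h = self.body(obs)
        dists = [Categorical(logits=head(h)) for head in self.heads]
        actions = torch.stack([d.sample() for d in dists], dim=-1)
        # Summing per-head log-probs corresponds to a factorized joint policy,
        # so the output size grows linearly with the number of STDDs.
        log_prob = torch.stack(
            [d.log_prob(a) for d, a in zip(dists, actions.unbind(-1))], dim=-1
        ).sum(-1)
        return actions, log_prob


def importance_weight(log_pi_current, log_pi_behavior, clip: float = 10.0):
    """Generic per-transition ratio pi_current / pi_behavior, clipped for variance
    control; a stand-in for an MTOPC-style correction, not the paper's exact formula."""
    return torch.exp(log_pi_current - log_pi_behavior).clamp(max=clip)


if __name__ == "__main__":
    obs = torch.randn(4, 10)                      # batch of 4 observations, 10 features
    fast = FastContinuousActor(obs_dim=10, act_dim=3)
    slow = SlowMultiDiscreteActor(obs_dim=10, device_cardinalities=[11, 5, 5])

    a_fast, _ = fast(obs)
    a_slow, logp_now = slow(obs)
    w = importance_weight(logp_now, log_pi_behavior=logp_now.detach() - 0.1)
    print(a_fast.shape, a_slow.shape, w.shape)
```

Factorizing the discrete policy into one categorical head per STDD avoids enumerating the joint tap/step combinations, which is the usual way to sidestep the curse of dimensionality the abstract mentions.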
Related papers
- Joint Transmit and Pinching Beamforming for PASS: Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed.
It consists of multiple waveguides equipped with numerous low-cost pinching antennas (PAs).
The positions of the PAs can be reconfigured to span both the large-scale path and the spatial domain.
arXiv Detail & Related papers (2025-02-12T18:54:10Z)
- Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization [83.65278205301576]
We propose to learn direct mappings from different noise levels to the optimal solution for a given instance, facilitating high-quality generation with minimal shots.
This is achieved through an optimization consistency training protocol, which minimizes the difference among samples.
Experiments on two popular tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), demonstrate the superiority of Fast T2T regarding both solution quality and efficiency.
arXiv Detail & Related papers (2025-02-05T07:13:43Z)
- Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services [55.0337199834612]
Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services.
These services require executing GenAI models with billions of parameters, posing significant obstacles for the resource-limited wireless edge.
We introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics.
arXiv Detail & Related papers (2024-11-03T07:01:13Z)
- Temporal Prototype-Aware Learning for Active Voltage Control on Power Distribution Networks [28.630650305620197]
Active Voltage Control (AVC) on the Power Distribution Networks (PDNs) aims to stabilize the voltage levels to ensure efficient and reliable operation of power systems.
We propose a novel temporal prototype-aware learning method, abbreviated as TPA, to learn time-adaptive dependencies under short-term training trajectories.
arXiv Detail & Related papers (2024-06-25T08:07:00Z)
- Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting [26.141054975797868]
We propose a novel Adaptive Multi-Scale Decomposition (AMD) framework for time series forecasting (TSF).
Our framework decomposes time series into distinct temporal patterns at multiple scales, leveraging the Multi-Scale Decomposable Mixing (MDM) block.
Our approach effectively models both temporal and channel dependencies and utilizes autocorrelation to refine multi-scale data integration.
arXiv Detail & Related papers (2024-06-06T05:27:33Z)
- When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL [37.58940726230092]
Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDPs).
We formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge.
We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the interaction amount over their discrete-time counterpart.
arXiv Detail & Related papers (2024-06-03T09:57:18Z)
- An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z)
- Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)