Heterogeneous Multi-Agent Proximal Policy Optimization for Power Distribution System Restoration
- URL: http://arxiv.org/abs/2511.14730v4
- Date: Wed, 26 Nov 2025 03:18:02 GMT
- Title: Heterogeneous Multi-Agent Proximal Policy Optimization for Power Distribution System Restoration
- Authors: Parya Dolatyabi, Ali Farajzadeh Bavil, Mahdi Khodayar
- Abstract summary: This paper applies a Heterogeneous-Agent Reinforcement Learning framework to enable coordinated restoration across interconnected microgrids. Results demonstrate that incorporating microgrid-level heterogeneity within the HARL framework yields a scalable, stable, and constraint-aware solution for complex PDS restoration.
- Score: 4.46185759083096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restoring power distribution systems (PDS) after large-scale outages requires sequential switching operations that reconfigure feeder topology and coordinate distributed energy resources (DERs) under nonlinear constraints such as power balance, voltage limits, and thermal ratings. These challenges make conventional optimization and value-based RL approaches computationally inefficient and difficult to scale. This paper applies a Heterogeneous-Agent Reinforcement Learning (HARL) framework, instantiated through Heterogeneous-Agent Proximal Policy Optimization (HAPPO), to enable coordinated restoration across interconnected microgrids. Each agent controls a distinct microgrid with different loads, DER capacities, and switch counts, introducing practical structural heterogeneity. Decentralized actor policies are trained with a centralized critic to compute advantage values for stable on-policy updates. A physics-informed OpenDSS environment provides full power flow feedback and enforces operational limits via differentiable penalty signals rather than invalid action masking. The total DER generation is capped at 2400 kW, and each microgrid must satisfy local supply-demand feasibility. Experiments on the IEEE 123-bus and IEEE 8500-node systems show that HAPPO achieves faster convergence, higher restored power, and smoother multi-seed training than DQN, PPO, MAES, MAGDPG, MADQN, Mean-Field RL, and QMIX. Results demonstrate that incorporating microgrid-level heterogeneity within the HARL framework yields a scalable, stable, and constraint-aware solution for complex PDS restoration.
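The penalty-based constraint handling described in the abstract can be sketched as a shaped reward: restored power minus differentiable penalties for voltage, thermal, and DER-cap violations. This is an illustrative assumption rather than the paper's implementation; only the 2400 kW aggregate cap comes from the abstract, while the function name, the 0.95–1.05 pu voltage band, and the weights are invented for the sketch.

```python
# Hedged sketch of a penalty-shaped restoration reward. Only the
# 2400 kW DER cap is from the abstract; the voltage band, weights,
# and quadratic penalty forms are illustrative assumptions.

DER_CAP_KW = 2400.0  # aggregate DER generation cap stated in the abstract

def restoration_reward(restored_kw, voltages_pu, line_loadings,
                       total_der_kw, w_v=10.0, w_t=10.0, w_der=1.0):
    """Reward = restored power minus differentiable constraint penalties."""
    # Voltage penalty: quadratic distance outside an assumed 0.95-1.05 pu band
    v_pen = sum(max(0.0, 0.95 - v) ** 2 + max(0.0, v - 1.05) ** 2
                for v in voltages_pu)
    # Thermal penalty: quadratic overload beyond 100% line loading
    t_pen = sum(max(0.0, load - 1.0) ** 2 for load in line_loadings)
    # Soft enforcement of the aggregate DER generation cap
    der_pen = max(0.0, total_der_kw - DER_CAP_KW) ** 2
    return restored_kw - w_v * v_pen - w_t * t_pen - w_der * der_pen
```

Because the penalties are smooth rather than hard action masks, every switching action remains admissible and the learning signal degrades gracefully near the constraint boundary, which matches the abstract's "differentiable penalty signals rather than invalid action masking."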
Related papers
- Decentralized Spatial Reuse Optimization in Wi-Fi: An Internal Regret Minimization Approach [40.02689778290504]
This paper introduces a decentralized learning algorithm based on regret-matching. Internal regret minimization guides competing agents toward Correlated Equilibria (CE), effectively mimicking coordination without explicit communication. Results confirm the untapped potential of scalable decentralized solutions.
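The summary above names regret-matching; a minimal, generic sketch of that update rule (play each action with probability proportional to its positive cumulative regret) follows. It is not the paper's Wi-Fi-specific algorithm, and the function names are hypothetical.

```python
# Generic regret-matching sketch (Hart & Mas-Colell style), not the
# paper's Wi-Fi spatial-reuse algorithm; names are illustrative.

def regret_matching_policy(cum_regret):
    """Map cumulative regrets to a mixed strategy: probabilities
    proportional to positive regrets; uniform if none are positive."""
    positive = [max(0.0, r) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        n = len(cum_regret)
        return [1.0 / n] * n
    return [p / total for p in positive]

def update_regret(cum_regret, payoffs, chosen):
    """Accumulate regret for not having played each alternative action,
    given the counterfactual payoff of every action this round."""
    return [r + (payoffs[a] - payoffs[chosen])
            for a, r in enumerate(cum_regret)]
```

Iterating these two steps drives empirical play toward the set of correlated equilibria, which is the coordination-without-communication effect the summary describes.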
arXiv Detail & Related papers (2026-02-09T10:10:18Z)
- Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids [50.64533198075622]
Reinforcement learning (RL) is a powerful framework for optimizing decision-making in complex systems under uncertainty. In this paper, we introduce Shielded Controller Units (SCUs), a systematic and interpretable approach that leverages prior knowledge of system dynamics. We demonstrate the effectiveness of SCUs on a remote microgrid optimization task with strict operational requirements.
arXiv Detail & Related papers (2025-11-30T19:28:34Z)
- PowerGrow: Feasible Co-Growth of Structures and Dynamics for Power Grid Synthesis [75.14189839277928]
We present PowerGrow, a co-generative framework that significantly reduces computational overhead while maintaining operational validity. Experiments across benchmark settings show that PowerGrow outperforms prior diffusion models in fidelity and diversity. This demonstrates its ability to generate operationally valid and realistic power grid scenarios.
arXiv Detail & Related papers (2025-08-29T01:47:27Z)
- GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning [43.46954951944727]
We propose HeteroRL, a heterogeneous RL architecture that decouples parameter learning and rollout sampling. The core component is Group Expectation Policy Optimization (GEPO), an asynchronous RL algorithm robust to latency. Experiments show that GEPO achieves superior stability, with only a 3% performance drop from online training to 1800 s latency.
arXiv Detail & Related papers (2025-08-25T09:57:35Z)
- Grid-Agent: An LLM-Powered Multi-Agent System for Power Grid Control [4.3210078529580045]
This paper introduces Grid-Agent, an autonomous AI-driven framework to detect and remediate grid violations. Grid-Agent integrates semantic reasoning with numerical precision through modular agents. Experiments on IEEE and CIGRE benchmark networks demonstrate superior mitigation performance.
arXiv Detail & Related papers (2025-08-07T01:10:28Z)
- Heterogeneous Group-Based Reinforcement Learning for LLM-based Multi-Agent Systems [25.882461853973897]
We propose Multi-Agent Heterogeneous Group Policy Optimization (MHGPO), which guides policy updates by estimating relative reward advantages. MHGPO eliminates the need for critic networks, enhancing stability and reducing computational overhead. We also introduce three group rollout sampling strategies that trade off between efficiency and effectiveness.
arXiv Detail & Related papers (2025-06-03T10:17:19Z)
- Augmented Lagrangian-Based Safe Reinforcement Learning Approach for Distribution System Volt/VAR Control [1.1059341532498634]
This paper formulates the Volt/VAR control problem as a constrained Markov decision process (CMDP).
A novel safe off-policy reinforcement learning (RL) approach is proposed to solve the CMDP.
A two-stage strategy is adopted for offline training and online execution, so that an accurate distribution system model is no longer needed.
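The augmented Lagrangian machinery named in the title can be sketched as a penalized objective plus dual ascent on the multiplier; the symbols, step size, and quadratic penalty form below are generic assumptions rather than the paper's exact formulation.

```python
# Generic augmented Lagrangian sketch for a CMDP with one constraint;
# the weights and update rule are illustrative, not the paper's method.

def augmented_lagrangian(objective, constraint_violation, lam, rho):
    """L(theta, lam) = J(theta) - lam*c(theta) - (rho/2)*c(theta)^2,
    where c(theta) > 0 means the CMDP constraint is violated."""
    c = max(0.0, constraint_violation)
    return objective - lam * c - 0.5 * rho * c * c

def dual_ascent_step(lam, constraint_violation, lr=0.01):
    """Multiplier update: raise lam while the constraint is violated,
    projecting back onto lam >= 0."""
    return max(0.0, lam + lr * constraint_violation)
```

Alternating policy updates on the penalized objective with this multiplier step is the standard way such safe-RL formulations trade off reward against constraint satisfaction.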
arXiv Detail & Related papers (2024-10-19T19:45:09Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL. We find that REBEL provides a unified approach to language modeling and image generation, with performance comparable to or stronger than PPO and DPO.
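REBEL's "regressing relative rewards" idea can be sketched as a squared-error loss that fits the gap in log-probability ratios between two responses to their reward gap; the argument names and the scaling factor eta below are assumptions for illustration, not the authors' code.

```python
# Hedged per-pair sketch of a relative-reward regression loss in the
# spirit of REBEL; argument names and eta are illustrative assumptions.

def rebel_pair_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                    reward_a, reward_b, eta=1.0):
    """Squared error between the scaled gap in log-probability ratios
    of two responses (a, b) to one prompt and their reward gap."""
    ratio_gap = (logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)
    return ((1.0 / eta) * ratio_gap - (reward_a - reward_b)) ** 2
```

Minimizing this loss over sampled response pairs replaces the clipping and value baselines of PPO with plain least-squares regression, which is the "minimalist" aspect the summary highlights.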
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet.
To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems.
In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
arXiv Detail & Related papers (2022-12-15T17:01:56Z)
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
- Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experimental results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and adapt well to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.