Stochastic Dynamic Power Dispatch with High Generalization and Few-Shot
Adaption via Contextual Meta Graph Reinforcement Learning
- URL: http://arxiv.org/abs/2401.12235v1
- Date: Fri, 19 Jan 2024 13:58:46 GMT
- Title: Stochastic Dynamic Power Dispatch with High Generalization and Few-Shot
Adaption via Contextual Meta Graph Reinforcement Learning
- Authors: Bairong Deng, Tao Yu, Zhenning Pan, Xuehan Zhang, Yufeng Wu, Qiaoyi
Ding
- Abstract summary: A novel contextual meta graph reinforcement learning (Meta-GRL) approach is proposed to learn a highly generalized multi-stage optimal dispatch policy.
An upper meta-learner encodes context for different dispatch scenarios and learns to identify dispatch tasks, while the lower policy learner learns a context-specified dispatch policy.
After sufficient offline learning, this approach can rapidly adapt to unseen and undefined scenarios with only a few updates of the hypothesis judgments generated by the meta-learner.
- Score: 7.251065697936476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning is an emerging approach to multi-stage
sequential decision-making problems. This paper studies real-time multi-stage
stochastic power dispatch considering multivariate uncertainties. Current
research suffers from low generalization and practicality: the learned
dispatch policy can only handle a specific dispatch scenario, and its
performance degrades significantly if actual samples and training samples are
inconsistent. To fill these gaps, a novel contextual meta graph reinforcement
learning (Meta-GRL) approach for a highly generalized multi-stage optimal
dispatch policy is proposed. Specifically, a more general contextual Markov
decision process (MDP) and a scalable graph representation are introduced to
achieve more generalized multi-stage stochastic power dispatch modeling. An
upper meta-learner is proposed to encode context for different dispatch
scenarios and learn to identify dispatch tasks, while the lower policy learner
learns a context-specified dispatch policy. After sufficient offline learning,
this approach can rapidly adapt to unseen and undefined scenarios with only a
few updates of the hypothesis judgments generated by the meta-learner.
Numerical comparisons with state-of-the-art policies and traditional
reinforcement learning verify the optimality, efficiency, adaptability, and
scalability of the proposed Meta-GRL.
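To make the two-level architecture concrete, below is a minimal, hypothetical PyTorch sketch of the idea described in the abstract: an upper meta-learner that encodes a context embedding from recent dispatch transitions, and a lower graph-based policy learner that conditions its dispatch actions on that embedding. All module names, layer sizes, and the single message-passing step are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of the Meta-GRL idea from the abstract; names and
# dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Upper meta-learner: summarizes recent (state, action, cost) transitions
    into a context embedding used to identify the current dispatch task."""
    def __init__(self, transition_dim: int, context_dim: int):
        super().__init__()
        self.rnn = nn.GRU(transition_dim, context_dim, batch_first=True)

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: (batch, horizon, transition_dim)
        _, h = self.rnn(transitions)
        return h[-1]  # (batch, context_dim)

class GraphPolicy(nn.Module):
    """Lower policy learner: one graph-convolution step over the grid
    topology, followed by a head conditioned on the context embedding."""
    def __init__(self, node_dim: int, hidden_dim: int, context_dim: int, action_dim: int):
        super().__init__()
        self.w = nn.Linear(node_dim, hidden_dim)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + context_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, x: torch.Tensor, adj: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x: (batch, nodes, node_dim); adj: normalized adjacency (nodes, nodes)
        h = torch.relu(adj @ self.w(x))      # one message-passing step
        pooled = h.mean(dim=1)               # permutation-invariant readout
        return self.head(torch.cat([pooled, context], dim=-1))

# Few-shot adaptation then amounts to refreshing the context embedding from a
# handful of new transitions, leaving the policy weights untouched.
encoder = ContextEncoder(transition_dim=8, context_dim=16)
policy = GraphPolicy(node_dim=4, hidden_dim=32, context_dim=16, action_dim=5)
ctx = encoder(torch.randn(1, 24, 8))                        # 24 recent transitions
action = policy(torch.randn(1, 30, 4), torch.eye(30), ctx)  # 30-bus grid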
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergence (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
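As a hedged illustration of the common practice the entry above refers to (not the paper's tuning method): learn with a stochastic Gaussian policy whose exploration level sigma is a trainable knob, then deploy only the deterministic mean. The network size and toy update below are assumptions.

# Illustrative sketch: stochastic training, deterministic deployment.
import torch
import torch.nn as nn

mean_net = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))
log_sigma = torch.tensor(-1.0, requires_grad=True)   # exploration level
opt = torch.optim.Adam(list(mean_net.parameters()) + [log_sigma], lr=1e-3)

def act_train(state: torch.Tensor):
    dist = torch.distributions.Normal(mean_net(state), log_sigma.exp())
    a = dist.sample()
    return a, dist.log_prob(a).sum()

def act_deploy(state: torch.Tensor) -> torch.Tensor:
    return mean_net(state)   # deterministic version of the learned policy

# REINFORCE-style update: larger sigma explores more during learning but
# widens the gap between the learned policy and the deployed mean.
state, ret = torch.randn(1, 3), torch.tensor(2.0)
a, logp = act_train(state)
loss = -(logp * ret)
opt.zero_grad(); loss.backward(); opt.step()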
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Sequential Knockoffs for Variable Selection in Reinforcement Learning [19.925653053430395]
We introduce the notion of a minimal sufficient state in a Markov decision process (MDP).
We propose a novel SEquEntial Knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics.
arXiv Detail & Related papers (2023-03-24T21:39:06Z)
- Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models [137.74524357614285]
We introduce a novel Gradient-RegulAted Meta-prompt learning framework.
It helps pre-trained models adapt to downstream tasks in a parameter- and data-efficient way.
GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way.
arXiv Detail & Related papers (2023-03-12T05:03:37Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
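A rough sketch of what a conditional diffusion policy looks like, in the spirit of the Diffusion-QL entry above: actions are sampled by iteratively denoising Gaussian noise conditioned on the state. The noise network, 5-step schedule, and dimensions are illustrative assumptions.

# Hedged sketch of a conditional diffusion policy sampler.
import torch
import torch.nn as nn

T = 5
betas = torch.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
abars = torch.cumprod(alphas, dim=0)

# Noise-prediction network over (state, noisy action, timestep).
eps_net = nn.Sequential(nn.Linear(3 + 1 + 1, 64), nn.ReLU(), nn.Linear(64, 1))

@torch.no_grad()
def sample_action(state: torch.Tensor) -> torch.Tensor:
    a = torch.randn(state.shape[0], 1)            # start from pure noise
    for t in reversed(range(T)):
        t_in = torch.full((state.shape[0], 1), float(t) / T)
        eps = eps_net(torch.cat([state, a, t_in], dim=-1))
        # DDPM posterior mean; noise injection is skipped at the last step.
        a = (a - betas[t] / (1 - abars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            a = a + betas[t].sqrt() * torch.randn_like(a)
    return a.clamp(-1, 1)

print(sample_action(torch.randn(2, 3)))
# Training would combine a denoising (behavior-cloning) loss with a Q-value
# term that pushes sampled actions toward high-value regions.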
- Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
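For intuition, a minimal CUSUM-style change-point statistic on a data stream is sketched below; the paper's own high-confidence statistic differs, so treat the detector and threshold as assumptions.

# Minimal CUSUM-style change-point detector on a stream of scalars.
def cusum(stream, mean0: float, mean1: float, threshold: float):
    """Flag the first time the log-likelihood ratio statistic for a shift
    from mean0 to mean1 (unit-variance Gaussian) exceeds the threshold."""
    s = 0.0
    for t, x in enumerate(stream):
        # Log-likelihood ratio increment for N(mean1,1) vs N(mean0,1).
        s = max(0.0, s + (mean1 - mean0) * (x - (mean0 + mean1) / 2.0))
        if s > threshold:
            return t  # change detected: trigger rapid policy adaptation
    return None

data = [0.1, -0.2, 0.0, 1.2, 0.9, 1.1, 1.3]  # shift after the third sample
print(cusum(data, mean0=0.0, mean1=1.0, threshold=2.0))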
- Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models [40.08137765886609]
We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics.
Our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test time policy gradient optimization.
arXiv Detail & Related papers (2021-02-16T17:21:55Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML (Dif-MAML).
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
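A toy, hedged sketch of the diffusion (adapt-then-combine) idea behind the Dif-MAML entry above: each agent takes a local MAML-style meta-gradient step on a scalar parameter, then averages with its neighbors. The quadratic losses, topology, and step sizes are illustrative, not the paper's setup.

# Toy adapt-then-combine sketch of decentralized meta-learning.
import numpy as np

def local_meta_grad(theta: float, task: float, inner_lr: float = 0.1) -> float:
    """MAML-style gradient of the post-adaptation loss with
    L(w) = 0.5*(w - task)**2 and one inner SGD step."""
    adapted = theta - inner_lr * (theta - task)
    return (1 - inner_lr) * (adapted - task)  # chain rule through the step

thetas = np.array([0.0, 2.0, 4.0])            # three agents' parameters
neighbors = [[0, 1], [0, 1, 2], [1, 2]]       # line topology, self-loops
tasks = np.array([1.0, 1.5, 2.0])             # one local task per agent

for _ in range(200):
    # Adapt step: local meta-gradient; combine step: neighborhood averaging.
    half = thetas - 0.05 * np.array(
        [local_meta_grad(t, task) for t, task in zip(thetas, tasks)])
    thetas = np.array([half[n].mean() for n in neighbors])

print(thetas)  # agents agree on a consensus meta-parameter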
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Contextual Policy Transfer in Reinforcement Learning Domains via Deep Mixtures-of-Experts [24.489002406693128]
We introduce a novel mixture-of-experts formulation for learning state-dependent beliefs over source task dynamics.
We show how this model can be incorporated into standard policy reuse frameworks.
arXiv Detail & Related papers (2020-02-29T07:58:36Z)
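To illustrate the kind of state-dependent belief over source-task dynamics the last entry describes, here is a small Bayesian-style sketch; the two candidate models and the Gaussian error assumption are hypothetical, not the paper's formulation.

# Sketch: posterior belief over which source dynamics model explains a transition.
import numpy as np

def update_belief(belief, s, a, s_next, models, noise=0.1):
    """Reweight each candidate model by the Gaussian likelihood of its
    one-step prediction, then renormalize."""
    liks = np.array([
        np.exp(-np.sum((m(s, a) - s_next) ** 2) / (2 * noise ** 2))
        for m in models])
    post = belief * liks
    return post / post.sum()

models = [lambda s, a: s + a, lambda s, a: s - a]   # two candidate dynamics
belief = np.array([0.5, 0.5])
belief = update_belief(belief, np.array([0.0]), np.array([1.0]),
                       np.array([0.95]), models)
print(belief)  # mass shifts to the model that predicts the transition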