Combined Peak Reduction and Self-Consumption Using Proximal Policy
Optimization
- URL: http://arxiv.org/abs/2211.14831v1
- Date: Sun, 27 Nov 2022 13:53:52 GMT
- Title: Combined Peak Reduction and Self-Consumption Using Proximal Policy
Optimization
- Authors: Thijs Peirelinck, Chris Hermans, Fred Spiessens, Geert Deconinck
- Abstract summary: Residential demand response programs aim to activate demand flexibility at the household level.
New RL algorithms, such as proximal policy optimisation (PPO), have tried to increase data efficiency.
We show that our adapted version of PPO, combined with transfer learning, reduces cost by 14.51% compared to a regular hysteresis controller.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Residential demand response programs aim to activate demand flexibility at
the household level. In recent years, reinforcement learning (RL) has gained
significant attention for these types of applications. A major challenge of RL
algorithms is data efficiency. New RL algorithms, such as proximal policy
optimisation (PPO), have tried to increase data efficiency. Additionally,
combining RL with transfer learning has been proposed in an effort to mitigate
this challenge. In this work, we further improve upon state-of-the-art transfer
learning performance by incorporating demand response domain knowledge into the
learning pipeline. We evaluate our approach on a demand response use case where
peak shaving and self-consumption are incentivised by means of a capacity
tariff. We show our adapted version of PPO, combined with transfer learning,
reduces cost by 14.51% compared to a regular hysteresis controller and by 6.68%
compared to traditional PPO.
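To make the capacity-tariff incentive concrete, the sketch below shows a simplified billing-period cost in which grid imports are billed per kWh and the period's peak import is billed per kW. The prices, 15-minute resolution, and function names are illustrative assumptions, not the paper's actual settings; an RL controller such as PPO would receive the negative of this cost as its reward, while the hysteresis baseline switches the flexible load between fixed bounds without seeing the tariff.

```python
import numpy as np

# Illustrative sketch only: a simplified cost signal for a household under a
# capacity tariff. All prices and names are assumptions for illustration.

def billing_period_cost(load_kw, pv_kw, energy_price=0.30, capacity_price=40.0):
    """Cost over one billing period for quarter-hourly load and PV profiles (kW)."""
    load_kw = np.asarray(load_kw, dtype=float)
    pv_kw = np.asarray(pv_kw, dtype=float)

    grid_import_kw = np.maximum(load_kw - pv_kw, 0.0)    # PV consumed on-site costs nothing
    energy_kwh = grid_import_kw.sum() * 0.25              # 15-minute resolution -> kWh

    energy_cost = energy_price * energy_kwh                # lower when self-consumption is high
    capacity_cost = capacity_price * grid_import_kw.max()  # lower when peaks are shaved
    return energy_cost + capacity_cost
```

Because the capacity term depends on the maximum import over the whole period, the cost only drops when consumption is both shifted under PV production and spread out in time, which is exactly the combined peak-reduction and self-consumption objective the paper targets.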
Related papers
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach to compute unbiased Monte Carlo-based value estimates.
We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z) - Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association in 6G satellite networks.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a 14.6% improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning [55.65738319966385]
We propose a novel online algorithm, iterative Nash policy optimization (INPO).
Unlike previous methods, INPO bypasses the need for estimating the expected win rate for individual responses.
With an LLaMA-3-8B-based SFT model, INPO achieves a 42.6% length-controlled win rate on AlpacaEval 2.0 and a 37.8% win rate on Arena-Hard.
arXiv Detail & Related papers (2024-06-30T08:00:34Z) - Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation [37.36913210031282]
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering.
We propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.
arXiv Detail & Related papers (2024-05-29T01:49:20Z) - Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning [18.579378919155864]
We propose Adaptive $Q$-Network (AdaQN) to take into account the non-stationarity of the optimization procedure without requiring additional samples.
AdaQN is theoretically sound and we empirically validate it on MuJoCo control problems and Atari $2600$ games.
arXiv Detail & Related papers (2024-05-25T11:57:43Z) - Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality.
We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO).
Specifically, we introduce the Q-weighted variational loss, which can be proved to be a tight lower bound of the policy objective in online RL under certain conditions.
We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
arXiv Detail & Related papers (2024-05-25T10:45:46Z) - Hybrid Reinforcement Learning for Optimizing Pump Sustainability in
Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z) - Optimized cost function for demand response coordination of multiple EV
charging stations using reinforcement learning [6.37470346908743]
We build on previous research on RL, based on a Markov decision process (MDP) to simultaneously coordinate multiple charging stations.
We propose an improved cost function that essentially forces the learned control policy to always fulfill any charging demand that does not offer flexibility.
We rigorously compare the newly proposed batch RL fitted Q-iteration implementation with the original (costly) one, using real-world data.
arXiv Detail & Related papers (2022-03-03T11:22:27Z) - A Reinforcement Learning Approach to Parameter Selection for Distributed
Optimization in Power Systems [1.1199585259018459]
We develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM.
We show that our RL policy demonstrates promise for generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators.
This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
arXiv Detail & Related papers (2021-10-22T18:17:32Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)