Economic Battery Storage Dispatch with Deep Reinforcement Learning from Rule-Based Demonstrations
- URL: http://arxiv.org/abs/2504.04326v1
- Date: Sun, 06 Apr 2025 02:16:42 GMT
- Title: Economic Battery Storage Dispatch with Deep Reinforcement Learning from Rule-Based Demonstrations
- Authors: Manuel Sage, Martin Staniszewski, Yaoyao Fiona Zhao
- Abstract summary: We propose an approach extending soft actor-critic (SAC) with learning from demonstrations. We conduct a case study on a grid-connected microgrid and use if-then-else statements based on the wholesale price of electricity to collect demonstrations.
- Score: 3.6117068575553595
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The application of deep reinforcement learning algorithms to economic battery dispatch problems has increased significantly in recent years. However, optimizing battery dispatch over long horizons can be challenging due to delayed rewards. In our experiments, we observe poor performance of popular actor-critic algorithms when trained on yearly episodes with hourly resolution. To address this, we propose an approach extending soft actor-critic (SAC) with learning from demonstrations. The special feature of our approach is that, due to the absence of expert demonstrations, the demonstration data is generated through simple, rule-based policies. We conduct a case study on a grid-connected microgrid and use if-then-else statements based on the wholesale price of electricity to collect demonstrations. These are stored in a separate replay buffer and sampled with linearly decaying probability along with the agent's own experiences. Despite these minimal modifications and the imperfections in the demonstration data, the results show a drastic performance improvement regarding both sample efficiency and final rewards. We further show that the proposed method reliably outperforms the demonstrator and is robust to the choice of rule, as long as the rule is sufficient to guide early training in the right direction.
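The two mechanisms at the core of the abstract are simple enough to sketch. Below is a minimal, hypothetical illustration in Python; the function names, price thresholds, and decay schedule are assumptions for exposition, not the authors' code:

```python
import random

def rule_based_action(price, low=20.0, high=60.0):
    """If-then-else demonstration rule on the wholesale electricity price
    (thresholds in $/MWh are illustrative)."""
    if price < low:
        return 1.0    # charge: electricity is cheap
    elif price > high:
        return -1.0   # discharge: electricity is expensive
    return 0.0        # otherwise stay idle

def sample_batch(demo_buffer, agent_buffer, step, total_steps, batch_size=256):
    """Draw a training batch, taking each sample from the separate
    demonstration buffer with a probability that decays linearly to zero."""
    p_demo = max(0.0, 1.0 - step / total_steps)
    return [random.choice(demo_buffer if random.random() < p_demo else agent_buffer)
            for _ in range(batch_size)]
```

Under such a schedule the rule dominates early SAC updates and is gradually replaced by the agent's own experience, consistent with the paper's observation that the rule only needs to guide early training.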
Related papers
- Learning from Streaming Video with Orthogonal Gradients [62.51504086522027]
We address the challenge of representation learning from a continuous stream of video as input, in a self-supervised manner.
This differs from standard approaches to video learning, where videos are chopped and shuffled during training to create non-redundant batches.
We demonstrate the drop in performance when moving from shuffled to sequential learning on three tasks.
arXiv Detail & Related papers (2025-04-02T17:59:57Z) - Random Representations Outperform Online Continually Learned Representations [68.42776779425978]
We show that existing online continually trained deep networks produce inferior representations compared to a simple pre-defined random transform.
Our method, called RanDumb, significantly outperforms state-of-the-art continually learned representations across all online continual learning benchmarks.
Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios.
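As a toy illustration of pairing a pre-defined random transform with a simple online classifier, the sketch below uses a frozen random projection and a nearest-class-mean readout; the dimensions and the choice of readout are assumptions, not necessarily the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(784, 2048))       # frozen random projection, never trained

def embed(x):
    """Pre-defined random transform: random features with a ReLU."""
    return np.maximum(x @ W, 0.0)

class NearestClassMean:
    """Online readout on top of the fixed features (no replay, no gradients)."""
    def __init__(self):
        self.sums, self.counts = {}, {}
    def update(self, z, y):
        self.sums[y] = self.sums.get(y, 0.0) + z
        self.counts[y] = self.counts.get(y, 0) + 1
    def predict(self, z):
        return min(self.sums,
                   key=lambda y: np.linalg.norm(z - self.sums[y] / self.counts[y]))
```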
arXiv Detail & Related papers (2024-02-13T22:07:29Z) - Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations [2.709826237514737]
The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning.
We propose a simple and efficient algorithm called Policy Optimization with Smooth Guidance (POSG).
We show POSG's significant advantages in control performance and convergence speed in four sparse-reward environments.
arXiv Detail & Related papers (2023-12-30T07:41:45Z) - Leveraging Demonstrations to Improve Online Learning: Quality Matters [54.98983862640944]
We show that the degree of improvement must depend on the quality of the demonstration data.
We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule.
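To make the Bayes'-rule mechanism concrete, here is a hedged sketch for a Bernoulli bandit in which the demonstrator's observed outcomes are absorbed into the Beta prior before Thompson sampling starts; the bandit setting and conjugate updates are illustrative assumptions, not the paper's exact model:

```python
import random

class InformedTS:
    """Thompson sampling whose Beta priors are informed by demonstration data."""
    def __init__(self, n_arms, demo_counts):
        # demo_counts[arm] = (successes, failures) seen in the demonstrations;
        # via Bayes' rule they simply shift the Beta(1, 1) prior.
        self.a = [1 + demo_counts[k][0] for k in range(n_arms)]
        self.b = [1 + demo_counts[k][1] for k in range(n_arms)]
    def select(self):
        samples = [random.betavariate(a, b) for a, b in zip(self.a, self.b)]
        return samples.index(max(samples))
    def update(self, arm, reward):       # standard conjugate posterior update
        self.a[arm] += reward
        self.b[arm] += 1 - reward
```

Low-quality demonstrations then contribute only weak priors, in line with the point above that the degree of improvement depends on demonstration quality.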
arXiv Detail & Related papers (2023-02-07T08:49:12Z) - Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited-data scenarios.
Why such demonstrations are beneficial for the learning process remains unclear, since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z) - Bayesian Q-learning With Imperfect Expert Demonstrations [56.55609745121237]
We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations.
We evaluate our approach on a sparse-reward chain environment and six more complicated Atari games with delayed rewards.
arXiv Detail & Related papers (2022-10-01T17:38:19Z) - Evaluating the Effectiveness of Corrective Demonstrations and a Low-Cost
Sensor for Dexterous Manipulation [0.5669790037378094]
Imitation learning is a promising approach to help robots acquire dexterous manipulation capabilities.
We investigate characteristics of such additional demonstrations and their impact on performance.
We show that inexpensive vision-based sensors, such as LeapMotion, can be used to dramatically reduce the cost of providing demonstrations.
arXiv Detail & Related papers (2022-04-15T19:55:46Z) - Reward Relabelling for combined Reinforcement and Imitation Learning on
sparse-reward tasks [2.0305676256390934]
We present a new method to leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm.
Our method is based on a reward bonus given to demonstrations and successful episodes, encouraging expert imitation and self-imitation.
Our experiments focus on robotic manipulation, specifically on three tasks for a six-degrees-of-freedom robotic arm in simulation.
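The reward-bonus mechanism fits in a few lines; the bonus value, where it is applied, and the buffer interface below are assumptions for illustration, not the paper's exact scheme:

```python
def relabel_and_store(buffer, episode, is_demo, reached_goal, bonus=1.0):
    """Add a reward bonus to transitions from demonstrations or successful
    episodes before storing them, so any off-policy learner consumes them
    unchanged (encouraging expert imitation and self-imitation)."""
    for state, action, reward, next_state, done in episode:
        if is_demo or reached_goal:
            reward += bonus
        buffer.append((state, action, reward, next_state, done))
```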
arXiv Detail & Related papers (2022-01-11T08:35:18Z) - Learning from demonstrations with SACR2: Soft Actor-Critic with Reward
Relabeling [2.1485350418225244]
Off-policy algorithms tend to be more sample-efficient, and can additionally benefit from any off-policy data stored in the replay buffer.
Expert demonstrations are a popular source for such data.
We present a new method, based on a reward bonus given to demonstrations and successful episodes.
arXiv Detail & Related papers (2021-10-27T14:30:29Z) - Robust Maximum Entropy Behavior Cloning [15.713997170792842]
Imitation learning (IL) algorithms use expert demonstrations to learn a specific task.
Most existing approaches assume that all expert demonstrations are reliable and trustworthy, but what if some adversarial demonstrations exist in the given dataset?
We propose a novel general framework that directly generates a policy from demonstrations, autonomously detecting adversarial demonstrations and excluding them from the dataset.
arXiv Detail & Related papers (2021-01-04T22:08:46Z) - DDPG++: Striving for Simplicity in Continuous-control Off-Policy
Reinforcement Learning [95.60782037764928]
First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity-estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
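The third point can be sketched as clipped importance weighting of replay transitions; the clipping bound and the use of stored behavior log-probabilities are illustrative assumptions, not the paper's exact estimator:

```python
import numpy as np

def importance_weights(current_logp, behavior_logp, clip=5.0):
    """Clipped ratios pi_current(a|s) / pi_behavior(a|s) for a batch of
    replay transitions; the propensity behavior_logp is stored at
    collection time or estimated afterwards."""
    w = np.exp(current_logp - behavior_logp)
    return np.clip(w, 1.0 / clip, clip)

# Usage sketch: scale per-transition losses, e.g. (w * td_error ** 2).mean()
```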
arXiv Detail & Related papers (2020-06-26T20:21:12Z) - Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning learns effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)