Distributional Reinforcement Learning-based Energy Arbitrage Strategies
in Imbalance Settlement Mechanism
- URL: http://arxiv.org/abs/2401.00015v1
- Date: Sat, 23 Dec 2023 15:38:31 GMT
- Title: Distributional Reinforcement Learning-based Energy Arbitrage Strategies
in Imbalance Settlement Mechanism
- Authors: Seyed Soroush Karimi Madahi, Bert Claessens, Chris Develder
- Abstract summary: Growth in the penetration of renewable energy sources makes supply more uncertain and leads to an increase in the system imbalance.
We propose a battery control framework based on distributional reinforcement learning (DRL).
Our proposed control framework takes a risk-sensitive perspective, allowing BRPs to adjust their risk preferences.
- Score: 6.520803851931361
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Growth in the penetration of renewable energy sources makes supply more
uncertain and leads to an increase in the system imbalance. This trend,
together with the single imbalance pricing, opens an opportunity for balance
responsible parties (BRPs) to perform energy arbitrage in the imbalance
settlement mechanism. To this end, we propose a battery control framework based
on distributional reinforcement learning (DRL). Our proposed control framework
takes a risk-sensitive perspective, allowing BRPs to adjust their risk
preferences: we aim to optimize a weighted sum of the arbitrage profit and a
risk measure while constraining the daily number of cycles for the battery. We
assess the performance of our proposed control framework using the Belgian
imbalance prices of 2022 and compare two state-of-the-art RL methods, deep Q
learning and soft actor-critic. Results reveal that the distributional soft
actor-critic method can outperform other methods. Moreover, we note that our
fully risk-averse agent appropriately learns to hedge against the risk related
to the unknown imbalance price by (dis)charging the battery only when the agent
is more certain about the price.
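The risk-sensitive objective described in the abstract (a weighted sum of the expected arbitrage profit and a risk measure, computed from a learned return distribution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the quantile-based return distribution, the choice of CVaR as the risk measure, and the weighting parameter `beta` are assumptions made for the example.

```python
import numpy as np

def cvar_from_quantiles(quantiles, alpha=0.25):
    """Conditional Value-at-Risk of the return distribution, estimated as
    the mean of the lowest alpha-fraction of (assumed equally weighted)
    quantile estimates, i.e. the average of the worst-case returns."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))
    return q[:k].mean()

def risk_sensitive_value(quantiles, beta=0.5, alpha=0.25):
    """Weighted sum of expected return and a risk measure (here CVaR).
    beta = 0 -> risk-neutral (pure expected profit);
    beta = 1 -> fully risk-averse (pure CVaR)."""
    expected = float(np.mean(quantiles))
    risk = cvar_from_quantiles(quantiles, alpha)
    return (1.0 - beta) * expected + beta * risk

# Example: hypothetical quantile estimates of the arbitrage profit
# (in EUR) for one candidate (dis)charging action.
q = np.linspace(-10.0, 30.0, 20)
print(risk_sensitive_value(q, beta=0.0))  # risk-neutral evaluation
print(risk_sensitive_value(q, beta=1.0))  # fully risk-averse evaluation
```

A fully risk-averse agent (`beta = 1`) scores an action only by its worst-case tail, so it prefers actions whose return distribution is narrow, i.e. those taken when the imbalance price is more certain, which matches the behavior the abstract reports.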
Related papers
- ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference [60.958331943869126]
ODAR-Expert is an adaptive routing framework that optimizes the accuracy-efficiency trade-off via principled resource allocation. We show strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam.
arXiv Detail & Related papers (2026-02-27T05:22:01Z) - A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs [64.8510381475827]
Sparse Mixture-of-Experts (SMoE) architectures are increasingly used to scale large language models efficiently. SMoE models often suffer from severe load imbalance across experts, where a small subset of experts receives most tokens while others are underutilized. We present a systematic analysis of expert routing during inference and identify three findings: (i) load imbalance persists and worsens with larger batch sizes, (ii) selection frequency does not reliably reflect expert importance, and (iii) overall expert workload and importance can be estimated using a small calibration set.
arXiv Detail & Related papers (2026-02-23T15:11:16Z) - Conformal Thinking: Risk Control for Reasoning on a Compute Budget [60.65072883773352]
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases. We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident and a novel lower threshold that preemptively stops unsolvable instances.
arXiv Detail & Related papers (2026-02-03T18:17:22Z) - Robust Reinforcement Learning in Finance: Modeling Market Impact with Elliptic Uncertainty Sets [57.179679246370114]
In financial applications, reinforcement learning (RL) agents are commonly trained on historical data, where their actions do not influence prices. During deployment, these agents trade in live markets where their own transactions can shift asset prices, a phenomenon known as market impact. Traditional robust RL approaches address this model misspecification by optimizing the worst-case performance over a set of uncertainties. We develop a novel class of elliptic uncertainty sets, enabling efficient and tractable robust policy evaluation.
arXiv Detail & Related papers (2025-10-22T18:22:25Z) - BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping [69.74252624161652]
We propose BAlanced Policy Optimization with Adaptive Clipping (BAPO). BAPO dynamically adjusts clipping bounds to adaptively re-balance positive and negative contributions, preserve entropy, and stabilize RL optimization. On AIME 2024 and AIME 2025 benchmarks, our 7B BAPO model surpasses open-source counterparts such as SkyWork-OR1-7B.
arXiv Detail & Related papers (2025-10-21T12:55:04Z) - Model Predictive Control-Guided Reinforcement Learning for Implicit Balancing [2.6288470934623636]
In Europe, profit-seeking balance responsible parties can deviate in real time from their day-ahead nominations to assist transmission system operators. This paper proposes an MPC-guided RL method that combines the complementary strengths of both MPC and RL. The performance of the proposed method is evaluated on the implicit balancing battery control problem using Belgian balancing data from 2023.
arXiv Detail & Related papers (2025-10-06T14:52:27Z) - No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need! [56.80767500991973]
We focus on two canonical settings: $(i)$ online resource allocation where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints where they are observed after action selection, under full feedback or bandit feedback. It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time. We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget.
arXiv Detail & Related papers (2025-06-16T08:42:31Z) - Dynamic Reinsurance Treaty Bidding via Multi-Agent Reinforcement Learning [0.0]
This paper develops a novel multi-agent reinforcement learning (MARL) framework for reinsurance treaty bidding. MARL agents achieve up to 15% higher underwriting profit, 20% lower tail risk, and over 25% improvement in Sharpe ratios. These findings suggest that MARL offers a viable path toward more transparent, adaptive, and risk-sensitive reinsurance markets.
arXiv Detail & Related papers (2025-06-16T05:43:22Z) - Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes [0.0]
This paper proposes a reinforcement learning (RL) framework for insurance reserving that integrates tail-risk sensitivity, macroeconomic regime modeling, and regulatory compliance.
The framework also accommodates fixed-shock stress testing and regime-stratified analysis, providing a principled approach to reserving under uncertainty.
arXiv Detail & Related papers (2025-04-13T01:43:25Z) - Predicting and Publishing Accurate Imbalance Prices Using Monte Carlo Tree Search [4.950434218152639]
We propose a Monte Carlo Tree Search method that publishes accurate imbalance prices while accounting for potential response actions.
Our approach models the system dynamics using a neural network forecaster and a cluster of virtual batteries controlled by reinforcement learning agents.
arXiv Detail & Related papers (2024-11-06T15:49:28Z) - Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies [4.950434218152639]
We propose a new RL-based control framework for batteries to obtain a safe energy arbitrage strategy in the imbalance settlement mechanism.
We use the Belgian imbalance price of 2023 to evaluate the performance of our proposed framework.
arXiv Detail & Related papers (2024-04-29T16:03:21Z) - Probabilistic forecasting of power system imbalance using neural network-based ensembles [4.573008040057806]
We propose an ensemble of C-VSNs, which are our adaptation of variable selection networks (VSNs).
Each minute, our model predicts the imbalance of the current and upcoming two quarter-hours, along with uncertainty estimations on these forecasts.
For high imbalance magnitude situations, our model outperforms the state-of-the-art by 23.4%.
arXiv Detail & Related papers (2024-04-23T08:42:35Z) - Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to
Standard RL [48.1726560631463]
We study Risk-Sensitive Reinforcement Learning with the Optimized Certainty Equivalent (OCE) risk.
We propose two general meta-algorithms via reductions to standard RL.
We show that it learns the optimal risk-sensitive policy while prior algorithms provably fail.
arXiv Detail & Related papers (2024-03-10T21:45:12Z) - WARM: On the Benefits of Weight Averaged Reward Models [63.08179139233774]
We propose Weight Averaged Reward Models (WARM) to mitigate reward hacking.
Experiments on summarization tasks, using best-of-N and RL methods, shows that WARM improves the overall quality and alignment of LLM predictions.
arXiv Detail & Related papers (2024-01-22T18:27:08Z) - Deep Reinforcement Learning for Community Battery Scheduling under
Uncertainties of Load, PV Generation, and Energy Prices [5.694872363688119]
This paper presents a deep reinforcement learning (RL) strategy to schedule a community battery system in the presence of uncertainties.
We position the community battery to play a versatile role, in integrating local PV energy, reducing peak load, and exploiting energy price fluctuations for arbitrage.
arXiv Detail & Related papers (2023-12-04T13:45:17Z) - Risk-Controlling Model Selection via Guided Bayesian Optimization [35.53469358591976]
We find a configuration that adheres to user-specified limits on certain risks while being useful with respect to other conflicting metrics.
Our method identifies a set of optimal configurations residing in a designated region of interest.
We demonstrate the effectiveness of our approach on a range of tasks with multiple desiderata, including low error rates, equitable predictions, handling spurious correlations, managing rate and distortion in generative models, and reducing computational costs.
arXiv Detail & Related papers (2023-12-04T07:29:44Z) - Safe Deployment for Counterfactual Learning to Rank with Exposure-Based
Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Monotonic Improvement Guarantees under Non-stationarity for
Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL).
We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
arXiv Detail & Related papers (2022-01-31T20:39:48Z) - Off-policy Reinforcement Learning with Optimistic Exploration and
Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.