Model Predictive Control-Guided Reinforcement Learning for Implicit Balancing
- URL: http://arxiv.org/abs/2510.04868v1
- Date: Mon, 06 Oct 2025 14:52:27 GMT
- Title: Model Predictive Control-Guided Reinforcement Learning for Implicit Balancing
- Authors: Seyed Soroush Karimi Madahi, Kenneth Bruninx, Bert Claessens, Chris Develder
- Abstract summary: In Europe, profit-seeking balance responsible parties can deviate in real time from their day-ahead nominations to assist transmission system operators. This paper proposes an MPC-guided RL method that combines the complementary strengths of both MPC and RL. The performance of the proposed method is evaluated on the implicit balancing battery control problem using Belgian balancing data from 2023.
- Score: 2.6288470934623636
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In Europe, profit-seeking balance responsible parties can deviate in real time from their day-ahead nominations to assist transmission system operators in maintaining the supply-demand balance. Model predictive control (MPC) strategies that exploit such implicit balancing capture arbitrage opportunities, but fail to accurately capture the price-formation process in the European imbalance markets and face high computational costs. Model-free reinforcement learning (RL) methods are fast to execute, but require data-intensive training and usually rely on real-time and historical data for decision-making. This paper proposes an MPC-guided RL method that combines the complementary strengths of both MPC and RL. The proposed method can effectively incorporate forecasts into the decision-making process (as in MPC), while maintaining the fast inference capability of RL. The performance of the proposed method is evaluated on the implicit balancing battery control problem using Belgian balancing data from 2023. First, we analyze the performance of the standalone state-of-the-art RL and MPC methods from various angles, to highlight their individual strengths and limitations. Next, we show that the proposed MPC-guided RL method increases arbitrage profit by 16.15% and 54.36% compared to standalone RL and MPC, respectively.
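To make the implicit balancing battery control problem concrete, the toy loop below scores a given sequence of charge/discharge setpoints against imbalance prices. It is not the authors' MPC-guided RL method. It assumes a zero day-ahead nomination (so the battery's whole output is a deviation), a single imbalance price per 15-minute period, and hypothetical battery parameters; `Battery`, `settle` and all numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    """Hypothetical battery parameters (illustrative, not taken from the paper)."""
    capacity_mwh: float = 2.0  # usable energy capacity
    power_mw: float = 1.0      # charge/discharge power limit
    efficiency: float = 0.9    # one-way efficiency, applied on charge and on discharge
    dt_h: float = 0.25         # settlement period length (15 minutes)


def settle(battery, soc_mwh, imbalance_prices, setpoints_mw):
    """Return (profit_eur, soc_trajectory) for a sequence of power setpoints.

    Positive setpoint = discharge (extra injection), negative = charge (extra offtake).
    Assumption: the output is a deviation from a zero nomination and is settled
    at one imbalance price (EUR/MWh) per period.
    """
    profit = 0.0
    socs = [soc_mwh]
    for price, p in zip(imbalance_prices, setpoints_mw):
        p = max(-battery.power_mw, min(battery.power_mw, p))  # enforce power limit
        if p >= 0:
            # Discharge: drawing e from the cell delivers e * efficiency to the grid.
            drawn = min(p * battery.dt_h / battery.efficiency, soc_mwh)
            soc_mwh -= drawn
            grid_mwh = drawn * battery.efficiency
        else:
            # Charge: taking g from the grid stores g * efficiency in the cell.
            headroom = battery.capacity_mwh - soc_mwh
            taken = min(-p * battery.dt_h, headroom / battery.efficiency)
            soc_mwh += taken * battery.efficiency
            grid_mwh = -taken
        profit += price * grid_mwh  # injection earns, offtake pays the imbalance price
        socs.append(soc_mwh)
    return profit, socs


if __name__ == "__main__":
    # Charge in a low-price quarter-hour, discharge in a high-price one.
    profit, socs = settle(Battery(), 1.0, [50.0, 200.0], [-1.0, 1.0])
    print(round(profit, 2), [round(s, 3) for s in socs])
```

A controller (MPC, RL, or a hybrid such as the one the paper proposes) would be responsible for choosing `setpoints_mw`; this sketch only evaluates a choice after the fact, with prices known.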
Related papers
- $φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models [58.217707070069885]
This paper presents a novel Fairness Direct Preference Optimization (FaiDPO or $φ$-DPO) framework for continual learning in LMMs. We first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) to mitigate catastrophic forgetting by aligning learning with pairwise preference signals. Extensive experiments and ablation studies show that the proposed $φ$-DPO achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2026-02-26T04:14:33Z)
- Exploratory Mean-Variance with Jumps: An Equilibrium Approach [3.9270182903783706]
We model the market dynamics with a jump-diffusion process and apply Reinforcement Learning techniques. Our numerical study on 24 years of real market data shows that the proposed RL model is profitable in 13 out of 14 tests.
arXiv Detail & Related papers (2025-12-10T01:12:22Z)
- BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping [69.74252624161652]
We propose BAlanced Policy Optimization with Adaptive Clipping (BAPO). BAPO dynamically adjusts clipping bounds to adaptively re-balance positive and negative contributions, preserve entropy, and stabilize RL optimization. On the AIME 2024 and AIME 2025 benchmarks, our 7B BAPO model surpasses open-source counterparts such as SkyWork-OR1-7B.
arXiv Detail & Related papers (2025-10-21T12:55:04Z)
- Feature-driven reinforcement learning for photovoltaic in continuous intraday trading [8.952724019926189]
We propose a feature-driven reinforcement learning (RL) approach for PV intraday trading. RL integrates data-driven features into the state and learns bidding policies in a sequential decision framework. We show that RL offers a practical, data-efficient, and operationally deployable pathway for active intraday participation by PV producers.
arXiv Detail & Related papers (2025-10-15T15:19:05Z)
- Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward [54.708851958671794]
We propose a Data-Efficient Policy Optimization pipeline that combines optimized strategies for both offline and online data selection. In the offline phase, we curate a high-quality subset of training samples based on diversity, influence, and appropriate difficulty. During online RLVR training, we introduce a sample-level explorability metric to dynamically filter samples with low exploration potential.
arXiv Detail & Related papers (2025-09-01T10:04:20Z)
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- Accelerating RL for LLM Reasoning with Optimal Advantage Regression [52.0792918455501]
We propose a novel two-stage policy optimization framework that directly approximates the optimal advantage function. $A^*$-PO achieves competitive performance across a wide range of mathematical reasoning benchmarks. It reduces training time by up to 2$\times$ and peak memory usage by over 30% compared to PPO, GRPO, and REBEL.
arXiv Detail & Related papers (2025-05-27T03:58:50Z)
- Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance [52.65461207786633]
Policy-based Reinforcement Learning from Human Feedback is essential for aligning large language models with human preferences. It requires joint training of an actor and critic with a pretrained, fixed reward model for guidance. We propose Decoupled Value Policy Optimization (DVPO), a lean framework that replaces traditional reward modeling with a pretrained global value model (GVM).
arXiv Detail & Related papers (2025-02-24T08:11:33Z)
- Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies [4.950434218152639]
We propose a new RL-based control framework for batteries to obtain a safe energy arbitrage strategy in the imbalance settlement mechanism.
We use the Belgian imbalance price of 2023 to evaluate the performance of our proposed framework.
arXiv Detail & Related papers (2024-04-29T16:03:21Z)
- Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism [6.520803851931361]
Growth in the penetration of renewable energy sources makes supply more uncertain and leads to an increase in the system imbalance.
We propose a battery control framework based on distributional reinforcement learning (DRL).
Our proposed control framework takes a risk-sensitive perspective, allowing BRPs to adjust their risk preferences.
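One common way to make a distributional RL agent risk-sensitive is to rank actions by the conditional value at risk (CVaR) of their predicted return distribution rather than by its mean. The sketch below assumes quantile-based return estimates (as in QR-DQN-style agents) and a tunable risk level `alpha`; it illustrates the general idea and is not necessarily the mechanism used in the paper. The function names and quantile values are hypothetical.

```python
import math
from typing import Sequence


def cvar_from_quantiles(quantiles: Sequence[float], alpha: float) -> float:
    """Approximate CVaR_alpha: the mean of the worst alpha-fraction of
    N equally weighted quantile estimates of the return distribution."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    q = sorted(quantiles)
    k = max(1, math.ceil(alpha * len(q)))  # number of lowest quantiles to average
    return sum(q[:k]) / k


def select_action(quantiles_per_action: Sequence[Sequence[float]], alpha: float) -> int:
    """Greedy action under CVaR_alpha. alpha = 1.0 is risk-neutral (mean return);
    smaller alpha weighs the worst outcomes more heavily (more risk-averse)."""
    scores = [cvar_from_quantiles(q, alpha) for q in quantiles_per_action]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical return quantiles (EUR) for two battery actions.
    risky = [-10.0, 5.0, 10.0, 15.0]  # higher mean, heavy downside
    safe = [2.0, 3.0, 4.0, 5.0]       # lower mean, narrow spread
    print(select_action([risky, safe], alpha=1.0))   # → 0 (best mean)
    print(select_action([risky, safe], alpha=0.25))  # → 1 (best worst case)
```

Lowering `alpha` is one concrete way a BRP could express a more risk-averse preference without retraining the return distribution itself.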
arXiv Detail & Related papers (2023-12-23T15:38:31Z)
- Deep Reinforcement Learning Approach for Trading Automation in The Stock Market [0.0]
This paper presents a model to generate profitable trades in the stock market using Deep Reinforcement Learning (DRL) algorithms.
We formulate the trading problem as a Partially Observed Markov Decision Process (POMDP) model, considering the constraints imposed by the stock market.
We then solve the formulated POMDP using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, reporting a Sharpe ratio of 2.68 on an unseen dataset.
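The Sharpe ratio is the mean excess return divided by the standard deviation of excess returns. The summary does not state which convention the paper used, so this is a minimal sketch assuming per-period simple returns, a constant per-period risk-free rate, the sample standard deviation, and annualization by the square root of `periods_per_year` (252 trading days by default); the function and its defaults are illustrative.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio: mean(excess) / stdev(excess) * sqrt(periods_per_year).

    `returns` are per-period simple returns; `risk_free` is the per-period
    risk-free rate. Uses the sample standard deviation (n - 1 denominator).
    """
    excess = [r - risk_free for r in returns]
    if len(excess) < 2:
        raise ValueError("need at least two returns")
    std = statistics.stdev(excess)
    if std == 0.0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / std * math.sqrt(periods_per_year)


if __name__ == "__main__":
    daily = [0.01, -0.005, 0.02, 0.0, 0.004]  # hypothetical daily returns
    print(round(sharpe_ratio(daily), 2))
```

Because annualization conventions differ, Sharpe ratios reported by different papers are only comparable when they use the same return frequency and scaling.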
arXiv Detail & Related papers (2022-07-05T11:34:29Z)
- Optimized cost function for demand response coordination of multiple EV charging stations using reinforcement learning [6.37470346908743]
We build on previous RL research based on a Markov decision process (MDP) to simultaneously coordinate multiple charging stations.
We propose an improved cost function that essentially forces the learned control policy to always fulfill any charging demand that does not offer flexibility.
We rigorously compare the newly proposed batch RL fitted Q-iteration implementation with the original (costly) one, using real-world data.
arXiv Detail & Related papers (2022-03-03T11:22:27Z)
- On Effective Scheduling of Model-based Reinforcement Learning [53.027698625496015]
We propose a framework named AutoMBPO to automatically schedule the real data ratio.
In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance.
arXiv Detail & Related papers (2021-11-16T15:24:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.