Exploratory Mean-Variance with Jumps: An Equilibrium Approach
- URL: http://arxiv.org/abs/2512.09224v1
- Date: Wed, 10 Dec 2025 01:12:22 GMT
- Title: Exploratory Mean-Variance with Jumps: An Equilibrium Approach
- Authors: Yuling Max Chen, Bin Li, David Saunders
- Abstract summary: We model the market dynamics with a jump-diffusion process and apply Reinforcement Learning techniques. Our numerical study on 24 years of real market data shows that the proposed RL model is profitable in 13 out of 14 tests.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Revisiting the continuous-time Mean-Variance (MV) Portfolio Optimization problem, we model the market dynamics with a jump-diffusion process and apply Reinforcement Learning (RL) techniques to facilitate informed exploration within the control space. We recognize the time-inconsistency of the MV problem and adopt the time-inconsistent control (TIC) approach to analytically solve for an exploratory equilibrium investment policy, which is a Gaussian distribution centered on the equilibrium control of the classical MV problem. Our approach accounts for time-inconsistent preferences and actions, and our equilibrium policy is the best option an investor can take at any given time during the investment period. Moreover, we leverage the martingale properties of the equilibrium policy, design a RL model, and propose an Actor-Critic RL algorithm. All of our RL model parameters converge to the corresponding true values in a simulation study. Our numerical study on 24 years of real market data shows that the proposed RL model is profitable in 13 out of 14 tests, demonstrating its practical applicability in real world investment.
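The abstract describes an exploratory equilibrium policy that is a Gaussian distribution centered on the classical MV equilibrium control, applied to a jump-diffusion market. The sketch below is a minimal, hypothetical illustration of that idea: it samples allocations from a Gaussian exploratory policy and rolls a wealth path under a jump-diffusion risky asset. All parameter values (drift, volatility, jump intensity, the policy mean and exploration variance) are illustrative assumptions, not the paper's calibrated quantities, and the actor-critic learning loop is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical market parameters (illustrative, not from the paper).
mu, sigma = 0.08, 0.2                       # risky-asset drift and diffusion volatility
lam, jump_mu, jump_sig = 0.5, -0.02, 0.05   # Poisson jump intensity and jump-size law
r, dt, T = 0.02, 1 / 252, 1.0               # risk-free rate, step size, horizon (years)

def exploratory_action(mean_ctrl, explore_var, rng):
    """Sample an allocation from the Gaussian exploratory policy
    N(mean_ctrl, explore_var); the mean plays the role of the
    classical MV equilibrium control."""
    return rng.normal(mean_ctrl, np.sqrt(explore_var))

# Roll one wealth path under the exploratory policy.
w = 1.0
mean_ctrl, explore_var = 0.6, 0.01          # placeholder policy parameters
for _ in range(int(T / dt)):
    u = exploratory_action(mean_ctrl, explore_var, rng)
    n_jumps = rng.poisson(lam * dt)
    jump = rng.normal(jump_mu, jump_sig, n_jumps).sum()
    risky_ret = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal() + jump
    w *= 1 + r * dt + u * (risky_ret - r * dt)

print(round(w, 4))
```

In the paper's actor-critic scheme, the policy mean and exploration variance would be learned from data via the martingale conditions of the equilibrium policy; here they are fixed constants so the sampling mechanism itself is clear.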
Related papers
- Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning [88.42566960813438]
CalibRL is a hybrid-policy RLVR framework that supports controllable exploration with expert guidance. CalibRL increases policy entropy in a guided manner and clarifies the target distribution. Experiments across eight benchmarks, including both in-domain and out-of-domain settings, demonstrate consistent improvements.
arXiv Detail & Related papers (2026-02-22T07:23:36Z) - BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping [69.74252624161652]
We propose BAlanced Policy Optimization with Adaptive Clipping (BAPO). BAPO dynamically adjusts clipping bounds to adaptively re-balance positive and negative contributions, preserve entropy, and stabilize RL optimization. On AIME 2024 and AIME 2025 benchmarks, our 7B BAPO model surpasses open-source counterparts such as SkyWork-OR1-7B.
arXiv Detail & Related papers (2025-10-21T12:55:04Z) - Model Predictive Control-Guided Reinforcement Learning for Implicit Balancing [2.6288470934623636]
In Europe, profit-seeking balance responsible parties can deviate in real time from their day-ahead nominations to assist transmission system operators. This paper proposes an MPC-guided RL method that combines the complementary strengths of both MPC and RL. The performance of the proposed method is evaluated on the implicit balancing battery control problem using Belgian balancing data from 2023.
arXiv Detail & Related papers (2025-10-06T14:52:27Z) - Continuous-Time Reinforcement Learning for Asset-Liability Management [0.0]
This paper proposes a novel approach for Asset-Liability Management (ALM) by employing continuous-time Reinforcement Learning (RL). We develop a model-free, policy gradient-based soft actor-critic algorithm tailored to ALM for dynamically synchronizing assets and liabilities. Our empirical study evaluates this approach against two enhanced traditional financial strategies, a model-based continuous-time RL method, and three state-of-the-art RL algorithms.
arXiv Detail & Related papers (2025-09-27T12:36:51Z) - Your AI, Not Your View: The Bias of LLMs in Investment Analysis [62.388554963415906]
In finance, Large Language Models (LLMs) face frequent knowledge conflicts arising from discrepancies between their pre-trained parametric knowledge and real-time market data. These conflicts are especially problematic in real-world investment services, where a model's inherent biases can misalign with institutional objectives. We propose an experimental framework to investigate emergent behaviors in such conflict scenarios, offering a quantitative analysis of bias in investment analysis.
arXiv Detail & Related papers (2025-07-28T16:09:38Z) - Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z) - Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning [55.36978389831446]
We recast reflective exploration within the Bayes-Adaptive RL framework. Our resulting algorithm, BARL, instructs the LLM to stitch and switch strategies based on observed outcomes.
arXiv Detail & Related papers (2025-05-26T22:51:00Z) - Exploratory Mean-Variance Portfolio Optimization with Regime-Switching Market Dynamics [3.6149777601911097]
We study a regime-switching market setting and apply reinforcement learning techniques to assist informed exploration within the control space. In a real market data study, EMVRS with OC learning outperforms its counterparts with the highest mean and reasonably low volatility of the annualized portfolio returns.
arXiv Detail & Related papers (2025-01-28T02:48:41Z) - Reinforcement Learning in High-frequency Market Making [7.740207107300432]
This paper establishes a new and comprehensive theoretical analysis for the application of reinforcement learning (RL) in high-frequency market making.
We bridge the modern RL theory and the continuous-time statistical models in high-frequency financial economics.
arXiv Detail & Related papers (2024-07-14T22:07:48Z) - Data-Driven Merton's Strategies via Policy Randomization [11.774563966512709]
We study Merton's expected utility problem in an incomplete market. The agent under consideration is a price taker who has access only to the stock and factor value processes. We propose an auxiliary problem in which the agent can invoke policy randomization according to a specific class of distributions.
arXiv Detail & Related papers (2023-12-19T02:14:13Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Towards Standardizing Reinforcement Learning Approaches for Stochastic Production Scheduling [77.34726150561087]
Reinforcement learning can be used to solve scheduling problems.
Existing studies rely on (sometimes) complex simulations for which the code is unavailable.
There is a vast array of RL designs to choose from.
Standardization of model descriptions - covering both the production setup and the RL design - and of validation schemes is a prerequisite.
arXiv Detail & Related papers (2021-04-16T16:07:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.