Reinforcement Learning for Option Hedging: Static Implied-Volatility Fit versus Shortfall-Aware Performance
- URL: http://arxiv.org/abs/2601.01709v1
- Date: Mon, 05 Jan 2026 01:02:41 GMT
- Title: Reinforcement Learning for Option Hedging: Static Implied-Volatility Fit versus Shortfall-Aware Performance
- Authors: Ziheng Chen, Minxuan Hu, Jiayu Yi, Wenxi Sun
- Abstract summary: We extend the Q-learner in Black-Scholes (QLBS) framework by incorporating risk aversion and trading costs, and propose a novel Replication Learning of Option Pricing (RLOP) approach.
- Score: 7.793044742733676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We extend the Q-learner in Black-Scholes (QLBS) framework by incorporating risk aversion and trading costs, and propose a novel Replication Learning of Option Pricing (RLOP) approach. Both methods are fully compatible with standard reinforcement learning algorithms and operate under market frictions. Using SPY and XOP option data, we evaluate performance along static and dynamic dimensions. Adaptive-QLBS achieves higher static pricing accuracy in implied volatility space, while RLOP delivers superior dynamic hedging performance by reducing shortfall probability. These results highlight the importance of evaluating option pricing models beyond static fit, emphasizing realized hedging outcomes.
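The abstract's dynamic evaluation criterion, shortfall probability, can be made concrete with a small simulation. The sketch below is not the paper's QLBS or RLOP method; it is a hypothetical classical baseline, estimating the probability that a Black-Scholes delta hedge of a short call ends with a loss when proportional trading costs (the market friction the paper models) are present. All parameter values are illustrative assumptions.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_delta(S, K, T, r, sigma):
    """Black-Scholes delta of a European call; payoff indicator at expiry."""
    if T <= 0:
        return 1.0 if S > K else 0.0
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return norm_cdf(d1)

def shortfall_probability(n_paths=2000, n_steps=50, S0=100.0, K=100.0,
                          T=0.25, r=0.0, sigma=0.2, cost=0.001, seed=7):
    """Estimate P(hedging P&L < 0) for a delta-hedged short call under
    proportional transaction costs (illustrative baseline, not RLOP)."""
    random.seed(seed)
    dt = T / n_steps
    # Initial cash is the Black-Scholes premium received for the call.
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    premium = S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    shortfalls = 0
    for _ in range(n_paths):
        S = S0
        delta = bs_delta(S, K, T, r, sigma)
        # Buy the initial hedge; pay proportional cost on the trade.
        cash = premium - delta * S - cost * abs(delta) * S
        for step in range(1, n_steps + 1):
            # Geometric Brownian motion step.
            S *= math.exp((r - 0.5 * sigma ** 2) * dt
                          + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0))
            cash *= math.exp(r * dt)
            if step < n_steps:
                new_delta = bs_delta(S, K, T - step * dt, r, sigma)
                trade = new_delta - delta
                cash -= trade * S + cost * abs(trade) * S
                delta = new_delta
        # Unwind the hedge and settle the short option at expiry.
        pnl = cash + delta * S - max(S - K, 0.0)
        if pnl < 0:
            shortfalls += 1
    return shortfalls / n_paths

p = shortfall_probability()
```

A learned policy such as RLOP would replace `bs_delta` with the trained hedge position; comparing the resulting `p` against this baseline is the kind of shortfall-aware evaluation the abstract argues for.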
Related papers
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - Generative Actor Critic [74.04971271003869]
Generative Actor Critic (GAC) is a novel framework that decouples sequential decision-making by reframing policy evaluation as learning a generative model of the joint distribution over trajectories and returns. Experiments on Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-12-25T06:31:11Z) - Guardrailed Elasticity Pricing: A Churn-Aware Forecasting Playbook for Subscription Strategy [0.0]
This paper presents a marketing analytics framework that operationalizes subscription pricing as a dynamic, guardrailed decision system. It blends seasonal time-series models with tree-based learners, runs Monte Carlo scenario tests to map risk envelopes, and solves a constrained optimization. The framework functions as a strategy playbook that clarifies when to shift from flat to dynamic pricing, how to align pricing with CLV and MRR targets, and how to embed ethical guardrails.
arXiv Detail & Related papers (2025-12-24T04:25:31Z) - Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z) - Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models [14.68920095399595]
Sparsity-based PEFT (SPEFT) introduces trainable sparse adaptations to the weight matrices in the model. We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies. We compare static and dynamic masking strategies, finding that static masking, which predetermines non-zero entries before training, delivers efficiency without sacrificing performance.
arXiv Detail & Related papers (2024-12-18T04:14:35Z) - Enhancing Deep Hedging of Options with Implied Volatility Surface Feedback Information [0.0]
We present a dynamic hedging scheme for S&P 500 options, where rebalancing decisions are enhanced by integrating information about the implied volatility surface dynamics. The optimal hedging strategy is obtained through a deep policy gradient-type reinforcement learning algorithm.
arXiv Detail & Related papers (2024-07-30T18:59:19Z) - UDUC: An Uncertainty-driven Approach for Learning-based Robust Control [9.76247882232402]
Probabilistic ensemble (PE) models offer a promising approach for modelling system dynamics.
PE models are susceptible to mode collapse, resulting in non-robust control when faced with environments slightly different from the training set.
We introduce the uncertainty-driven robust control (UDUC) loss as an alternative objective for training PE models.
arXiv Detail & Related papers (2024-05-04T07:48:59Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Applying Reinforcement Learning to Option Pricing and Hedging [0.0]
This thesis provides an overview of the recent advances in reinforcement learning in pricing and hedging financial instruments.
It bridges the traditional Black and Scholes (1973) model with novel artificial intelligence algorithms, enabling option pricing and hedging in a completely model-free and data-driven way.
arXiv Detail & Related papers (2023-10-06T15:59:12Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z) - A generative adversarial network approach to calibration of local stochastic volatility models [2.1485350418225244]
We propose a fully data-driven approach to calibrate local volatility (LSV) models.
We parametrize the leverage function by a family of feed-forward neural networks and learn their parameters directly from the available market option prices.
This should be seen in the context of neural SDEs and (causal) generative adversarial networks.
arXiv Detail & Related papers (2020-05-05T21:26:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.