Deep Reinforcement Learning Algorithms for Option Hedging
- URL: http://arxiv.org/abs/2504.05521v2
- Date: Thu, 17 Apr 2025 00:36:34 GMT
- Title: Deep Reinforcement Learning Algorithms for Option Hedging
- Authors: Andrei Neagu, Frédéric Godin, Leila Kosseim
- Abstract summary: We compare the performance of eight Deep Reinforcement Learning (DRL) algorithms in the context of dynamic hedging. MCPG is the only algorithm to outperform the Black-Scholes delta hedge baseline within the allotted computational budget.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic hedging is a financial strategy that consists of periodically transacting one or multiple financial assets to offset the risk associated with a correlated liability. Deep Reinforcement Learning (DRL) algorithms have been used to find optimal solutions to dynamic hedging problems by framing them as sequential decision-making problems. However, most previous work assesses the performance of only one or two DRL algorithms, making an objective comparison across algorithms difficult. In this paper, we compare the performance of eight DRL algorithms in the context of dynamic hedging: Monte Carlo Policy Gradient (MCPG) and Proximal Policy Optimization (PPO), along with four variants of Deep Q-Learning (DQL) and two variants of Deep Deterministic Policy Gradient (DDPG). Two of these variants represent a novel application to the task of dynamic hedging. In our experiments, we use the Black-Scholes delta hedge as a baseline and simulate the dataset using a GJR-GARCH(1,1) model. Results show that MCPG, followed by PPO, obtains the best performance in terms of the root semi-quadratic penalty. Moreover, MCPG is the only algorithm to outperform the Black-Scholes delta hedge baseline within the allotted computational budget, possibly due to the sparsity of rewards in our environment.
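To make the experimental setting concrete, the sketch below assembles the pieces the abstract names: a GJR-GARCH(1,1) price simulator, the Black-Scholes delta hedge baseline, and a root semi-quadratic penalty on the terminal hedging error. This is a minimal illustrative sketch, not the authors' code: all numerical parameters, the use of the conditional GARCH volatility inside the delta, and the exact penalty definition are assumptions for illustration.

```python
# Hedged sketch: GJR-GARCH(1,1) paths, Black-Scholes delta hedge of a short
# at-the-money call, and a root semi-quadratic penalty on the hedging shortfall.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# --- GJR-GARCH(1,1) daily log-returns: eps_t = sigma_t z_t,
#     sigma_t^2 = omega + (alpha + gamma * 1[eps_{t-1} < 0]) eps_{t-1}^2 + beta sigma_{t-1}^2
n_paths, n_steps = 10_000, 63                 # ~3 months of daily rebalancing (assumed horizon)
omega, alpha, gamma, beta = 3e-6, 0.05, 0.08, 0.88   # assumed parameters, not the paper's
S0, K = 100.0, 100.0                          # at-the-money call, zero interest rate
dt = 1.0 / 252.0

sigma2 = np.full(n_paths, omega / (1.0 - alpha - 0.5 * gamma - beta))  # unconditional variance
prices = np.empty((n_steps + 1, n_paths)); prices[0] = S0
vols = np.empty((n_steps + 1, n_paths)); vols[0] = np.sqrt(252.0 * sigma2)  # annualized vol

for t in range(1, n_steps + 1):
    z = rng.standard_normal(n_paths)
    eps = np.sqrt(sigma2) * z
    prices[t] = prices[t - 1] * np.exp(eps)   # zero drift for simplicity
    sigma2 = omega + (alpha + gamma * (eps < 0.0)) * eps ** 2 + beta * sigma2
    vols[t] = np.sqrt(252.0 * sigma2)

def bs_call_delta(S, tau, sigma):
    """Black-Scholes delta of a European call (zero interest rate)."""
    d1 = (np.log(S / K) + 0.5 * sigma ** 2 * tau) / (sigma * np.sqrt(tau))
    return norm.cdf(d1)

def bs_call_price(S, tau, sigma):
    """Black-Scholes price of a European call (zero interest rate)."""
    d1 = (np.log(S / K) + 0.5 * sigma ** 2 * tau) / (sigma * np.sqrt(tau))
    return S * norm.cdf(d1) - K * norm.cdf(d1 - sigma * np.sqrt(tau))

# --- Black-Scholes delta hedge of a short call, rebalanced at every step ---
cash = bs_call_price(prices[0], n_steps * dt, vols[0])   # premium received at inception
shares = np.zeros(n_paths)
for t in range(n_steps):
    tau = (n_steps - t) * dt
    target = bs_call_delta(prices[t], tau, vols[t])      # delta from the current conditional vol
    cash -= (target - shares) * prices[t]                # rebalance at the observed price
    shares = target                                      # zero rate: cash earns no interest

portfolio = cash + shares * prices[-1]
hedging_loss = np.maximum(prices[-1] - K, 0.0) - portfolio   # shortfall of the short-call hedger

# --- Root semi-quadratic penalty: one plausible definition that penalizes losses only ---
rsqp = np.sqrt(np.mean(np.maximum(hedging_loss, 0.0) ** 2))
print(f"Black-Scholes delta hedge RSQP: {rsqp:.4f}")
```

A DRL agent would replace the `bs_call_delta` call with the hedge position output by its policy, and the RSQP computed on the terminal shortfall would serve as the comparison metric across algorithms.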
Related papers
- RL-finetuning LLMs from on- and off-policy data with a single algorithm
We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large-language models. AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model. We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
arXiv Detail & Related papers (2025-03-25T12:52:38Z)
- Deep Reinforcement Learning for Online Optimal Execution Strategies
This paper tackles the challenge of learning non-Markovian optimal execution strategies in dynamic financial markets.
We introduce a novel actor-critic algorithm based on Deep Deterministic Policy Gradient (DDPG).
We show that our algorithm successfully approximates the optimal execution strategy.
arXiv Detail & Related papers (2024-10-17T12:38:08Z)
- Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality
Constrained Markov Decision Processes (CMDPs) are critical in many high-stakes applications. This paper introduces a novel approach, Two-Stage Deep Decision Rules (TS-DDR), to efficiently train parametric actor policies. It is shown to enhance solution quality and to reduce computation times by several orders of magnitude when compared to current state-of-the-art methods.
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes
This paper introduces two algorithms, called RCPG with Robust Lagrangian and Adversarial RCPG.
RCPG with Robust Lagrangian modifies RCPG by taking the worst-case dynamics based on the Lagrangian rather than either the value or the constraint.
Adversarial RCPG also formulates the worst-case dynamics based on the Lagrangian but learns this directly and incrementally as an adversarial policy.
arXiv Detail & Related papers (2023-08-22T08:24:45Z)
- Constrained Reinforcement Learning via Dissipative Saddle Flow Dynamics
In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward.
Several algorithms rooted in sample-based primal-dual methods have been recently proposed to solve this problem in policy space.
We propose a novel algorithm for constrained RL that does not suffer from these limitations.
arXiv Detail & Related papers (2022-12-03T01:54:55Z)
- Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions.
We propose an Anchor-changing Regularized Natural Policy Gradient framework, which can incorporate ideas from well-performing first-order methods.
arXiv Detail & Related papers (2022-06-10T21:09:44Z)
- Deep Reinforcement Learning for Stock Portfolio Optimization
We formulate the problem such that Reinforcement Learning can be applied to the task properly.
To maintain realistic assumptions about the market, we also incorporate transaction costs and a risk factor into the state.
We present an end-to-end solution for the task, using a Minimum Variance Portfolio for stock subset selection and a Wavelet Transform for extracting multi-frequency data patterns.
arXiv Detail & Related papers (2020-12-09T10:19:12Z)
- Large-Scale Methods for Distributionally Robust Optimization
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms.
Specifically, we investigate the consequences of "code-level optimizations."
Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function.
arXiv Detail & Related papers (2020-05-25T16:24:59Z)
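For readers unfamiliar with the "code-level optimizations" this case study refers to, below is a hedged illustration of the PPO clipped surrogate objective together with one well-known such optimization, value-function clipping, which common PPO implementations include even though it is absent from the original PPO paper. This is an illustrative sketch, not code from the cited paper: it is written with NumPy for readability, whereas a real implementation would compute these losses inside an autodiff framework.

```python
import numpy as np

def ppo_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (returned as a loss to minimize)."""
    ratio = np.exp(logp_new - logp_old)                  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

def clipped_value_loss(v_new, v_old, returns, clip_eps=0.2):
    """Value-function clipping: a code-level optimization found in many PPO
    implementations rather than in the algorithm's original description."""
    v_clipped = v_old + np.clip(v_new - v_old, -clip_eps, clip_eps)
    return 0.5 * np.mean(np.maximum((v_new - returns) ** 2,
                                    (v_clipped - returns) ** 2))
```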