Deep Hedging of Derivatives Using Reinforcement Learning
- URL: http://arxiv.org/abs/2103.16409v1
- Date: Mon, 29 Mar 2021 07:43:30 GMT
- Title: Deep Hedging of Derivatives Using Reinforcement Learning
- Authors: Jay Cao, Jacky Chen, John Hull, Zissis Poulos
- Abstract summary: We show how reinforcement learning can be used to derive optimal hedging strategies for derivatives when there are transaction costs.
We find that a hybrid approach, using an accounting P&L formulation that incorporates a relatively simple valuation model, works well.
- Score: 0.3313576045747072
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper shows how reinforcement learning can be used to derive optimal
hedging strategies for derivatives when there are transaction costs. The paper
illustrates the approach by showing the difference between using delta hedging
and optimal hedging for a short position in a call option when the objective is
to minimize a function equal to the mean hedging cost plus a constant times the
standard deviation of the hedging cost. Two situations are considered. In the
first, the asset price follows a geometric Brownian motion. In the second, the
asset price follows a stochastic volatility process. The paper extends the
basic reinforcement learning approach in a number of ways. First, it uses two
different Q-functions so that both the expected value of the cost and the
expected value of the square of the cost are tracked for different state/action
combinations. This approach increases the range of objective functions that can
be used. Second, it uses a learning algorithm that allows for continuous state
and action space. Third, it compares the accounting P&L approach (where the
hedged position is valued at each step) and the cash flow approach (where cash
inflows and outflows are used). We find that a hybrid approach involving the
use of an accounting P&L approach that incorporates a relatively simple
valuation model works well. The valuation model does not have to correspond to
the process assumed for the underlying asset price.
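As a rough illustration of the objective described above, the sketch below (Python, with made-up function names, not code from the paper) combines two Q-estimates, one tracking the expected hedging cost and one the expected squared cost, into "mean plus a constant times the standard deviation", and picks the hedge that minimises this quantity over a discretised set of candidate positions; the paper itself works with continuous state and action spaces.

```python
import numpy as np

# Illustrative sketch: combine two learned Q-functions,
# Q1(s, a) ~ E[cost | s, a] and Q2(s, a) ~ E[cost^2 | s, a],
# into the objective "mean hedging cost + c * std dev of hedging cost".
def hedging_objective(q1_value: float, q2_value: float, c: float = 1.5) -> float:
    variance = max(q2_value - q1_value ** 2, 0.0)  # guard against noisy estimates
    return q1_value + c * np.sqrt(variance)

# Greedy choice over a discretised grid of candidate hedge positions,
# assuming q1 and q2 return the two Q-estimates for (state, action).
def best_hedge(state, candidate_actions, q1, q2, c: float = 1.5):
    scores = [hedging_objective(q1(state, a), q2(state, a), c) for a in candidate_actions]
    return candidate_actions[int(np.argmin(scores))]
```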
Related papers
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
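A minimal sketch of the passive importance-sampling re-weighting the summary refers to, written as an off-policy REINFORCE-style gradient estimate (placeholder callables; the paper's contribution, actively choosing the behavioural policy, is not shown):

```python
import numpy as np

# IS-weighted REINFORCE gradient estimate from trajectories collected
# under a behavioural policy, re-weighted towards the target policy.
def is_policy_gradient(trajectories, logp_target, logp_behavior, grad_logp_target):
    grads = []
    for states, actions, ret in trajectories:  # ret: trajectory return
        logw = sum(logp_target(s, a) - logp_behavior(s, a) for s, a in zip(states, actions))
        weight = np.exp(logw)                  # trajectory-level IS weight
        score = sum(grad_logp_target(s, a) for s, a in zip(states, actions))
        grads.append(weight * ret * score)
    return np.mean(grads, axis=0)
```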
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- Onflow: an online portfolio allocation algorithm [0.0]
We introduce Onflow, a reinforcement learning technique that enables online optimization of portfolio allocation policies.
For log-normal assets, the strategy learned by Onflow, with transaction costs at zero, mimics Markowitz's optimal portfolio.
Onflow can remain efficient in regimes where other dynamical allocation techniques do not work anymore.
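For reference, the Markowitz mean-variance weights that the zero-cost strategy is said to mimic have the classical closed form below (standard baseline, not the Onflow algorithm itself):

```python
import numpy as np

# Markowitz-style weights proportional to Sigma^{-1} mu,
# normalised to sum to one (constraints ignored for simplicity).
def markowitz_weights(mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    raw = np.linalg.solve(sigma, mu)
    return raw / raw.sum()
```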
arXiv Detail & Related papers (2023-12-08T16:49:19Z)
- Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of i.i.d. observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
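For context, a batch (offline) Pearson-divergence ratio estimator with a linear-in-features model has a closed form (a uLSIF-style sketch; the paper's online estimator differs):

```python
import numpy as np

# Model the ratio r(x) = p(x)/q(x) as theta^T phi(x) and minimise the empirical
# Pearson-divergence criterion 0.5*E_q[r^2] - E_p[r] + 0.5*lam*||theta||^2.
def pearson_ratio_fit(phi_p: np.ndarray, phi_q: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    H = phi_q.T @ phi_q / phi_q.shape[0]             # approximates E_q[phi phi^T]
    h = phi_p.mean(axis=0)                           # approximates E_p[phi]
    return np.linalg.solve(H + lam * np.eye(H.shape[0]), h)
```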
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2}\right)$, where $\Gamma < 1$ is a constant determined by the autoregressive process.
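A toy simulator of an order-k autoregressive reward stream, with arm-dependent coefficients assumed purely for illustration (see the paper for the exact generative model and for AR-UCB itself):

```python
import numpy as np

# Each arm has coefficients (c0, c1, ..., ck); the next reward is
# c0 + sum_i ci * r_{t-i} + Gaussian noise.
def ar_reward_stream(coeffs_per_arm, arm_sequence, k, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    history = [0.0] * k                      # last k rewards, most recent last
    rewards = []
    for arm in arm_sequence:
        c0, *c = coeffs_per_arm[arm]
        r = c0 + sum(ci * ri for ci, ri in zip(c, reversed(history)))
        r += rng.normal(0.0, noise_std)
        rewards.append(r)
        history = history[1:] + [r]
    return rewards
```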
arXiv Detail & Related papers (2022-12-12T21:37:36Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
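A sketch of the recentering idea behind variance-reduced Q-learning, assuming access to a generative model sample_next_state(s, a) and a pre-computed Monte Carlo estimate T_tilde of the Bellman operator at a reference Q_bar (step sizes and epoch structure are simplified relative to the paper's analysis):

```python
import numpy as np

# One epoch of variance-reduced synchronous Q-learning: each update recenters
# the noisy empirical Bellman operator around a fixed reference Q_bar.
def vr_q_epoch(Q, Q_bar, T_tilde, sample_next_state, reward, gamma, steps, lr):
    S, A = Q.shape
    for _ in range(steps):
        T_Q = np.empty_like(Q)
        T_Qbar = np.empty_like(Q)
        for s in range(S):
            for a in range(A):
                s_next = sample_next_state(s, a)   # one fresh sample per (s, a)
                T_Q[s, a] = reward[s, a] + gamma * Q[s_next].max()
                T_Qbar[s, a] = reward[s, a] + gamma * Q_bar[s_next].max()
        Q = (1 - lr) * Q + lr * (T_Q - T_Qbar + T_tilde)
    return Q
```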
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Can we imitate stock price behavior to reinforcement learn option price? [7.771514118651335]
This paper presents a framework that imitates the price behavior of the underlying stock for reinforcement learning of the option price.
We use accessible features of the equities pricing data to construct a non-deterministic Markov decision process.
Our algorithm then maps the imitative principal investor's decisions to simulated stock price paths via a Bayesian deep neural network.
arXiv Detail & Related papers (2021-05-24T16:08:58Z)
- Deep Reinforcement Learning for Stock Portfolio Optimization [0.0]
We formulate the problem so that Reinforcement Learning can be applied to the task properly.
To maintain realistic assumptions about the market, we incorporate transaction costs and a risk factor into the state as well.
We present an end-to-end solution for the task, with a Minimum Variance Portfolio for stock subset selection and the Wavelet Transform for extracting multi-frequency data patterns.
arXiv Detail & Related papers (2020-12-09T10:19:12Z)
- Transfer Learning via $\ell_1$ Regularization [9.442139459221785]
We propose a method for transferring knowledge from a source domain to a target domain.
Our method yields sparsity for both the estimates themselves and changes of the estimates.
Empirical results demonstrate that the proposed method effectively balances stability and plasticity.
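One plausible reading of "sparsity for both the estimates themselves and changes of the estimates" is an l1 penalty on the target coefficients and on their deviation from the source coefficients; the sketch below encodes that objective with cvxpy and is not necessarily the paper's exact formulation:

```python
import cvxpy as cp
import numpy as np

# Least squares on the target data plus l1 penalties on beta (sparsity of the
# estimate) and on beta - beta_src (sparsity of the change from the source).
def l1_transfer_fit(X, y, beta_src, lam_est=0.1, lam_change=0.1):
    beta = cp.Variable(X.shape[1])
    objective = (cp.sum_squares(X @ beta - y) / X.shape[0]
                 + lam_est * cp.norm1(beta)
                 + lam_change * cp.norm1(beta - beta_src))
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value
```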
arXiv Detail & Related papers (2020-06-26T07:42:03Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDPs with safe exploration in the function approximation setting.
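To fix ideas, a generic primal-dual (Lagrangian) step for a CMDP is sketched below; the paper's algorithm is a specific provably efficient optimistic method in the function-approximation setting, not this loop:

```python
# Generic CMDP primal-dual step: improve the policy on the shaped reward
# r - lam * c, then update the dual variable by the estimated constraint violation.
def primal_dual_step(policy_update, estimate_values, lam, budget, dual_lr):
    policy = policy_update(lam)                            # primal improvement
    reward_value, cost_value = estimate_values(policy)     # rollout estimates
    lam = max(0.0, lam + dual_lr * (cost_value - budget))  # projected dual ascent
    return policy, lam
```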
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
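A generic batch off-policy evaluation sketch with linear function approximation (fitted-Q evaluation); the minimax-optimal estimator analysed in the paper is related but not identical:

```python
import numpy as np

# phi_sa: features of logged (s, a); phi_next_pi: features of (s', pi(s'));
# iterate the linear-regression Bellman backup to approximate Q_pi.
def linear_fqe(phi_sa, rewards, phi_next_pi, gamma, n_iters=100, ridge=1e-6):
    d = phi_sa.shape[1]
    w = np.zeros(d)
    A = phi_sa.T @ phi_sa + ridge * np.eye(d)
    for _ in range(n_iters):
        targets = rewards + gamma * (phi_next_pi @ w)
        w = np.linalg.solve(A, phi_sa.T @ targets)
    return w  # Q_pi(s, a) ~ w @ phi(s, a)
```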
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.