Deep Hedging of Derivatives Using Reinforcement Learning
- URL: http://arxiv.org/abs/2103.16409v1
- Date: Mon, 29 Mar 2021 07:43:30 GMT
- Title: Deep Hedging of Derivatives Using Reinforcement Learning
- Authors: Jay Cao, Jacky Chen, John Hull, Zissis Poulos
- Abstract summary: We show how reinforcement learning can be used to derive optimal hedging strategies for derivatives when there are transaction costs.
We find that a hybrid approach, using an accounting P&L formulation that incorporates a relatively simple valuation model, works well.
- Score: 0.3313576045747072
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper shows how reinforcement learning can be used to derive optimal
hedging strategies for derivatives when there are transaction costs. The paper
illustrates the approach by showing the difference between using delta hedging
and optimal hedging for a short position in a call option when the objective is
to minimize a function equal to the mean hedging cost plus a constant times the
standard deviation of the hedging cost. Two situations are considered. In the
first, the asset price follows a geometric Brownian motion. In the second, the
asset price follows a stochastic volatility process. The paper extends the
basic reinforcement learning approach in a number of ways. First, it uses two
different Q-functions so that both the expected value of the cost and the
expected value of the square of the cost are tracked for different state/action
combinations. This approach increases the range of objective functions that can
be used. Second, it uses a learning algorithm that allows for continuous state
and action space. Third, it compares the accounting P&L approach (where the
hedged position is valued at each step) and the cash flow approach (where cash
inflows and outflows are used). We find that a hybrid approach involving the
use of an accounting P&L approach that incorporates a relatively simple
valuation model works well. The valuation model does not have to correspond to
the process assumed for the underlying asset price.
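As a rough illustration of the objective described above, the sketch below (Python, with made-up function names, not code from the paper) combines two Q-estimates, one tracking the expected hedging cost and one the expected squared cost, into "mean plus a constant times the standard deviation", and picks the hedge that minimises this quantity over a discretised set of candidate positions; the paper itself works with continuous state and action spaces.

```python
import numpy as np

# Illustrative sketch: combine two learned Q-functions,
# Q1(s, a) ~ E[cost | s, a] and Q2(s, a) ~ E[cost^2 | s, a],
# into the objective "mean hedging cost + c * std dev of hedging cost".
def hedging_objective(q1_value: float, q2_value: float, c: float = 1.5) -> float:
    variance = max(q2_value - q1_value ** 2, 0.0)  # guard against noisy estimates
    return q1_value + c * np.sqrt(variance)

# Greedy choice over a discretised grid of candidate hedge positions,
# assuming q1 and q2 return the two Q-estimates for (state, action).
def best_hedge(state, candidate_actions, q1, q2, c: float = 1.5):
    scores = [hedging_objective(q1(state, a), q2(state, a), c) for a in candidate_actions]
    return candidate_actions[int(np.argmin(scores))]
```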
Related papers
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
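A minimal sketch of the passive importance-sampling re-weighting the summary refers to, written as an off-policy REINFORCE-style gradient estimate (placeholder callables; the paper's contribution, actively choosing the behavioural policy, is not shown):

```python
import numpy as np

# IS-weighted REINFORCE gradient estimate from trajectories collected
# under a behavioural policy, re-weighted towards the target policy.
def is_policy_gradient(trajectories, logp_target, logp_behavior, grad_logp_target):
    grads = []
    for states, actions, ret in trajectories:  # ret: trajectory return
        logw = sum(logp_target(s, a) - logp_behavior(s, a) for s, a in zip(states, actions))
        weight = np.exp(logw)                  # trajectory-level IS weight
        score = sum(grad_logp_target(s, a) for s, a in zip(states, actions))
        grads.append(weight * ret * score)
    return np.mean(grads, axis=0)
```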
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- Onflow: an online portfolio allocation algorithm [0.0]
We introduce Onflow, a reinforcement learning technique that enables online optimization of portfolio allocation policies.
For log-normal assets, the strategy learned by Onflow, with transaction costs at zero, mimics Markowitz's optimal portfolio.
Onflow can remain efficient in regimes where other dynamical allocation techniques do not work anymore.
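For reference, the Markowitz mean-variance weights that the zero-cost strategy is said to mimic have the classical closed form below (standard baseline, not the Onflow algorithm itself):

```python
import numpy as np

# Markowitz-style weights proportional to Sigma^{-1} mu,
# normalised to sum to one (constraints ignored for simplicity).
def markowitz_weights(mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    raw = np.linalg.solve(sigma, mu)
    return raw / raw.sum()
```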
arXiv Detail & Related papers (2023-12-08T16:49:19Z)
- Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of i.i.d. observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
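For context, a batch (offline) Pearson-divergence ratio estimator with a linear-in-features model has a closed form (a uLSIF-style sketch; the paper's online estimator differs):

```python
import numpy as np

# Model the ratio r(x) = p(x)/q(x) as theta^T phi(x) and minimise the empirical
# Pearson-divergence criterion 0.5*E_q[r^2] - E_p[r] + 0.5*lam*||theta||^2.
def pearson_ratio_fit(phi_p: np.ndarray, phi_q: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    H = phi_q.T @ phi_q / phi_q.shape[0]             # approximates E_q[phi phi^T]
    h = phi_p.mean(axis=0)                           # approximates E_p[phi]
    return np.linalg.solve(H + lam * np.eye(H.shape[0]), h)
```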
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2}\right)$, where $\Gamma < 1$ is a constant determined by the autoregressive process.
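A toy simulator of an order-k autoregressive reward stream, with arm-dependent coefficients assumed purely for illustration (see the paper for the exact generative model and for AR-UCB itself):

```python
import numpy as np

# Each arm has coefficients (c0, c1, ..., ck); the next reward is
# c0 + sum_i ci * r_{t-i} + Gaussian noise.
def ar_reward_stream(coeffs_per_arm, arm_sequence, k, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    history = [0.0] * k                      # last k rewards, most recent last
    rewards = []
    for arm in arm_sequence:
        c0, *c = coeffs_per_arm[arm]
        r = c0 + sum(ci * ri for ci, ri in zip(c, reversed(history)))
        r += rng.normal(0.0, noise_std)
        rewards.append(r)
        history = history[1:] + [r]
    return rewards
```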
arXiv Detail & Related papers (2022-12-12T21:37:36Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
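A sketch of the recentering idea behind variance-reduced Q-learning, assuming access to a generative model sample_next_state(s, a) and a pre-computed Monte Carlo estimate T_tilde of the Bellman operator at a reference Q_bar (step sizes and epoch structure are simplified relative to the paper's analysis):

```python
import numpy as np

# One epoch of variance-reduced synchronous Q-learning: each update recenters
# the noisy empirical Bellman operator around a fixed reference Q_bar.
def vr_q_epoch(Q, Q_bar, T_tilde, sample_next_state, reward, gamma, steps, lr):
    S, A = Q.shape
    for _ in range(steps):
        T_Q = np.empty_like(Q)
        T_Qbar = np.empty_like(Q)
        for s in range(S):
            for a in range(A):
                s_next = sample_next_state(s, a)   # one fresh sample per (s, a)
                T_Q[s, a] = reward[s, a] + gamma * Q[s_next].max()
                T_Qbar[s, a] = reward[s, a] + gamma * Q_bar[s_next].max()
        Q = (1 - lr) * Q + lr * (T_Q - T_Qbar + T_tilde)
    return Q
```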
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Can we imitate stock price behavior to reinforcement learn option price? [7.771514118651335]
This paper presents a framework that imitates the price behavior of the underlying stock for reinforcement learning of the option price.
We use accessible features of the equities pricing data to construct a non-deterministic Markov decision process.
Our algorithm then maps the imitative principal investor's decisions to simulated stock price paths via a Bayesian deep neural network.
arXiv Detail & Related papers (2021-05-24T16:08:58Z)
- Deep Reinforcement Learning for Stock Portfolio Optimization [0.0]
We formulate the problem so that Reinforcement Learning can be applied to the task properly.
To maintain realistic assumptions about the market, we incorporate transaction costs and a risk factor into the state as well.
We present an end-to-end solution for the task, with a Minimum Variance Portfolio for stock subset selection and the Wavelet Transform for extracting multi-frequency data patterns.
arXiv Detail & Related papers (2020-12-09T10:19:12Z)
- Transfer Learning via $\ell_1$ Regularization [9.442139459221785]
We propose a method for transferring knowledge from a source domain to a target domain.
Our method yields sparsity for both the estimates themselves and changes of the estimates.
Empirical results demonstrate that the proposed method effectively balances stability and plasticity.
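One plausible reading of "sparsity for both the estimates themselves and changes of the estimates" is an l1 penalty on the target coefficients and on their deviation from the source coefficients; the sketch below encodes that objective with cvxpy and is not necessarily the paper's exact formulation:

```python
import cvxpy as cp
import numpy as np

# Least squares on the target data plus l1 penalties on beta (sparsity of the
# estimate) and on beta - beta_src (sparsity of the change from the source).
def l1_transfer_fit(X, y, beta_src, lam_est=0.1, lam_change=0.1):
    beta = cp.Variable(X.shape[1])
    objective = (cp.sum_squares(X @ beta - y) / X.shape[0]
                 + lam_est * cp.norm1(beta)
                 + lam_change * cp.norm1(beta - beta_src))
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value
```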
arXiv Detail & Related papers (2020-06-26T07:42:03Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDPs with safe exploration in the function approximation setting.
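To fix ideas, a generic primal-dual (Lagrangian) step for a CMDP is sketched below; the paper's algorithm is a specific provably efficient optimistic method in the function-approximation setting, not this loop:

```python
# Generic CMDP primal-dual step: improve the policy on the shaped reward
# r - lam * c, then update the dual variable by the estimated constraint violation.
def primal_dual_step(policy_update, estimate_values, lam, budget, dual_lr):
    policy = policy_update(lam)                            # primal improvement
    reward_value, cost_value = estimate_values(policy)     # rollout estimates
    lam = max(0.0, lam + dual_lr * (cost_value - budget))  # projected dual ascent
    return policy, lam
```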
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
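A generic batch off-policy evaluation sketch with linear function approximation (fitted-Q evaluation); the minimax-optimal estimator analysed in the paper is related but not identical:

```python
import numpy as np

# phi_sa: features of logged (s, a); phi_next_pi: features of (s', pi(s'));
# iterate the linear-regression Bellman backup to approximate Q_pi.
def linear_fqe(phi_sa, rewards, phi_next_pi, gamma, n_iters=100, ridge=1e-6):
    d = phi_sa.shape[1]
    w = np.zeros(d)
    A = phi_sa.T @ phi_sa + ridge * np.eye(d)
    for _ in range(n_iters):
        targets = rewards + gamma * (phi_next_pi @ w)
        w = np.linalg.solve(A, phi_sa.T @ targets)
    return w  # Q_pi(s, a) ~ w @ phi(s, a)
```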
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.