Deep Reinforcement Learning for Optimal Stopping with Application in
Financial Engineering
- URL: http://arxiv.org/abs/2105.08877v1
- Date: Wed, 19 May 2021 01:52:04 GMT
- Title: Deep Reinforcement Learning for Optimal Stopping with Application in
Financial Engineering
- Authors: Abderrahim Fathan and Erick Delage
- Abstract summary: We employ deep Reinforcement Learning to learn optimal stopping policies in two financial engineering applications: option pricing and optimal option exercise.
We present for the first time a comprehensive empirical evaluation of the quality of optimal stopping policies identified by three state-of-the-art deep RL algorithms.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal stopping is the problem of deciding the right time at which to take a
particular action in a stochastic system, in order to maximize an expected
reward. It has many applications in areas such as finance, healthcare, and
statistics. In this paper, we employ deep Reinforcement Learning (RL) to learn
optimal stopping policies in two financial engineering applications, namely
option pricing and optimal option exercise. We present for the first time a
comprehensive empirical evaluation of the quality of optimal stopping policies
identified by three state-of-the-art deep RL algorithms: double deep Q-learning
(DDQN), categorical distributional RL (C51), and Implicit Quantile Networks
(IQN). In the case of option pricing, our findings indicate that in a
theoretical Black-Scholes environment, IQN successfully identifies nearly
optimal prices. On the other hand, it is slightly outperformed by C51 when
confronted with real stock data movements in a put option exercise problem that
involves assets from the S&P500 index. More importantly, the C51 algorithm is
able to identify an optimal stopping policy that achieves 8% more out-of-sample
returns than the best of four natural benchmark policies. We conclude with a
discussion of our findings, which should pave the way for relevant future
research.
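As context for the abstract above, the following is a minimal, hedged sketch of how option exercise can be cast as an optimal-stopping MDP and attacked with a DQN-style Q-network. It is not the authors' implementation: the geometric Brownian motion dynamics, network size, discounting convention, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch (not the paper's code): exercising an American put as an
# optimal-stopping MDP with a small DQN-style Q-network choosing between
# "continue" (action 0) and "exercise" (action 1).
import numpy as np
import torch
import torch.nn as nn

S0, K, r, sigma, T, n_steps = 100.0, 100.0, 0.05, 0.2, 1.0, 50  # illustrative values
dt = T / n_steps

def simulate_path(rng):
    """One geometric Brownian motion price path under the risk-neutral measure."""
    z = rng.standard_normal(n_steps)
    log_ret = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    return S0 * np.exp(np.cumsum(log_ret))

# Q-network over state (t/T, S_t/K) with two actions: 0 = continue, 1 = exercise.
qnet = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
rng = np.random.default_rng(0)

for episode in range(2000):
    path = simulate_path(rng)
    for t in range(n_steps):
        state = torch.tensor([t / n_steps, path[t] / K], dtype=torch.float32)
        q = qnet(state)
        action = int(torch.argmax(q)) if rng.random() > 0.1 else int(rng.integers(2))  # epsilon-greedy
        if action == 1 or t == n_steps - 1:
            # Exercising ends the episode with the discounted put payoff; at maturity
            # the option is exercised if in the money, otherwise expires worthless.
            target = torch.tensor(np.exp(-r * (t + 1) * dt) * max(K - path[t], 0.0),
                                  dtype=torch.float32)
        else:
            # Continuation: one-step TD target bootstrapped from the next state.
            next_state = torch.tensor([(t + 1) / n_steps, path[t + 1] / K], dtype=torch.float32)
            with torch.no_grad():
                target = qnet(next_state).max()
        loss = (q[action] - target) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
        if action == 1:
            break
```

Under such a policy, an option-price estimate would be the average discounted payoff over fresh simulated paths; the paper itself compares DDQN, C51, and IQN rather than this vanilla online Q-learning setup.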
Related papers
- Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning [14.260168974085376]
This paper investigates multi-objective reinforcement learning (MORL).
It focuses on learning optimal policies in the presence of multiple reward functions.
Despite MORL's success, there is still a lack of satisfactory understanding of various MORL optimization targets and efficient learning algorithms.
arXiv Detail & Related papers (2024-07-24T17:58:49Z) - Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning [55.65738319966385]
We propose a novel online algorithm, iterative Nash policy optimization (INPO).
Unlike previous methods, INPO bypasses the need for estimating the expected win rate for individual responses.
With an LLaMA-3-8B-based SFT model, INPO achieves a 42.6% length-controlled win rate on AlpacaEval 2.0 and a 37.8% win rate on Arena-Hard.
arXiv Detail & Related papers (2024-06-30T08:00:34Z) - Extreme Q-Learning: MaxEnt RL without Entropy [88.97516083146371]
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains.
We introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT); a hedged sketch of a Gumbel-regression-style value estimate in this spirit appears after this list.
Using EVT, we derive our Extreme Q-Learning framework and, consequently, online and, for the first time, offline MaxEnt Q-learning algorithms.
arXiv Detail & Related papers (2023-01-05T23:14:38Z) - Solving the optimal stopping problem with reinforcement learning: an
application in financial option exercise [0.0]
We employ a data-driven method that uses Monte Carlo simulation to train and test artificial neural networks.
We propose a different architecture that uses convolutional neural networks (CNN) to deal with the dimensionality problem that arises when we transform the whole history of prices into a Markovian state.
arXiv Detail & Related papers (2022-07-21T22:52:05Z) - Human-in-the-loop: Provably Efficient Preference-based Reinforcement
Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z) - Restless Bandits with Many Arms: Beating the Central Limit Theorem [25.639496138046546]
Finite-horizon restless bandits with multiple pulls per period play an important role in recommender systems, active learning, revenue management, and many other areas.
While an optimal policy can be computed, in principle, using dynamic programming, the computation required scales exponentially in the number of arms $N$.
We characterize a non-degeneracy condition and a class of novel practically-computable policies, called fluid-priority policies, in which the optimality gap is $O(1)$.
arXiv Detail & Related papers (2021-07-25T23:27:12Z) - Deep Reinforcement Learning for Stock Portfolio Optimization [0.0]
We formulate the problem so that Reinforcement Learning can be applied to the task properly.
To maintain realistic assumptions about the market, we also incorporate transaction costs and a risk factor into the state.
We present an end-to-end solution for the task, with a Minimum Variance Portfolio for stock subset selection and a Wavelet Transform for extracting multi-frequency data patterns.
arXiv Detail & Related papers (2020-12-09T10:19:12Z) - Provably Good Batch Reinforcement Learning Without Great Exploration [51.51462608429621]
Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks.
Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes.
We show that a small modification to the Bellman optimality and evaluation back-ups, taking a more conservative update, can yield much stronger guarantees.
arXiv Detail & Related papers (2020-07-16T09:25:54Z) - Preference-based Reinforcement Learning with Finite-Time Guarantees [76.88632321436472]
Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning with preferences, to better elicit human opinion on the target objective.
Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy.
We present the first finite-time analysis for general PbRL problems.
arXiv Detail & Related papers (2020-06-16T03:52:41Z) - Provably Efficient Exploration in Policy Optimization [117.09887790160406]
This paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO).
OPPO achieves $\tilde{O}(\sqrt{d^2 H^3 T})$ regret.
To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.
arXiv Detail & Related papers (2019-12-12T08:40:02Z)
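For the Extreme Q-Learning entry above, here is a minimal, hedged sketch of a Gumbel-regression-style objective that estimates a soft-maximal value from sampled Q-values without an explicit max operator. The loss form exp(z) - z - 1 with z = (q - v)/beta, the function names, and the constants are assumptions for illustration, not code taken from the paper.

```python
# Hedged sketch: Gumbel-regression-style estimation of a soft-maximal value.
# Minimizing E[exp(z) - z - 1] with z = (q - v)/beta drives v toward
# beta * log(mean(exp(q / beta))), a log-mean-exp soft maximum of the samples.
import torch

def gumbel_regression_loss(q_values: torch.Tensor, v: torch.Tensor, beta: float) -> torch.Tensor:
    z = (q_values - v) / beta
    z = torch.clamp(z, max=10.0)  # guard against overflow in exp (illustrative choice)
    return (torch.exp(z) - z - 1.0).mean()

# Toy usage: recover a smoothed maximum of a few fixed Q samples.
q_samples = torch.tensor([0.5, 1.0, 2.0, 1.5])
v = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([v], lr=0.05)
for _ in range(500):
    loss = gumbel_regression_loss(q_samples, v, beta=0.5)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(v.item())  # roughly 1.5 here; approaches max(q) = 2.0 as beta shrinks toward 0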
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.