Solving the optimal stopping problem with reinforcement learning: an
application in financial option exercise
- URL: http://arxiv.org/abs/2208.00765v1
- Date: Thu, 21 Jul 2022 22:52:05 GMT
- Title: Solving the optimal stopping problem with reinforcement learning: an
application in financial option exercise
- Authors: Leonardo Kanashiro Felizardo and Elia Matsumoto and Emilio
Del-Moral-Hernandez
- Abstract summary: We employ a data-driven method that uses Monte Carlo simulation to train and test artificial neural networks.
We propose a different architecture that uses convolutional neural networks (CNN) to deal with the dimensionality problem that arises when we transform the whole history of prices into a Markovian state.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The optimal stopping problem is a category of decision problems with a
specific constrained configuration. It is relevant to various real-world
applications such as finance and management. To solve the optimal stopping
problem, state-of-the-art algorithms in dynamic programming, such as the
least-squares Monte Carlo (LSMC), are employed. This type of algorithm relies
on path simulations using only the last price of the underlying asset as a
state representation. Also, the LSMC was designed for option valuation, where
risk-neutral probabilities can be employed to account for uncertainty. However,
the goals of the general optimal stopping problem may not fit the requirements
of the LSMC, for instance when prices are auto-correlated. We employ a data-driven method that uses
Monte Carlo simulation to train and test artificial neural networks (ANN) to
solve the optimal stopping problem. Using ANN to solve decision problems is not
entirely new. We propose a different architecture that uses convolutional
neural networks (CNN) to deal with the dimensionality problem that arises when
we transform the whole history of prices into a Markovian state. We present
experiments that indicate that our proposed architecture improves results over
the previous implementations under specific simulated time series function
sets. Lastly, we employ our proposed method to compare the optimal exercise of
the financial options problem with the LSMC algorithm. Our experiments show
that our method can capture more accurate exercise opportunities when compared
to the LSMC. These exercise policies yield a substantially higher expected
payoff (above 974% improvement) across the many Monte Carlo simulations that
used the real-world return database on the out-of-sample (test) data.
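The LSMC baseline discussed in the abstract can be sketched as a generic Longstaff-Schwartz implementation for an American put; the parameter values and the degree-2 polynomial regression basis below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def lsmc_put_price(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                   n_steps=50, n_paths=10_000, seed=0):
    """Price an American put via Longstaff-Schwartz least-squares Monte Carlo."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Simulate geometric Brownian motion paths under the risk-neutral measure.
    z = rng.standard_normal((n_paths, n_steps))
    log_paths = np.cumsum((r - 0.5 * sigma**2) * dt
                          + sigma * np.sqrt(dt) * z, axis=1)
    S = S0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))

    cashflow = np.maximum(K - S[:, -1], 0.0)  # payoff at maturity
    for t in range(n_steps - 1, 0, -1):
        cashflow *= np.exp(-r * dt)           # discount one step back
        itm = (K - S[:, t]) > 0               # regress only in-the-money paths
        if itm.sum() < 3:
            continue
        # Regress discounted continuation value on a polynomial of the spot
        # price; note the state is just the last price, as the abstract says.
        coeffs = np.polyfit(S[itm, t], cashflow[itm], deg=2)
        continuation = np.polyval(coeffs, S[itm, t])
        exercise_now = np.maximum(K - S[itm, t], 0.0) > continuation
        idx = np.where(itm)[0][exercise_now]
        cashflow[idx] = np.maximum(K - S[idx, t], 0.0)
    return float(np.exp(-r * dt) * cashflow.mean())
```

The paper's proposed CNN-based method differs precisely in replacing this last-price state with the whole price history transformed into a Markovian state.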
Related papers
- Can Large Language Models Play Games? A Case Study of A Self-Play Approach [61.15761840203145]
Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge.
Monte-Carlo Tree Search (MCTS) is a search algorithm that provides reliable decision-making solutions.
This work introduces an innovative approach that bolsters LLMs with MCTS self-play to efficiently resolve turn-based zero-sum games.
arXiv Detail & Related papers (2024-03-08T19:16:29Z)
- Optimal simulation-based Bayesian decisions [0.0]
We present a framework for the efficient computation of optimal Bayesian decisions under intractable likelihoods.
We develop active learning schemes to choose where in parameter and action spaces to simulate.
The resulting framework is extremely simulation efficient, typically requiring fewer model calls than the associated posterior inference task alone.
arXiv Detail & Related papers (2023-11-09T20:59:52Z)
- Multi-Resolution Active Learning of Fourier Neural Operators [33.63483360957646]
We propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible.
Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm.
We have shown the advantage of our method in several benchmark operator learning tasks.
arXiv Detail & Related papers (2023-09-29T04:41:27Z)
- High-dimensional Contextual Bandit Problem without Sparsity [8.782204980889077]
We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance.
We derive the optimal rate of the EtC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation.
We introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance.
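The explore-then-commit idea summarized above can be illustrated with a generic multi-armed bandit sketch; the paper's algorithm targets the high-dimensional contextual setting, so this simplified Bernoulli-bandit version and all names in it are illustrative assumptions:

```python
import random

def explore_then_commit(arm_means, horizon, explore_per_arm, seed=0):
    """Generic explore-then-commit: pull each arm a fixed number of times,
    then commit to the empirically best arm for the rest of the horizon.
    Rewards are Bernoulli with the given per-arm means."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    t = 0
    # Exploration phase: round-robin over all arms.
    for _ in range(explore_per_arm):
        for a in range(k):
            r = 1.0 if rng.random() < arm_means[a] else 0.0
            counts[a] += 1
            sums[a] += r
            total += r
            t += 1
    # Commit phase: exploit the arm with the highest empirical mean.
    best = max(range(k), key=lambda a: sums[a] / counts[a])
    for _ in range(horizon - t):
        total += 1.0 if rng.random() < arm_means[best] else 0.0
    return best, total
```

The adaptive variant (AEtC) mentioned in the summary would, roughly, choose `explore_per_arm` online instead of fixing it in advance.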
arXiv Detail & Related papers (2023-06-19T15:29:32Z)
- Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
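Thompson sampling, the core mechanism of the algorithm above, can be sketched in the simplest Beta-Bernoulli bandit setting; the paper applies it to countable state-space MDPs with dynamically-sized episodes, so this reduced sketch is an illustrative assumption rather than the paper's algorithm:

```python
import random

def thompson_sampling(arm_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: sample a mean estimate for each arm
    from its Beta posterior, play the argmax, update with the observed reward."""
    rng = random.Random(seed)
    k = len(arm_means)
    alpha = [1.0] * k  # successes + 1 (uniform prior)
    beta = [1.0] * k   # failures + 1
    total = 0.0
    for _ in range(horizon):
        # Posterior sampling: draw one plausible mean per arm.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        a = max(range(k), key=lambda i: samples[i])
        r = 1.0 if rng.random() < arm_means[a] else 0.0
        total += r
        alpha[a] += r
        beta[a] += 1.0 - r
    return total
```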
arXiv Detail & Related papers (2023-06-05T03:57:16Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
arXiv Detail & Related papers (2021-12-09T23:13:57Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress with no limit cycles, and to find an appropriate solution in a polynomial number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Bayesian Optimisation for Constrained Problems [0.0]
We propose a novel variant of the well-known Knowledge Gradient acquisition function that allows it to handle constraints.
We empirically compare the new algorithm with four other state-of-the-art constrained Bayesian optimisation algorithms and demonstrate its superior performance.
arXiv Detail & Related papers (2021-05-27T15:43:09Z)
- Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.