Distributional Reinforcement Learning on Path-dependent Options
- URL: http://arxiv.org/abs/2507.12657v1
- Date: Wed, 16 Jul 2025 22:14:54 GMT
- Title: Distributional Reinforcement Learning on Path-dependent Options
- Authors: Ahmet Umur Özsoy
- Abstract summary: We propose a framework for pricing path-dependent financial derivatives using Distributional Reinforcement Learning (DistRL). Unlike traditional methods that focus on expected option value, our approach models the entire conditional distribution of payoffs. We demonstrate the efficacy of this method on Asian options, using quantile-based value function approximators.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We reinterpret and propose a framework for pricing path-dependent financial derivatives by estimating the full distribution of payoffs using Distributional Reinforcement Learning (DistRL). Unlike traditional methods that focus on expected option value, our approach models the entire conditional distribution of payoffs, allowing for risk-aware pricing, tail-risk estimation, and enhanced uncertainty quantification. We demonstrate the efficacy of this method on Asian options, using quantile-based value function approximators.
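The abstract does not include an implementation, so the following is only a minimal sketch of the quantile-based idea under stated assumptions: a geometric Brownian motion underlying, an arithmetic-average Asian call, and a fixed set of quantile atoms at the initial state updated with Monte Carlo payoff targets via the pinball loss (rather than the paper's full method). All names and parameter values, such as `simulate_gbm_paths`, are illustrative.

```python
import numpy as np

# Sketch: estimate the conditional distribution of a discounted Asian-call payoff
# at t=0 with quantile atoms updated by stochastic pinball-loss (quantile
# regression) steps, as in quantile-based distributional RL. Not the paper's code.

rng = np.random.default_rng(0)

# Illustrative market / contract parameters (assumptions, not from the paper).
S0, r, sigma, T, K = 100.0, 0.03, 0.2, 1.0, 100.0
n_steps, n_paths = 64, 20_000
dt = T / n_steps

def simulate_gbm_paths(n_paths, n_steps):
    """Simulate geometric Brownian motion paths of the underlying."""
    z = rng.standard_normal((n_paths, n_steps))
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return S0 * np.exp(np.cumsum(log_increments, axis=1))

paths = simulate_gbm_paths(n_paths, n_steps)
avg_price = paths.mean(axis=1)                              # arithmetic average over the path
payoffs = np.exp(-r * T) * np.maximum(avg_price - K, 0.0)   # discounted Asian-call payoff

# Quantile-based value representation: N atoms at quantile midpoints tau_i.
n_quantiles = 32
taus = (np.arange(n_quantiles) + 0.5) / n_quantiles
theta = np.full(n_quantiles, payoffs.mean())                # crude initialisation

# Stochastic quantile-regression updates (negated pinball-loss subgradient).
lr = 0.05
for g in payoffs:                                           # each sampled payoff is a return target
    indicator = (g < theta).astype(float)
    theta += lr * (taus - indicator)

price_estimate = theta.mean()                 # mean of quantile atoms ~ expected payoff
mc_tail = np.quantile(payoffs, 0.05)          # Monte Carlo reference for the lower tail

print(f"quantile-atom price estimate : {price_estimate:.3f}")
print(f"Monte Carlo price            : {payoffs.mean():.3f}")
print(f"~5% quantile (atom vs MC)    : {theta[1]:.3f} vs {mc_tail:.3f}")
```

Because the atoms approximate the whole payoff distribution, tail quantities such as a lower-quantile payoff fall out of the same fit, which is the risk-aware pricing angle the abstract emphasises.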
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning [11.304227281260896]
We introduce a novel strategy employing diverse randomized value functions to estimate the posterior distribution of $Q$-values.
It provides robust uncertainty quantification and estimates lower confidence bounds (LCB) of $Q$-values.
We also emphasize diversity within the randomized value functions and enhance efficiency by introducing a diversity regularization method, which reduces the requisite number of networks.
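As a minimal sketch of the two named ingredients (an ensemble of randomized value functions and a lower confidence bound on $Q$-values), the snippet below uses a toy tabular setting and a Gaussian-style LCB of the form mean minus beta times standard deviation; it is an illustration, not the paper's algorithm.

```python
import numpy as np

# Pessimistic action selection from an ensemble of K randomized Q-estimates via a
# lower confidence bound: LCB(s, a) = mean_k Q_k(s, a) - beta * std_k Q_k(s, a).

rng = np.random.default_rng(1)

n_states, n_actions, n_ensemble = 5, 3, 8
beta = 1.0   # pessimism coefficient (illustrative choice)

# Stand-in for K independently trained / randomly perturbed Q-functions.
q_ensemble = rng.normal(loc=1.0, scale=0.5, size=(n_ensemble, n_states, n_actions))

q_mean = q_ensemble.mean(axis=0)
q_std = q_ensemble.std(axis=0)
q_lcb = q_mean - beta * q_std            # pessimistic value estimate

greedy_policy = q_lcb.argmax(axis=1)     # act greedily w.r.t. the LCB
print("LCB-greedy action per state:", greedy_policy)

# A simple diversity proxy: average pairwise disagreement between ensemble members.
# A diversity regularizer would push this up so fewer networks are needed.
pairwise = [np.mean((q_ensemble[i] - q_ensemble[j]) ** 2)
            for i in range(n_ensemble) for j in range(i + 1, n_ensemble)]
print("mean pairwise disagreement:", float(np.mean(pairwise)))
```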
arXiv Detail & Related papers (2024-04-09T10:15:18Z)
- Uncertainty Quantification via Stable Distribution Propagation [60.065272548502]
We propose a new approach for propagating stable probability distributions through neural networks.
Our method is based on local linearization, which we show to be an optimal approximation in terms of total variation distance for the ReLU non-linearity.
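A minimal sketch of the local-linearization idea follows, under simplifying assumptions: a Gaussian input (the alpha = 2 stable case, whereas the paper treats general stable laws) with diagonal covariance, propagated through a toy linear-ReLU-linear network. The ReLU is replaced by its linearization at the input mean; this is an illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def propagate_linear(mean, var, W, b):
    """Moment propagation through y = W x + b, treating inputs as independent."""
    return W @ mean + b, (W ** 2) @ var

def propagate_relu_linearized(mean, var):
    """Linearize ReLU at the mean: local slope is 1 where mean > 0, else 0."""
    slope = (mean > 0).astype(float)
    return np.maximum(mean, 0.0), (slope ** 2) * var

# Toy two-layer network with illustrative sizes.
W1, b1 = 0.3 * rng.standard_normal((16, 8)), np.zeros(16)
W2, b2 = 0.3 * rng.standard_normal((4, 16)), np.zeros(4)

mean_in = rng.standard_normal(8)
var_in = 0.05 * np.ones(8)

mean, var = propagate_linear(mean_in, var_in, W1, b1)
mean, var = propagate_relu_linearized(mean, var)
mean, var = propagate_linear(mean, var, W2, b2)

# Rough Monte Carlo comparison (the diagonal-covariance treatment after layer 1
# is a further simplification, so only approximate agreement is expected).
x = mean_in + np.sqrt(var_in) * rng.standard_normal((50_000, 8))
y = np.maximum(x @ W1.T + b1, 0.0) @ W2.T + b2
print("propagated mean:", np.round(mean, 3), " MC mean:", np.round(y.mean(axis=0), 3))
print("propagated std :", np.round(np.sqrt(var), 3), " MC std :", np.round(y.std(axis=0), 3))
```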
arXiv Detail & Related papers (2024-02-13T09:40:19Z)
- Distributional Counterfactual Explanations With Optimal Transport [7.597676579494146]
Counterfactual explanations (CE) are the de facto method for providing insights into black-box decision-making models. This paper proposes distributional counterfactual explanation (DCE), shifting focus to the distributional properties of observed and counterfactual data.
arXiv Detail & Related papers (2024-01-23T21:48:52Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
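A minimal sketch of a likelihood-ratio confidence sequence follows, for the simplest well-specified case: a Bernoulli mean, with a prequential plug-in forecaster in the numerator so that, under the true parameter, the ratio is a nonnegative martingale and Ville's inequality gives anytime validity. The grid, smoothing, and sample sizes are illustrative and this is not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = 0.05
true_theta = 0.3
grid = np.linspace(0.001, 0.999, 999)        # candidate parameters

log_num = 0.0                                # log-likelihood of the plug-in forecaster
log_den = np.zeros_like(grid)                # log-likelihood under each candidate theta
count, n = 0, 0

for t in range(1, 2001):
    x = rng.random() < true_theta
    p_hat = (count + 0.5) / (n + 1.0)        # smoothed estimate from PAST observations only
    log_num += np.log(p_hat if x else 1.0 - p_hat)
    log_den += np.log(grid if x else 1.0 - grid)
    count += int(x)
    n += 1
    # Confidence set: candidates whose likelihood ratio stays below 1/alpha.
    in_set = (log_num - log_den) < np.log(1.0 / alpha)
    if t in (10, 100, 2000):
        lo, hi = grid[in_set].min(), grid[in_set].max()
        print(f"t={t:5d}  confidence set ~ [{lo:.3f}, {hi:.3f}]")
```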
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- An Offline Learning Approach to Propagator Models [3.1755820123640612]
We consider an offline learning problem for an agent who first estimates an unknown price impact kernel from a static dataset.
We propose a novel approach for a nonparametric estimation of the propagator from a dataset containing correlated price trajectories, trading signals and metaorders.
We show that a trader who tries to minimise her execution costs by using a greedy strategy purely based on the estimated propagator will encounter suboptimality.
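A minimal sketch of the estimation step is given below on synthetic data: price changes are generated by convolving an order-flow series with a decaying impact kernel, and the kernel is recovered by ordinary least squares on a lagged design matrix. This is a plain regression illustration, not the paper's nonparametric estimator, and all names and parameters are assumptions.

```python
import numpy as np

# Recover a price-impact ("propagator") kernel G from order flow and price
# changes under the model  dP_t = sum_{k=0..L-1} G_k * u_{t-k} + noise.

rng = np.random.default_rng(4)

T, L = 5_000, 20
true_kernel = 0.5 * np.exp(-0.3 * np.arange(L))   # decaying impact (illustrative)

u = rng.standard_normal(T)                        # signed order flow / metaorder rate
noise = 0.05 * rng.standard_normal(T)

# Simulate price changes from the convolution model.
dP = np.convolve(u, true_kernel)[:T] + noise

# Lagged (Toeplitz-style) design matrix X[t, k] = u[t - k], zero before the start.
X = np.zeros((T, L))
for k in range(L):
    X[k:, k] = u[: T - k]

G_hat, *_ = np.linalg.lstsq(X, dP, rcond=None)

print("true kernel (first 5 lags):", np.round(true_kernel[:5], 3))
print("estimated kernel (first 5):", np.round(G_hat[:5], 3))
```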
arXiv Detail & Related papers (2023-09-06T13:36:43Z)
- A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized.
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
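A minimal sketch of a CDF-based, "pessimistic" weighting follows, using a toy two-armed bandit as a stand-in for full episodes and a simplified REINFORCE update in which each return is weighted by a decreasing function of its empirical CDF rank. The weighting function, environment, and update rule are illustrative assumptions, not the paper's objective or algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)

def pessimistic_weight(cdf_rank, kappa=2.0):
    """Larger weight for low-CDF (poor) episodes; kappa is an assumed risk knob.
    The weight (kappa + 1) * (1 - u)^kappa integrates to 1 over u in [0, 1]."""
    return (kappa + 1.0) * (1.0 - cdf_rank) ** kappa

theta = np.zeros(2)                    # softmax logits over two actions
lr, batch = 0.1, 256

for it in range(200):
    probs = np.exp(theta) / np.exp(theta).sum()
    actions = rng.choice(2, size=batch, p=probs)
    # Action 0: safe, return ~ N(1.0, 0.1). Action 1: higher mean but much riskier.
    returns = np.where(actions == 0,
                       rng.normal(1.0, 0.1, batch),
                       rng.normal(1.3, 1.5, batch))
    ranks = returns.argsort().argsort() / (batch - 1)   # empirical CDF rank in [0, 1]
    w = pessimistic_weight(ranks)
    # Simplified CDF-weighted REINFORCE update on the softmax logits.
    one_hot = np.eye(2)[actions]
    grad_logp = one_hot - probs                          # d log pi(a) / d theta
    theta += lr * np.mean(w[:, None] * returns[:, None] * grad_logp, axis=0)

final_probs = np.exp(theta) / np.exp(theta).sum()
print("final action probabilities:", np.round(final_probs, 3))
```

With a pessimistic weighting, episodes in the lower tail dominate the update, so the toy agent ends up preferring the safe arm even though the risky arm has the higher mean.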
arXiv Detail & Related papers (2022-08-19T00:55:05Z)
- Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation [53.83642844626703]
We provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation.
Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates.
arXiv Detail & Related papers (2021-06-24T15:58:01Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Off-Policy Evaluation via the Regularized Lagrangian [110.28927184857478]
The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.
In this paper, we unify these estimators as regularized Lagrangians of the same linear program.
We find that dual solutions offer greater flexibility in navigating the tradeoff between stability and estimation bias, and generally provide superior estimates in practice.
arXiv Detail & Related papers (2020-07-07T13:45:56Z)
- Probabilistic multivariate electricity price forecasting using implicit generative ensemble post-processing [0.0]
We use a likelihood-free implicit generative model based on an ensemble of point forecasting models to generate multivariate electricity price scenarios.
Our ensemble post-processing method outperforms well-established model combination benchmarks.
As our method works on top of an ensemble of domain-specific expert models, it can readily be deployed to other forecasting tasks.
arXiv Detail & Related papers (2020-05-27T15:22:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.