Solving The Dynamic Volatility Fitting Problem: A Deep Reinforcement Learning Approach
- URL: http://arxiv.org/abs/2410.11789v1
- Date: Tue, 15 Oct 2024 17:10:54 GMT
- Title: Solving The Dynamic Volatility Fitting Problem: A Deep Reinforcement Learning Approach
- Authors: Emmanuel Gnabeyeu, Omar Karkar, Imad Idboufous
- Abstract summary: We show that variants of Deep Deterministic Policy Gradient (DDPG) and Soft Actor Critic (SAC) can perform at least as well as standard fitting algorithms.
We explain why the reinforcement learning framework is appropriate to handle complex objective functions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Volatility fitting is one of the core problems in the equity derivatives business. Through a set of deterministic rules, the degrees of freedom in the implied volatility surface encoding (parametrization, density, diffusion) are defined. Whilst very effective, this approach, widespread in the industry, is not natively tailored to learn from shifts in market regimes and discover unsuspected optimal behaviors. In this paper, we change the classical paradigm and apply the latest advances in Deep Reinforcement Learning (DRL) to solve the fitting problem. In particular, we show that variants of Deep Deterministic Policy Gradient (DDPG) and Soft Actor Critic (SAC) can perform at least as well as standard fitting algorithms. Furthermore, we explain why the reinforcement learning framework is appropriate to handle complex objective functions and is natively adapted for online learning.
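The abstract does not spell out a concrete state, action, or reward design, so the following is only a minimal sketch of how the fitting problem could be cast as a sequential decision process for a DDPG- or SAC-style agent: the agent nudges the parameters of an SVI-style slice parametrization (an assumption; the authors may use a different surface encoding) and is rewarded for reducing the calibration error against market quotes. The class name `VolFitEnv` and all numerical choices are hypothetical.

```python
import numpy as np

def svi_total_variance(k, a, b, rho, m, sigma):
    """Raw SVI total variance w(k) = a + b*(rho*(k - m) + sqrt((k - m)^2 + sigma^2))."""
    return a + b * (rho * (k - m) + np.sqrt((k - m) ** 2 + sigma ** 2))

class VolFitEnv:
    """Hypothetical environment: state = (SVI params, residuals vs. market quotes),
    action = small parameter shift, reward = negative RMSE of the fit."""

    def __init__(self, log_moneyness, market_total_var):
        self.k = np.asarray(log_moneyness)
        self.target = np.asarray(market_total_var)
        self.params = None

    def reset(self):
        self.params = np.array([0.04, 0.1, 0.0, 0.0, 0.1])  # a, b, rho, m, sigma
        return self._state()

    def _model_var(self):
        a, b, rho, m, sigma = self.params
        return svi_total_variance(self.k, a, b, rho, m, sigma)

    def _state(self):
        return np.concatenate([self.params, self.target - self._model_var()])

    def step(self, action):
        # Continuous action: bounded increment on each SVI parameter.
        self.params = self.params + np.clip(action, -0.01, 0.01)
        rmse = np.sqrt(np.mean((self._model_var() - self.target) ** 2))
        return self._state(), -rmse, rmse < 1e-4, {}
```

A DDPG or SAC agent would then learn a policy mapping this state to parameter increments; static-arbitrage penalties or other business constraints would enter through the reward in the same way, which is where the paper's point about complex objective functions applies.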
Related papers
- Causally Aligned Curriculum Learning [69.11672390876763]
This paper studies the problem of curriculum RL through causal lenses.
We derive a sufficient graphical condition characterizing causally aligned source tasks.
We develop an efficient algorithm to generate a causally aligned curriculum.
arXiv Detail & Related papers (2025-03-21T02:20:38Z)
- Alpha-VI DeepONet: A prior-robust variational Bayesian approach for enhancing DeepONets with uncertainty quantification [0.0]
We introduce a novel deep operator network (DeepONet) framework that incorporates generalised variational inference (GVI).
By incorporating Bayesian neural networks as the building blocks for the branch and trunk networks, our framework endows DeepONet with uncertainty quantification.
We demonstrate that modifying the variational objective function yields superior results in terms of minimising the mean squared error.
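As a point of reference for the branch/trunk terminology above, here is a minimal deterministic DeepONet skeleton in PyTorch. It is only a structural sketch: the paper's variant would replace these layers with Bayesian (variational) ones and add an alpha-weighted divergence term from generalised variational inference to the loss, neither of which is shown. Layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal deterministic DeepONet: G(u)(y) ~ <branch(u at sensors), trunk(y)>."""

    def __init__(self, n_sensors: int, latent_dim: int = 64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(n_sensors, 128), nn.Tanh(), nn.Linear(128, latent_dim)
        )
        self.trunk = nn.Sequential(
            nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, latent_dim)
        )

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b = self.branch(u_sensors)   # (batch, latent_dim) encoding of the input function
        t = self.trunk(y)            # (batch, latent_dim) encoding of the query location
        return (b * t).sum(dim=-1, keepdim=True)

# Plain MSE objective on synthetic data; the GVI objective would add a divergence penalty.
model = DeepONet(n_sensors=100)
u = torch.randn(8, 100)   # input function sampled at 100 sensor points
y = torch.rand(8, 1)      # query locations
loss = nn.functional.mse_loss(model(u, y), torch.randn(8, 1))
```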
arXiv Detail & Related papers (2024-08-01T16:22:03Z)
- Reinforcement Learning for Adaptive MCMC [6.773499165024668]
The aim of this paper is to set out a general framework, called Reinforcement Learning Metropolis--Hastings.
Control of the learning rate provably ensures conditions for ergodicity are satisfied.
The methodology is used to construct a gradient-free sampler that out-performs a popular gradient-free adaptive Metropolis--Hastings algorithm.
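The Reinforcement Learning Metropolis--Hastings construction itself is not reproduced in this summary. The sketch below only illustrates the kind of feedback loop it generalizes: a gradient-free random-walk Metropolis sampler whose proposal scale is adapted online from the acceptance signal, using a classical Robbins--Monro rule as a stand-in for the learned policy. The diminishing adaptation step is the analogue of the controlled learning rate the summary mentions for preserving ergodicity.

```python
import numpy as np

def adaptive_rwm(log_target, x0, n_steps=5000, target_accept=0.44, lr=0.05, seed=0):
    """Gradient-free random-walk Metropolis with online adaptation of the proposal scale."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    log_p = log_target(x)
    log_scale = 0.0
    samples = []
    for t in range(n_steps):
        prop = x + np.exp(log_scale) * rng.standard_normal(x.shape)
        log_p_prop = log_target(prop)
        accept = np.log(rng.uniform()) < log_p_prop - log_p
        if accept:
            x, log_p = prop, log_p_prop
        # Feedback: push the acceptance rate toward the target, with a diminishing step.
        log_scale += lr / np.sqrt(t + 1) * ((1.0 if accept else 0.0) - target_accept)
        samples.append(x.copy())
    return np.array(samples)

# Example: sample a standard 2-d Gaussian.
draws = adaptive_rwm(lambda z: -0.5 * np.sum(z ** 2), x0=np.zeros(2))
```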
arXiv Detail & Related papers (2024-05-22T12:11:12Z)
- Online Nonstochastic Model-Free Reinforcement Learning [35.377261344335736]
We investigate robustness guarantees for model-free reinforcement learning in environments that may be dynamic or adversarial.
We provide efficient and practical algorithms for optimizing these policies.
These are the best-known results with no dependence on the dimension of the state space.
arXiv Detail & Related papers (2023-05-27T19:02:55Z)
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMControl and Meta-World.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
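The summary does not describe the method's internals, so the snippet below shows only the standard preference-based reward learning step such methods build on: a Bradley--Terry (logistic) objective over pairs of trajectory segments labelled by annotators. The paper's specific machinery for reconciling diverse, crowd-sourced labels is not captured here, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def preference_loss(seg_a, seg_b, label):
    """seg_a, seg_b: (batch, horizon, obs_dim) trajectory segments;
    label = 1.0 if annotators preferred segment A, else 0.0 (Bradley--Terry model)."""
    r_a = reward_model(seg_a).sum(dim=1)   # summed predicted reward of segment A
    r_b = reward_model(seg_b).sum(dim=1)
    logits = (r_a - r_b).squeeze(-1)
    return nn.functional.binary_cross_entropy_with_logits(logits, label)

# One gradient step on synthetic data (obs_dim=4, horizon=20).
seg_a, seg_b = torch.randn(32, 20, 4), torch.randn(32, 20, 4)
labels = torch.randint(0, 2, (32,)).float()
loss = preference_loss(seg_a, seg_b, labels)
opt.zero_grad(); loss.backward(); opt.step()
```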
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
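The algorithm itself (online learning of the transition dynamics plus a constrained proximal policy update) is not detailed in this summary. As a minimal illustration of how a safety constraint typically enters such an objective, the snippet below shows a generic Lagrangian relaxation with dual ascent on the multiplier; this is a common ingredient of constrained policy optimization in general, not the authors' exact procedure.

```python
def lagrangian_step(policy_return, policy_cost, lam, cost_limit, dual_lr=0.01):
    """Generic constrained-RL bookkeeping: the policy update maximizes return - lam * cost,
    while dual ascent increases lam whenever the expected cost exceeds the limit."""
    objective = policy_return - lam * policy_cost
    lam = max(0.0, lam + dual_lr * (policy_cost - cost_limit))
    return objective, lam

lam = 0.0
for epoch in range(3):
    # In the real algorithm these would be estimates from rollouts in the learned model.
    est_return, est_cost = 10.0 + epoch, 2.5 - 0.5 * epoch
    obj, lam = lagrangian_step(est_return, est_cost, lam, cost_limit=1.0)
```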
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a generalized policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
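The summary states the convergence result but not the update itself. The sketch below implements one standard member of the family GPMD generalizes: a policy mirror descent step with a KL mirror map for entropy-regularized tabular RL, run on a tiny random MDP with approximate policy evaluation. It illustrates the update form pi_new(a|s) proportional to exp((eta*Q(s,a) + log pi(a|s)) / (1 + eta*tau)), not the paper's full algorithm or regularizer class.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, tau, eta = 4, 3, 0.9, 0.1, 1.0
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition kernel P[s, a, s']
R = rng.uniform(size=(nS, nA))                  # rewards
pi = np.full((nS, nA), 1.0 / nA)                # start from the uniform policy

def evaluate(pi, iters=500):
    """Iterate the entropy-regularized Bellman equation for the current policy."""
    V = np.zeros(nS)
    for _ in range(iters):
        Q = R + gamma * P @ V                               # (nS, nA)
        V = np.sum(pi * (Q - tau * np.log(pi)), axis=1)
    return Q

for _ in range(50):
    Q = evaluate(pi)
    # Mirror-descent step with KL divergence to the current policy.
    logits = (eta * Q + np.log(pi)) / (1.0 + eta * tau)
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
```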
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
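The point of step 2) above, adaptation solved analytically in the RKHS instead of by an iterative inner loop, can be illustrated with a kernel ridge regression solve on a task's support set. The RBF kernel below is only a stand-in for the meta-model's NTK, and the ridge and lengthscale values are arbitrary.

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    """Stand-in kernel; the paper uses the meta-model's Neural Tangent Kernel instead."""
    d2 = np.sum(X ** 2, axis=1)[:, None] + np.sum(Z ** 2, axis=1)[None, :] - 2 * X @ Z.T
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def analytic_adaptation(X_support, y_support, X_query, ridge=1e-2):
    """Closed-form task adaptation: solve (K + ridge*I) alpha = y on the support set,
    then predict on the query set. No iterative inner loop as in MAML."""
    K = rbf_kernel(X_support, X_support)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_support)), y_support)
    return rbf_kernel(X_query, X_support) @ alpha

# Toy few-shot regression task.
rng = np.random.default_rng(0)
Xs, Xq = rng.normal(size=(5, 2)), rng.normal(size=(10, 2))
ys = np.sin(Xs[:, 0])
preds = analytic_adaptation(Xs, ys, Xq)
```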
arXiv Detail & Related papers (2021-02-07T20:53:23Z) - Learning Off-Policy with Online Planning [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function.
We show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments.
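The summary names the ingredients (a learned dynamics model, an H-step lookahead, and a terminal value function); the sketch below shows one simple way those pieces compose, namely random-shooting planning that scores each candidate action sequence by its model-predicted rewards plus the bootstrapped terminal value and then executes the first action. The dynamics, reward, and value functions here are toy placeholders for learned networks, and LOOP's actual planner and safety handling may differ.

```python
import numpy as np

def h_step_lookahead(state, dynamics, reward, terminal_value,
                     horizon=5, n_candidates=256, action_dim=2, gamma=0.99, seed=0):
    """Score random action sequences with the learned model, bootstrap the tail with the
    terminal value function, and return the first action of the best sequence."""
    rng = np.random.default_rng(seed)
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    states = np.repeat(state[None, :], n_candidates, axis=0)
    for t in range(horizon):
        returns += (gamma ** t) * reward(states, actions[:, t])
        states = dynamics(states, actions[:, t])
    returns += (gamma ** horizon) * terminal_value(states)
    return actions[np.argmax(returns), 0]

# Placeholder "learned" model, reward, and value function for a 3-d toy state.
dyn = lambda s, a: s + 0.1 * np.pad(a, ((0, 0), (0, 1)))   # pad action up to the state dim
rew = lambda s, a: -np.sum(s ** 2, axis=1)
val = lambda s: -np.sum(s ** 2, axis=1)
first_action = h_step_lookahead(np.ones(3), dyn, rew, val)
```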
arXiv Detail & Related papers (2020-08-23T16:18:44Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
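Both ingredients reduce to simple statistics over an ensemble of Q-estimates. The sketch below gives one plausible reading of them: a confidence weight that shrinks the Bellman backup where the ensemble disagrees (a sigmoid of the negative target-Q standard deviation, shifted into (0.5, 1.0)), and a UCB score for action selection. The exact weighting form and temperature in the paper may differ.

```python
import numpy as np

def backup_weight(q_targets, temperature=10.0):
    """Down-weight Bellman targets where the Q-ensemble disagrees.
    q_targets: (ensemble_size, batch) target Q-values for the next state-action."""
    std = q_targets.std(axis=0)
    return 1.0 / (1.0 + np.exp(std * temperature)) + 0.5   # sigmoid(-std*T) + 0.5

def ucb_action(q_values, exploration_coef=1.0):
    """Pick the action with the highest upper confidence bound over the ensemble.
    q_values: (ensemble_size, n_actions) Q-estimates for the current state."""
    mean, std = q_values.mean(axis=0), q_values.std(axis=0)
    return int(np.argmax(mean + exploration_coef * std))

rng = np.random.default_rng(0)
w = backup_weight(rng.normal(size=(5, 32)))   # per-sample weights for the critic loss
a = ucb_action(rng.normal(size=(5, 4)))       # exploratory action among 4 discrete actions
```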
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.