Hedging using reinforcement learning: Contextual $k$-Armed Bandit versus
$Q$-learning
- URL: http://arxiv.org/abs/2007.01623v2
- Date: Sun, 6 Feb 2022 18:49:39 GMT
- Title: Hedging using reinforcement learning: Contextual $k$-Armed Bandit versus
$Q$-learning
- Authors: Loris Cannelli, Giuseppe Nuti, Marzio Sala, Oleg Szehr
- Abstract summary: We study the construction of replication strategies for contingent claims in the presence of risk and market friction.
In this article, the hedging problem is viewed as an instance of a risk-averse contextual $k$-armed bandit problem.
We find that the $k$-armed bandit model naturally fits the Profit-and-Loss formulation of hedging.
- Score: 0.22940141855172028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The construction of replication strategies for contingent claims in the
presence of risk and market friction is a key problem of financial engineering.
In real markets, continuous replication, such as in the model of Black, Scholes
and Merton (BSM), is not only unrealistic but also undesirable due to
high transaction costs. A variety of methods have been proposed to balance
between effective replication and losses in the incomplete market setting. With
the rise of Artificial Intelligence (AI), AI-based hedgers have attracted
considerable interest, where particular attention was given to Recurrent Neural
Network systems and variations of the $Q$-learning algorithm. From a practical
point of view, sufficient samples for training such an AI can only be obtained
from a simulator of the market environment. Yet if an agent is trained solely
on simulated data, its run-time performance will primarily reflect the accuracy
of the simulation, which leads to the classical problem of model choice and
calibration. In this article, the hedging problem is viewed as an instance of a
risk-averse contextual $k$-armed bandit problem, which is motivated by the
simplicity and sample-efficiency of the architecture. This allows for realistic
online model updates from real-world data. We find that the $k$-armed bandit
model naturally fits the Profit-and-Loss formulation of hedging, providing
a more accurate and sample-efficient approach than $Q$-learning and
reducing to the Black-Scholes model in the absence of transaction costs and
risks.
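To make the bandit formulation concrete, the following is a minimal sketch of a risk-averse contextual $k$-armed bandit hedger in Python. It assumes a discretized hedge-ratio action set, a mean-variance risk adjustment, $\epsilon$-greedy exploration, and Welford-style online updates of the P&L statistics; the class and parameter names (`BanditHedger`, `RISK_AVERSION`, `EPSILON`) are illustrative, and the paper's exact estimator, action set, and risk measure may differ.

```python
import numpy as np

# Minimal sketch of a risk-averse contextual k-armed bandit hedger.
# Assumptions (not from the paper): a mean-variance risk criterion,
# a hedge-ratio action grid, and epsilon-greedy exploration.

K = 11                              # arms: hedge ratios 0.0, 0.1, ..., 1.0
ACTIONS = np.linspace(0.0, 1.0, K)
RISK_AVERSION = 1.0                 # penalty weight on P&L dispersion
EPSILON = 0.1                       # exploration rate

class BanditHedger:
    def __init__(self, n_contexts):
        # Running P&L statistics per (context, arm), updated online with
        # Welford's algorithm: count, mean, sum of squared deviations.
        self.count = np.zeros((n_contexts, K))
        self.mean = np.zeros((n_contexts, K))
        self.m2 = np.zeros((n_contexts, K))

    def act(self, ctx, rng):
        """Pick an arm (index into ACTIONS) for the given context."""
        if rng.random() < EPSILON or self.count[ctx].min() == 0:
            return int(rng.integers(K))          # explore
        std = np.sqrt(self.m2[ctx] / np.maximum(self.count[ctx] - 1, 1))
        return int(np.argmax(self.mean[ctx] - RISK_AVERSION * std))

    def update(self, ctx, arm, pnl):
        """Record the realized P&L of the arm played in this context."""
        self.count[ctx, arm] += 1
        delta = pnl - self.mean[ctx, arm]
        self.mean[ctx, arm] += delta / self.count[ctx, arm]
        self.m2[ctx, arm] += delta * (pnl - self.mean[ctx, arm])
```

In use, the context index could encode a bucket of (moneyness, time to maturity), the chosen arm is the hedge ratio held over the next period, and `pnl` is the realized hedging P&L net of transaction costs, so the model updates online from real-world data. Consistent with the abstract, with zero transaction costs and `RISK_AVERSION = 0` the risk-adjusted score collapses to expected P&L, and the preferred hedge ratio should approach the Black-Scholes delta $\Delta = \Phi(d_1)$ with $d_1 = \big(\ln(S/K) + (r + \sigma^2/2)\tau\big)/(\sigma\sqrt{\tau})$.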
Related papers
- MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We present a novel immersion-aware model trading framework that incentivizes metaverse users (MUs) to contribute learning models for augmented reality (AR) services in the vehicular metaverse.
Considering dynamic network conditions and privacy concerns, we formulate the reward decisions of metaverse service providers (MSPs) as a multi-agent Markov decision process.
Experimental results demonstrate that the proposed framework can effectively provide higher-value models for object detection and classification in AR services on real AR-related vehicle datasets.
arXiv Detail & Related papers (2024-10-25T16:20:46Z) - Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference [5.6679198251041765]
We introduce an online approximation algorithm, named ORRIC, designed to optimize resource allocation for adaptively balancing the accuracy of model retraining and inference.
The competitive ratio of ORRIC outperforms that of the traditional Inference-Only paradigm, especially when data persists for a sufficiently long time.
arXiv Detail & Related papers (2024-05-25T03:05:19Z) - Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z) - Designing an attack-defense game: how to increase robustness of
financial transaction models via a competition [69.08339915577206]
Given the escalating risks of malicious attacks in the finance sector, understanding adversarial strategies and robust defense mechanisms for machine learning models is critical.
We aim to investigate the current state and dynamics of adversarial attacks and defenses for neural network models that use sequential financial data as input.
We have designed a competition that allows realistic and detailed investigation of problems in modern financial transaction data.
The participants compete directly against each other, so possible attacks and defenses are examined in close-to-real-life conditions.
arXiv Detail & Related papers (2023-08-22T12:53:09Z) - Adversarial Deep Hedging: Learning to Hedge without Price Process
Modeling [4.656182369206814]
We propose a new framework called adversarial deep hedging, inspired by adversarial learning.
In this framework, a hedger and a generator, which respectively model the hedging strategy and the underlying asset price process, are trained in an adversarial manner (a minimal sketch of this adversarial setup appears after this list).
arXiv Detail & Related papers (2023-07-25T03:09:32Z) - Anytime Model Selection in Linear Bandits [61.97047189786905]
We develop ALEXP, which has an exponentially improved dependence on the number of models $M$ for its regret.
Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
arXiv Detail & Related papers (2023-07-24T15:44:30Z) - Neural Stochastic Agent-Based Limit Order Book Simulation: A Hybrid
Methodology [6.09170287691728]
Modern financial exchanges use an electronic limit order book (LOB) to store bid and ask orders for a specific financial asset.
We propose a novel hybrid LOB simulation paradigm characterised by: (1) representing the aggregation of market events' logic by a neural background trader that is pre-trained on historical LOB data through a neural point model; and (2) embedding the background trader in a multi-agent simulation with other trading agents.
We show that the stylised facts remain and we demonstrate order flow impact and financial herding behaviours that are in accordance with empirical observations of real markets.
arXiv Detail & Related papers (2023-02-28T20:53:39Z) - Learning to simulate realistic limit order book markets from data as a
World Agent [1.1470070927586016]
Multi-agent market simulators usually require careful calibration to emulate real markets.
Poorly calibrated simulators can lead to misleading conclusions.
We propose a world model simulator that accurately emulates a limit order book market.
arXiv Detail & Related papers (2022-09-26T09:17:11Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Real-world unlabeled data is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z) - Model-Augmented Q-learning [112.86795579978802]
We propose a MFRL framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution identical to the solution obtained by learning with the true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z) - Robust pricing and hedging via neural SDEs [0.0]
We develop and analyse novel algorithms needed for efficient use of neural SDEs.
We find robust bounds for prices of derivatives and the corresponding hedging strategies while incorporating relevant market data.
Neural SDEs allow consistent calibration under both the risk-neutral and the real-world measures.
arXiv Detail & Related papers (2020-07-08T14:33:17Z)
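As referenced in the Adversarial Deep Hedging entry above, here is a minimal PyTorch sketch of that adversarial setup: a generator network produces asset price paths and a hedger network trades against them, with the two trained in a minimax fashion. The network sizes, the short-call payoff, and the squared-error objective are illustrative assumptions rather than the paper's exact construction.

```python
import torch
import torch.nn as nn

# Minimal sketch of adversarial deep hedging: the hedger minimizes a
# hedging-error loss while the generator, which produces the price
# paths, is updated to maximize it. Payoff, loss, and architectures
# are illustrative assumptions, not the paper's exact setup.

N_STEPS, BATCH, STRIKE = 10, 256, 1.0

generator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, N_STEPS))
hedger = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
h_opt = torch.optim.Adam(hedger.parameters(), lr=1e-3)

def hedging_loss():
    # Generator maps noise to bounded returns, yielding price paths.
    noise = torch.randn(BATCH, 1)
    returns = 0.1 * torch.tanh(generator(noise))
    prices = torch.cumprod(1.0 + returns, dim=1)      # S_1..S_N, S_0 = 1
    pnl = torch.zeros(BATCH)
    prev = torch.ones(BATCH)
    for t in range(N_STEPS):
        # Features: current spot and remaining time fraction.
        feats = torch.stack(
            [prev, torch.full((BATCH,), 1 - t / N_STEPS)], dim=1)
        position = hedger(feats).squeeze(-1)          # hedge held over (t, t+1)
        pnl = pnl + position * (prices[:, t] - prev)  # gain on the hedge
        prev = prices[:, t]
    payoff = torch.clamp(prices[:, -1] - STRIKE, min=0)  # short-call liability
    return ((payoff - pnl) ** 2).mean()               # squared hedging error

for step in range(1000):
    # Hedger step: minimize the hedging error on generated paths ...
    h_opt.zero_grad(); hedging_loss().backward(); h_opt.step()
    # ... generator step: make the paths harder to hedge (maximize).
    g_opt.zero_grad(); (-hedging_loss()).backward(); g_opt.step()
```

In this zero-sum setup the generator is pushed toward price dynamics that are hardest to hedge, so the equilibrium hedger is robust to model misspecification rather than tuned to a single calibrated simulator, which is exactly the concern raised in the main abstract above.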
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.