Related papers: Deep Reinforcement Learning for Online Optimal Execution Strategies

URL: http://arxiv.org/abs/2410.13493v1
Date: Thu, 17 Oct 2024 12:38:08 GMT
Title: Deep Reinforcement Learning for Online Optimal Execution Strategies
Authors: Alessandro Micheli, Mélodie Monod
Abstract summary: This paper tackles the challenge of learning non-Markovian optimal execution strategies in dynamic financial markets. We introduce a novel actor-critic algorithm based on Deep Deterministic Policy Gradient (DDPG). We show that our algorithm successfully approximates the optimal execution strategy.
Score: 49.1574468325115
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper tackles the challenge of learning non-Markovian optimal execution strategies in dynamic financial markets. We introduce a novel actor-critic algorithm based on Deep Deterministic Policy Gradient (DDPG) to address this issue, with a focus on transient price impact modeled by a general decay kernel. Through numerical experiments with various decay kernels, we show that our algorithm successfully approximates the optimal execution strategy. Additionally, the proposed algorithm demonstrates adaptability to evolving market conditions, where parameters fluctuate over time. Our findings also show that modern reinforcement learning algorithms can provide a solution that reduces the need for frequent and inefficient human intervention in optimal execution tasks.
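
The abstract does not spell out the impact model, so the following is only a minimal sketch of what "transient price impact modeled by a general decay kernel" can mean. It assumes a standard discrete-time propagator-style model in which each past trade displaces the current price by an amount that decays with elapsed time. The names `exponential_kernel`, `impact_cost`, `rho`, and `kappa` are hypothetical and are not taken from the paper.

```python
import math


def exponential_kernel(lag, rho=0.5):
    """Hypothetical decay kernel G(lag) = exp(-rho * lag); any non-increasing kernel could be swapped in."""
    return math.exp(-rho * lag)


def impact_cost(trades, kernel, kappa=1.0):
    """Impact cost of a buy schedule under an assumed transient impact model.

    A trade of size trades[s] at step s displaces the price at a later step t
    by kappa * kernel(t - s) * trades[s]. Each trade also pays half of its own
    instantaneous impact, so the total equals the quadratic form
    0.5 * kappa * sum_{s,t} trades[s] * kernel(|t - s|) * trades[t].
    """
    cost = 0.0
    for t, v_t in enumerate(trades):
        # Price displacement still remaining from all earlier trades.
        displacement = sum(kappa * kernel(t - s) * trades[s] for s in range(t))
        cost += v_t * (displacement + 0.5 * kappa * kernel(0) * v_t)
    return cost


# Compare two schedules that buy one unit in total over four steps.
front_loaded = [1.0, 0.0, 0.0, 0.0]
evenly_split = [0.25, 0.25, 0.25, 0.25]
print(impact_cost(front_loaded, exponential_kernel))
print(impact_cost(evenly_split, exponential_kernel))
```

With a decaying kernel, spreading the order costs less than executing it all at once, because earlier impact has partly faded before later trades arrive. With a constant kernel (purely permanent impact) the cost depends only on the total quantity traded. An execution agent such as the one described in the abstract would be searching over schedules like these, but the reward and state design used in the paper are not reproduced here.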

Related papers

Deep Reinforcement Learning Algorithms for Option Hedging [0.20482269513546458]
We compare the performance of eight Deep Reinforcement Learning (DRL) algorithms in the context of dynamic hedging. MCPG is the only algorithm to outperform the Black-Scholes delta hedge baseline with the allotted computational budget.
arXiv Detail & Related papers (2025-04-07T21:32:14Z)
RL-finetuning LLMs from on- and off-policy data with a single algorithm [53.70731390624718]
We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large language models. AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model. We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
arXiv Detail & Related papers (2025-03-25T12:52:38Z)
An accelerate Prediction Strategy for Dynamic Multi-Objective Optimization [7.272641346606365]
We introduce novel approaches for accelerating prediction strategies within the evolutionary algorithm framework. We propose an adaptive prediction strategy that incorporates second-order derivatives to predict and adjust the algorithm's search behavior. We evaluate the performance of the proposed method against four state-of-the-art algorithms using standard DMOP benchmark problems.
arXiv Detail & Related papers (2024-10-08T08:13:49Z)
Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates. We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change. We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach [0.3093890460224435]
We address the solution of the popular Wordle puzzle, using new reinforcement learning methods. For the Wordle puzzle, they yield on-line solution strategies that are very close to optimal at relatively modest computational cost.
arXiv Detail & Related papers (2022-11-15T03:46:41Z)
High-dimensional Bayesian Optimization Algorithm with Recurrent Neural Network for Disease Control Models in Time Series [1.9371782627708491]
We propose a new high-dimensional Bayesian optimization algorithm that incorporates recurrent neural networks. The proposed RNN-BO algorithm can solve the optimal control problems in a lower-dimensional space. We also discuss the impact of different numbers of RNN layers and training epochs on the trade-off between solution quality and computational effort.
arXiv Detail & Related papers (2022-01-01T08:40:17Z)
PAMELI: A Meta-Algorithm for Computationally Expensive Multi-Objective Optimization Problems [0.0]
The proposed algorithm is based on solving a set of surrogate problems defined by models of the real one. Our algorithm also performs a meta-search for optimal surrogate models and navigation strategies for the optimization landscape.
arXiv Detail & Related papers (2021-03-19T11:18:03Z)
Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning. Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
Mixed Strategies for Robust Optimization of Unknown Objectives [93.8672371143881]
We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter. We design a novel sample-efficient algorithm, GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations. GP-MRO seeks to discover a robust and randomized mixed strategy that maximizes the worst-case expected objective value.
arXiv Detail & Related papers (2020-02-28T09:28:17Z)
Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory. Our algorithm is proved to achieve the best-available convergence for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.