Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization
- URL: http://arxiv.org/abs/2212.14670v1
- Date: Sun, 11 Dec 2022 07:35:26 GMT
- Title: Hierarchical Deep Reinforcement Learning for VWAP Strategy Optimization
- Authors: Xiaodong Li, Pangjing Wu, Chenxin Zou, Qing Li
- Abstract summary: We propose a deep learning and hierarchical reinforcement learning architecture to capture market patterns and execute orders at different temporal scales.
Our approach outperforms baselines in terms of VWAP slippage, with an average cost saving of 1.16 basis points compared to the optimal baseline.
- Score: 9.430129571478629
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Designing an intelligent volume-weighted average price (VWAP) strategy is a
critical concern for brokers, since traditional rule-based strategies are
relatively static and cannot achieve lower transaction costs in a dynamic
market. Many studies have tried to minimize the cost via reinforcement
learning, but improvement has hit bottlenecks, especially for
long-duration strategies such as the VWAP strategy. To address this issue, we
propose a joint deep learning and hierarchical reinforcement learning
architecture termed Macro-Meta-Micro Trader (M3T) to capture market patterns
and execute orders at different temporal scales. The Macro Trader first
allocates a parent order into tranches based on volume profiles, as the
traditional VWAP strategy does, but uses a long short-term memory (LSTM) neural
network to improve the forecasting accuracy. Then the Meta Trader selects a
short-term subgoal appropriate to instant liquidity within each tranche to form
a mini-tranche. The Micro Trader then extracts the instant market state
and fulfils the subgoal with the lowest transaction cost. Our experiments over
stocks listed on the Shanghai Stock Exchange demonstrate that our approach
outperforms baselines in terms of VWAP slippage, with an average cost saving of
1.16 basis points compared to the optimal baseline.
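As a rough illustration of two ideas the abstract relies on, the sketch below splits a parent order into tranches in proportion to a volume profile and measures VWAP slippage in basis points. It is not the M3T implementation: the function names, the largest-remainder rounding rule, and the sign convention for slippage (positive means worse than the market VWAP) are assumptions made for this example, and the volume profile is passed in directly rather than forecast by an LSTM.

```python
from typing import List, Sequence, Tuple


def allocate_tranches(parent_qty: int, volume_profile: Sequence[float]) -> List[int]:
    """Split a parent order across time buckets in proportion to a volume profile.

    Hypothetical helper: the profile is given directly here, whereas the paper
    forecasts it with an LSTM.
    """
    total = sum(volume_profile)
    if total <= 0:
        raise ValueError("volume profile must have a positive total")
    raw = [parent_qty * v / total for v in volume_profile]
    tranches = [int(x) for x in raw]  # floor each bucket's share
    # Hand out the shares lost to flooring, largest fractional remainder first.
    leftover = parent_qty - sum(tranches)
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - tranches[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        tranches[i] += 1
    return tranches


def vwap(trades: Sequence[Tuple[float, float]]) -> float:
    """Volume-weighted average price of (price, quantity) pairs."""
    qty = sum(q for _, q in trades)
    if qty <= 0:
        raise ValueError("total traded quantity must be positive")
    return sum(p * q for p, q in trades) / qty


def slippage_bps(exec_vwap: float, market_vwap: float, side: str = "buy") -> float:
    """Slippage against the market VWAP in basis points (1 bp = 0.01%).

    Assumed convention: positive means the execution was worse than the market
    VWAP (paid more on a buy, received less on a sell).
    """
    sign = 1.0 if side == "buy" else -1.0
    return sign * (exec_vwap - market_vwap) / market_vwap * 1e4


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    print(allocate_tranches(1000, [1.0, 2.0, 1.0]))
    market = vwap([(10.00, 500), (10.10, 300), (10.05, 200)])
    ours = vwap([(10.01, 250), (10.08, 500), (10.04, 250)])
    print(round(slippage_bps(ours, market, side="buy"), 2))
```

Under this convention, a cost saving of 1.16 basis points means a strategy's slippage is 1.16 lower than the baseline's, that is, about 0.0116% of the traded notional.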
Related papers
- Deep Learning for VWAP Execution in Crypto Markets: Beyond the Volume Curve [0.0]
Volume-Weighted Average Price (VWAP) is arguably the most prevalent benchmark for trade execution.
Achieving VWAP is inherently challenging due to its dependence on two dynamic factors, volumes and prices.
I propose a deep learning framework that directly optimizes the VWAP execution objective by bypassing the intermediate step of volume curve prediction.
arXiv Detail & Related papers (2025-02-19T13:49:51Z)
- Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis [89.60263788590893]
Post-training quantization (PTQ) has been extensively adopted for compressing large language models (LLMs).
Existing algorithms focus primarily on performance, overlooking the trade-off among model size, performance, and quantization bitwidth.
arXiv Detail & Related papers (2025-02-18T07:35:35Z)
- Universal Model Routing for Efficient LLM Inference [72.65083061619752]
We consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time.
We propose a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts.
We prove that these strategies are estimates of a theoretically optimal routing rule, and provide an excess risk bound to quantify their errors.
arXiv Detail & Related papers (2025-02-12T20:30:28Z)
- Regret-Optimized Portfolio Enhancement through Deep Reinforcement Learning and Future Looking Rewards [3.9795751586546766]
This paper introduces a novel agent-based approach for enhancing existing portfolio strategies using Proximal Policy Optimization (PPO).
Rather than focusing solely on traditional portfolio construction, our approach aims to improve an already high-performing strategy through dynamic rebalancing driven by PPO and Oracle agents.
arXiv Detail & Related papers (2025-02-04T11:45:59Z)
- The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation [48.52206677611072]
Speculative decoding aims to speed up autoregressive generation of a language model by verifying in parallel the tokens generated by a smaller draft model.
We show that combinations of simple strategies can achieve significant inference speedups over different tasks.
arXiv Detail & Related papers (2024-11-06T09:23:50Z)
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach to compute unbiased Monte Carlo-based estimates.
We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z)
- Intelligent Systematic Investment Agent: an ensemble of deep learning and evolutionary strategies [0.0]
Our paper proposes a new approach for developing long-term investment strategies using an ensemble of evolutionary algorithms and a deep learning model.
Our methodology focuses on building long-term wealth by improving systematic investment planning (SIP) decisions on Exchange Traded Funds (ETF) over a period of time.
arXiv Detail & Related papers (2022-03-24T15:39:05Z)
- Bitcoin Transaction Strategy Construction Based on Deep Reinforcement Learning [8.431365407963629]
This study proposes a framework for automatic high-frequency bitcoin transactions based on a deep reinforcement learning algorithm, proximal policy optimization (PPO).
The proposed framework can earn excess returns through both the period of volatility and surge, which opens the door to research on building a single cryptocurrency trading strategy based on deep learning.
arXiv Detail & Related papers (2021-09-30T01:24:03Z)
- Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework can better guide the learning of the common policy towards practically optimal execution by an oracle teacher with perfect information.
arXiv Detail & Related papers (2021-01-28T05:52:18Z)
- Deep Stock Trading: A Hierarchical Reinforcement Learning Framework for Portfolio Optimization and Order Execution [26.698261314897195]
We propose a hierarchical reinforced stock trading system for portfolio management (HRPM).
We decompose the trading process into a hierarchy of portfolio management over trade execution and train the corresponding policies.
HRPM achieves significant improvement against many state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-23T12:09:26Z)
- Deep Stock Predictions [58.720142291102135]
We consider the design of a trading strategy that performs portfolio optimization using Long Short Term Memory (LSTM) neural networks.
We then customize the loss function used to train the LSTM to increase the profit earned.
We find the LSTM model with the customized loss function to have an improved performance in the training bot over a regressive baseline such as ARIMA.
arXiv Detail & Related papers (2020-06-08T23:37:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.