Universal Trading for Order Execution with Oracle Policy Distillation
- URL: http://arxiv.org/abs/2103.10860v1
- Date: Thu, 28 Jan 2021 05:52:18 GMT
- Title: Universal Trading for Order Execution with Oracle Policy Distillation
- Authors: Yuchen Fang, Kan Ren, Weiqing Liu, Dong Zhou, Weinan Zhang, Jiang Bian, Yong Yu, Tie-Yan Liu
- Abstract summary: We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework can better guide the learning of the common policy toward practically optimal execution via an oracle teacher with perfect information.
- Score: 99.57416828489568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. In pursuit of effective execution strategies, recent years have witnessed a shift from the analytical view with model-based market assumptions to a model-free perspective, i.e., reinforcement learning, owing to its nature as sequential decision optimization. However, the noisy yet imperfect market information available to the policy makes it challenging to build sample-efficient reinforcement learning methods for effective order execution. In this paper, we propose a novel universal trading policy optimization framework to bridge the gap between noisy yet imperfect market states and the optimal action sequences for order execution. In particular, the framework leverages a policy distillation method in which an oracle teacher with perfect information guides the learning of the common policy toward practically optimal execution, thereby approximating the optimal trading strategy. Extensive experiments show significant improvements of our method over various strong baselines, with reasonable trading actions.
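To make the oracle-distillation idea above concrete, here is a minimal sketch of a teacher-guided policy loss, assuming a PyTorch-style setup with a discrete action space (e.g., the fraction of the remaining order to trade next): the student policy sees only the imperfect public market state, while the oracle teacher has been trained with perfect information. The names, the temperature `tau`, and the weight `lam` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, tau=1.0):
    """KL divergence pulling the student's action distribution
    toward the oracle teacher's (soft targets with temperature tau)."""
    p_teacher = F.softmax(teacher_logits / tau, dim=-1)
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

def total_loss(student_logits, teacher_logits, actions, advantages, lam=0.5):
    """Common-policy objective: a policy-gradient term computed on the
    imperfect market states, plus the oracle-distillation regularizer."""
    log_probs = F.log_softmax(student_logits, dim=-1)
    chosen = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    pg_loss = -(chosen * advantages).mean()   # standard policy gradient
    kd_loss = distillation_loss(student_logits, teacher_logits)
    return pg_loss + lam * kd_loss            # lam trades off teacher guidance
```

Note that the teacher is needed only at training time; at deployment the common (student) policy acts alone on observable market data.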
Related papers
- Deep Reinforcement Learning for Online Optimal Execution Strategies [49.1574468325115]
This paper tackles the challenge of learning non-Markovian optimal execution strategies in dynamic financial markets.
We introduce a novel actor-critic algorithm based on Deep Deterministic Policy Gradient (DDPG).
We show that our algorithm successfully approximates the optimal execution strategy.
arXiv Detail & Related papers (2024-10-17T12:38:08Z)
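Since the entry above builds on DDPG, the following is a generic single-update DDPG sketch rather than the paper's non-Markovian variant; the network handles, batch layout, and hyperparameters are assumptions. In an execution setting, the action could be the fraction of the remaining order to trade in the next interval.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                batch, actor_opt, critic_opt, gamma=0.99, rho=0.995):
    """One DDPG step; `batch` is assumed to hold (state, action, reward,
    next_state, done) tensors sampled from a replay buffer."""
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) onto the bootstrapped Bellman target.
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, ascend Q through the actor.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks for stability.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), tgt.parameters()):
            tp.data.mul_(rho).add_(p.data, alpha=1 - rho)
```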
- Deep Learning for Options Trading: An End-To-End Approach [7.148312060227716]
We introduce a novel approach to options trading strategies using a highly scalable and data-driven machine learning algorithm.
We demonstrate that deep learning models trained according to our end-to-end approach exhibit significant improvements in risk-adjusted performance over existing rules-based trading strategies.
arXiv Detail & Related papers (2024-07-31T17:59:09Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
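The combined objective mentioned in the entry above can be sketched as a DPO-style preference loss plus an SFT log-likelihood term. This is a hedged illustration of the general recipe, not the paper's exact theoretical algorithm; the weights `alpha` and `beta` are illustrative.

```python
import torch.nn.functional as F

def regularized_preference_loss(logp_chosen, logp_rejected,
                                ref_logp_chosen, ref_logp_rejected,
                                beta=0.1, alpha=0.05):
    """DPO-style preference loss plus an SFT (log-likelihood) term on the
    chosen responses. Inputs are summed per-response token log-probabilities
    under the trained policy and a frozen reference policy."""
    # Implicit reward margins relative to the reference policy.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    pref_loss = -F.logsigmoid(margin).mean()
    sft_loss = -logp_chosen.mean()        # supervised term on preferred data
    return pref_loss + alpha * sft_loss   # SFT loss acts as the regularizer
```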
- Learning the Market: Sentiment-Based Ensemble Trading Agents [5.193582840789407]
We propose the integration of sentiment analysis and deep reinforcement learning ensemble algorithms for stock trading.
We show that our approach results in a strategy that is profitable, robust, and risk-minimal.
arXiv Detail & Related papers (2024-02-02T14:34:22Z)
- An Ensemble Method of Deep Reinforcement Learning for Automated Cryptocurrency Trading [16.78239969166596]
We propose an ensemble method to improve the generalization performance of trading strategies trained by deep reinforcement learning algorithms.
Our proposed ensemble method improves the out-of-sample performance compared with the benchmarks of a deep reinforcement learning strategy and a passive investment strategy.
arXiv Detail & Related papers (2023-07-27T04:00:09Z)
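One simple way to realize such an ensemble is sketched below, assuming continuous actions and a hypothetical `agent.act` interface; the weights could come from each agent's validation performance. The paper's actual aggregation rule may differ.

```python
import numpy as np

def ensemble_action(agents, state, weights=None):
    """Combine several trained policies into one trading decision by
    weighted-averaging their (continuous) actions."""
    acts = np.stack([agent.act(state) for agent in agents])
    if weights is None:
        weights = np.ones(len(agents)) / len(agents)  # simple average
    return np.average(acts, axis=0, weights=weights)
```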
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach to distill from model-based planning to the policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
- Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance [96.73189436721465]
We first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints.
We propose a learnable multi-round communication protocol for the agents to communicate their intended actions with each other.
Experiments on the data from two real-world markets have illustrated superior performance with significantly better collaboration effectiveness.
arXiv Detail & Related papers (2023-07-06T16:45:40Z)
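A toy version of a multi-round intention-communication module, in the spirit of the entry above, is sketched below; the mean-pooled message and GRU-based refinement are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class IntentionRounds(nn.Module):
    """Each agent encodes its observation, broadcasts an intended action,
    and refines it from the pooled intentions of the others for a fixed
    number of communication rounds."""
    def __init__(self, obs_dim, act_dim, hid=64, rounds=2):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hid)
        self.propose = nn.Linear(hid, act_dim)
        self.refine = nn.GRUCell(act_dim, hid)  # fold others' intentions back in
        self.rounds = rounds

    def forward(self, obs):                     # obs: (n_agents, obs_dim)
        h = torch.tanh(self.encode(obs))
        intent = self.propose(h)
        for _ in range(self.rounds):
            msg = intent.mean(dim=0, keepdim=True).expand_as(intent)
            h = self.refine(msg, h)             # update state from pooled intentions
            intent = self.propose(h)            # revise intended action
        return intent
```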
- Towards Generalizable Reinforcement Learning for Trade Execution [25.199192981742744]
Reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data.
We find that many existing RL methods exhibit considerable overfitting, which prevents them from real deployment.
We propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner.
arXiv Detail & Related papers (2023-05-12T02:41:11Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Deep Deterministic Portfolio Optimization [0.0]
This work tests reinforcement learning algorithms on conceptually simple, but mathematically non-trivial, trading environments.
We study the deep deterministic policy gradient algorithm and show that such a reinforcement learning agent can successfully recover the essential features of the optimal trading strategies.
arXiv Detail & Related papers (2020-03-13T22:20:21Z)