Universal Trading for Order Execution with Oracle Policy Distillation
- URL: http://arxiv.org/abs/2103.10860v1
- Date: Thu, 28 Jan 2021 05:52:18 GMT
- Title: Universal Trading for Order Execution with Oracle Policy Distillation
- Authors: Yuchen Fang, Kan Ren, Weiqing Liu, Dong Zhou, Weinan Zhang, Jiang Bian, Yong Yu, Tie-Yan Liu
- Abstract summary: We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework can better guide the learning of the common policy toward practically optimal execution via an oracle teacher with perfect information.
- Score: 99.57416828489568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. In pursuit of effective execution strategies, recent years have witnessed a shift from the analytical view with model-based market assumptions to a model-free perspective, i.e., reinforcement learning, owing to its nature as sequential decision optimization. However, the noisy yet imperfect market information available to the policy makes it challenging to build sample-efficient reinforcement learning methods for effective order execution. In this paper, we propose a novel universal trading policy optimization framework to bridge the gap between noisy yet imperfect market states and the optimal action sequences for order execution. In particular, the framework leverages a policy distillation method in which an oracle teacher with perfect information guides the learning of the common policy toward practically optimal execution, thereby approximating the optimal trading strategy. Extensive experiments show significant improvements of our method over various strong baselines, with reasonable trading actions.
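To make the oracle-distillation idea above concrete, here is a minimal sketch of a teacher-guided policy loss, assuming a PyTorch-style setup with a discrete action space (e.g., the fraction of the remaining order to trade next): the student policy sees only the imperfect public market state, while the oracle teacher has been trained with perfect information. The names, the temperature `tau`, and the weight `lam` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, tau=1.0):
    """KL divergence pulling the student's action distribution
    toward the oracle teacher's (soft targets with temperature tau)."""
    p_teacher = F.softmax(teacher_logits / tau, dim=-1)
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

def total_loss(student_logits, teacher_logits, actions, advantages, lam=0.5):
    """Common-policy objective: a policy-gradient term computed on the
    imperfect market states, plus the oracle-distillation regularizer."""
    log_probs = F.log_softmax(student_logits, dim=-1)
    chosen = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    pg_loss = -(chosen * advantages).mean()   # standard policy gradient
    kd_loss = distillation_loss(student_logits, teacher_logits)
    return pg_loss + lam * kd_loss            # lam trades off teacher guidance
```

Note that the teacher is needed only at training time; at deployment the common (student) policy acts alone on observable market data.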
Related papers
- Deep Reinforcement Learning for Online Optimal Execution Strategies [49.1574468325115]
This paper tackles the challenge of learning non-Markovian optimal execution strategies in dynamic financial markets.
We introduce a novel actor-critic algorithm based on Deep Deterministic Policy Gradient (DDPG).
We show that our algorithm successfully approximates the optimal execution strategy.
arXiv Detail & Related papers (2024-10-17T12:38:08Z)
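Since the entry above builds on DDPG, the following is a generic single-update DDPG sketch rather than the paper's non-Markovian variant; the network handles, batch layout, and hyperparameters are assumptions. In an execution setting, the action could be the fraction of the remaining order to trade in the next interval.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                batch, actor_opt, critic_opt, gamma=0.99, rho=0.995):
    """One DDPG step; `batch` is assumed to hold (state, action, reward,
    next_state, done) tensors sampled from a replay buffer."""
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) onto the bootstrapped Bellman target.
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, ascend Q through the actor.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average the target networks for stability.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), tgt.parameters()):
            tp.data.mul_(rho).add_(p.data, alpha=1 - rho)
```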
- Deep Learning for Options Trading: An End-To-End Approach [7.148312060227716]
We introduce a novel approach to options trading strategies using a highly scalable and data-driven machine learning algorithm.
We demonstrate that deep learning models trained according to our end-to-end approach exhibit significant improvements in risk-adjusted performance over existing rules-based trading strategies.
arXiv Detail & Related papers (2024-07-31T17:59:09Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
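The combined objective mentioned in the entry above can be sketched as a DPO-style preference loss plus an SFT log-likelihood term. This is a hedged illustration of the general recipe, not the paper's exact theoretical algorithm; the weights `alpha` and `beta` are illustrative.

```python
import torch.nn.functional as F

def regularized_preference_loss(logp_chosen, logp_rejected,
                                ref_logp_chosen, ref_logp_rejected,
                                beta=0.1, alpha=0.05):
    """DPO-style preference loss plus an SFT (log-likelihood) term on the
    chosen responses. Inputs are summed per-response token log-probabilities
    under the trained policy and a frozen reference policy."""
    # Implicit reward margins relative to the reference policy.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    pref_loss = -F.logsigmoid(margin).mean()
    sft_loss = -logp_chosen.mean()        # supervised term on preferred data
    return pref_loss + alpha * sft_loss   # SFT loss acts as the regularizer
```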
- Learning the Market: Sentiment-Based Ensemble Trading Agents [5.193582840789407]
We propose the integration of sentiment analysis and deep reinforcement learning ensemble algorithms for stock trading.
We show that our approach results in a strategy that is profitable, robust, and risk-minimal.
arXiv Detail & Related papers (2024-02-02T14:34:22Z)
- An Ensemble Method of Deep Reinforcement Learning for Automated Cryptocurrency Trading [16.78239969166596]
We propose an ensemble method to improve the generalization performance of trading strategies trained by deep reinforcement learning algorithms.
Our proposed ensemble method improves the out-of-sample performance compared with the benchmarks of a deep reinforcement learning strategy and a passive investment strategy.
arXiv Detail & Related papers (2023-07-27T04:00:09Z)
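One simple way to realize such an ensemble is sketched below, assuming continuous actions and a hypothetical `agent.act` interface; the weights could come from each agent's validation performance. The paper's actual aggregation rule may differ.

```python
import numpy as np

def ensemble_action(agents, state, weights=None):
    """Combine several trained policies into one trading decision by
    weighted-averaging their (continuous) actions."""
    acts = np.stack([agent.act(state) for agent in agents])
    if weights is None:
        weights = np.ones(len(agents)) / len(agents)  # simple average
    return np.average(acts, axis=0, weights=weights)
```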
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach to distill from model-based planning to the policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
- Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance [96.73189436721465]
We first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints.
We propose a learnable multi-round communication protocol for the agents to communicate their intended actions with each other.
Experiments on the data from two real-world markets have illustrated superior performance with significantly better collaboration effectiveness.
arXiv Detail & Related papers (2023-07-06T16:45:40Z)
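A toy version of a multi-round intention-communication module, in the spirit of the entry above, is sketched below; the mean-pooled message and GRU-based refinement are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class IntentionRounds(nn.Module):
    """Each agent encodes its observation, broadcasts an intended action,
    and refines it from the pooled intentions of the others for a fixed
    number of communication rounds."""
    def __init__(self, obs_dim, act_dim, hid=64, rounds=2):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hid)
        self.propose = nn.Linear(hid, act_dim)
        self.refine = nn.GRUCell(act_dim, hid)  # fold others' intentions back in
        self.rounds = rounds

    def forward(self, obs):                     # obs: (n_agents, obs_dim)
        h = torch.tanh(self.encode(obs))
        intent = self.propose(h)
        for _ in range(self.rounds):
            msg = intent.mean(dim=0, keepdim=True).expand_as(intent)
            h = self.refine(msg, h)             # update state from pooled intentions
            intent = self.propose(h)            # revise intended action
        return intent
```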
- Towards Generalizable Reinforcement Learning for Trade Execution [25.199192981742744]
Reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data.
We find that many existing RL methods exhibit considerable overfitting, which prevents them from real deployment.
We propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner.
arXiv Detail & Related papers (2023-05-12T02:41:11Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Deep Deterministic Portfolio Optimization [0.0]
This work tests reinforcement learning algorithms on conceptually simple, but mathematically non-trivial, trading environments.
We study the deep deterministic policy gradient algorithm and show that such a reinforcement learning agent can successfully recover the essential features of the optimal trading strategies.
arXiv Detail & Related papers (2020-03-13T22:20:21Z)