Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement
Learning For Optimal Execution
- URL: http://arxiv.org/abs/2207.11152v1
- Date: Fri, 22 Jul 2022 15:50:44 GMT
- Title: Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement
Learning For Optimal Execution
- Authors: Feiyang Pan, Tongzhe Zhang, Ling Luo, Jia He, Shuoling Liu
- Abstract summary: Reinforcement learning can help decide the order-splitting sizes.
The key challenge lies in the "continuous-discrete duality" of the action space.
We propose a hybrid RL method to combine the advantages of both.
- Score: 8.021077964915996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optimal execution is a sequential decision-making problem for cost-saving in
algorithmic trading. Studies have found that reinforcement learning (RL) can
help decide the order-splitting sizes. However, a problem remains unsolved: how
to place limit orders at appropriate limit prices? The key challenge lies in
the "continuous-discrete duality" of the action space. On the one hand, the
continuous action space using percentage changes in prices is preferred for
generalization. On the other hand, the trader eventually needs to choose limit
prices discretely due to the existence of the tick size, which requires
specialization for every single stock with different characteristics (e.g., the
liquidity and the price range). So we need continuous control for
generalization and discrete control for specialization. To this end, we propose
a hybrid RL method to combine the advantages of both of them. We first use a
continuous control agent to scope an action subset, then deploy a fine-grained
agent to choose a specific limit price. Extensive experiments show that our
method has higher sample efficiency and better training stability than existing
RL algorithms and significantly outperforms previous learning-based methods for
order execution.
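The two-stage action selection described in the abstract can be illustrated with a minimal sketch: a continuous policy proposes a percentage offset from a reference price, that offset is mapped to a small window of candidate ticks, and a fine-grained discrete policy picks one concrete limit price from the window. This is an illustrative sketch under assumed settings (tick size, window width, stand-in policies), not the authors' implementation.

```python
# Minimal sketch of the hybrid action-space idea from the abstract: a
# continuous agent scopes a subset of candidate ticks, then a discrete
# agent picks one concrete limit price. The names, window size, and the
# stand-in policies below are illustrative assumptions, not the paper's
# actual implementation.
import numpy as np

TICK_SIZE = 0.01   # assumed tick size of the traded stock
WINDOW = 5         # assumed number of candidate ticks scoped per step


def continuous_agent(state: np.ndarray) -> float:
    """Stand-in for a learned continuous policy: returns a relative price
    offset in [-1, 1] (e.g., percentage deviation from the mid price)."""
    return float(np.tanh(state.mean()))


def discrete_agent(state: np.ndarray, candidates: np.ndarray) -> int:
    """Stand-in for a fine-grained discrete policy that scores each
    candidate tick; here it simply picks the middle candidate."""
    return len(candidates) // 2


def choose_limit_price(state, mid_price, max_offset_pct=0.002):
    # 1) Continuous control (generalization): percentage offset from mid.
    offset_pct = continuous_agent(state) * max_offset_pct
    center = mid_price * (1.0 + offset_pct)
    # 2) Scope a discrete action subset: the WINDOW ticks nearest `center`.
    base_tick = round(center / TICK_SIZE)
    candidates = (base_tick + np.arange(-(WINDOW // 2), WINDOW // 2 + 1)) * TICK_SIZE
    # 3) Discrete control (specialization): pick one concrete tick.
    return float(candidates[discrete_agent(state, candidates)])


if __name__ == "__main__":
    state = np.array([0.3, -0.1, 0.05])  # toy market features
    print(choose_limit_price(state, mid_price=10.00))
```

In this sketch both policies are placeholders; in the method described above they would be the trained continuous-control agent and the fine-grained discrete agent, respectively.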
Related papers
- Double-Bounded Optimal Transport for Advanced Clustering and
Classification [58.237576976486544]
We propose Doubly Bounded Optimal Transport (DB-OT), which assumes that the target distribution is restricted within two boundaries instead of a fixed one.
We show that our method can achieve good results with our improved inference scheme in the testing stage.
arXiv Detail & Related papers (2024-01-21T07:43:01Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Actor-Critic with variable time discretization via sustained actions [0.0]
SusACER is an off-policy reinforcement learning algorithm that combines the advantages of different time discretization settings.
We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D.
arXiv Detail & Related papers (2023-08-08T14:45:00Z) - Budgeting Counterfactual for Offline RL [25.918011878015136]
We propose an approach to explicitly bound the amount of out-of-distribution actions during training.
We show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely-used D4RL benchmarks.
arXiv Detail & Related papers (2023-07-12T17:47:35Z) - Learning Multi-Agent Intention-Aware Communication for Optimal
Multi-Order Execution in Finance [96.73189436721465]
We first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints.
We propose a learnable multi-round communication protocol for the agents to communicate their intended actions with each other.
Experiments on the data from two real-world markets have illustrated superior performance with significantly better collaboration effectiveness.
arXiv Detail & Related papers (2023-07-06T16:45:40Z) - STEEL: Singularity-aware Reinforcement Learning [14.424199399139804]
Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy.
We propose a new batch RL algorithm that allows for singularities in both the state and action spaces.
By leveraging the idea of pessimism and under some technical conditions, we derive the first finite-sample regret guarantee for our proposed algorithm.
arXiv Detail & Related papers (2023-01-30T18:29:35Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Online Bidding Algorithms for Return-on-Spend Constrained Advertisers [10.500109788348732]
This work explores efficient online algorithms for a single value-maximizing advertiser under an increasingly popular constraint: Return-on-Spend (RoS).
We contribute a simple online algorithm that achieves near-optimal regret in expectation while always respecting the specified RoS constraint.
arXiv Detail & Related papers (2022-08-29T16:49:24Z) - Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy constraint that reduces divergence from the policy that collected the data, and a value constraint that discourages overly optimistic estimates.
arXiv Detail & Related papers (2021-02-18T08:54:14Z) - Solving the Order Batching and Sequencing Problem using Deep
Reinforcement Learning [2.4565068569913384]
We present a Deep Reinforcement Learning (DRL) approach for deciding how and when orders should be batched and picked in a warehouse to minimize the number of tardy orders.
In particular, the technique facilitates making decisions on whether an order should be picked individually (pick-by-order) or picked in a batch with other orders (pick-by-batch) and if so with which other orders.
arXiv Detail & Related papers (2020-06-16T20:40:41Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)