Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement
Learning For Optimal Execution
- URL: http://arxiv.org/abs/2207.11152v1
- Date: Fri, 22 Jul 2022 15:50:44 GMT
- Title: Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement
Learning For Optimal Execution
- Authors: Feiyang Pan, Tongzhe Zhang, Ling Luo, Jia He, Shuoling Liu
- Abstract summary: Reinforcement learning can help decide the order-splitting sizes.
The key challenge lies in the "continuous-discrete duality" of the action space.
We propose a hybrid RL method to combine the advantages of both.
- Score: 8.021077964915996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optimal execution is a sequential decision-making problem for cost-saving in
algorithmic trading. Studies have found that reinforcement learning (RL) can
help decide the order-splitting sizes. However, a problem remains unsolved: how
to place limit orders at appropriate limit prices? The key challenge lies in
the "continuous-discrete duality" of the action space. On the one hand, the
continuous action space using percentage changes in prices is preferred for
generalization. On the other hand, the trader eventually needs to choose limit
prices discretely due to the existence of the tick size, which requires
specialization for every single stock with different characteristics (e.g., the
liquidity and the price range). So we need continuous control for
generalization and discrete control for specialization. To this end, we propose
a hybrid RL method to combine the advantages of both of them. We first use a
continuous control agent to scope an action subset, then deploy a fine-grained
agent to choose a specific limit price. Extensive experiments show that our
method has higher sample efficiency and better training stability than existing
RL algorithms and significantly outperforms previous learning-based methods for
order execution.
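The two-stage action selection described in the abstract can be illustrated with a minimal sketch: a continuous policy proposes a percentage offset from a reference price, that offset is mapped to a small window of candidate ticks, and a fine-grained discrete policy picks one concrete limit price from the window. This is an illustrative sketch under assumed settings (tick size, window width, stand-in policies), not the authors' implementation.

```python
# Minimal sketch of the hybrid action-space idea from the abstract: a
# continuous agent scopes a subset of candidate ticks, then a discrete
# agent picks one concrete limit price. The names, window size, and the
# stand-in policies below are illustrative assumptions, not the paper's
# actual implementation.
import numpy as np

TICK_SIZE = 0.01   # assumed tick size of the traded stock
WINDOW = 5         # assumed number of candidate ticks scoped per step


def continuous_agent(state: np.ndarray) -> float:
    """Stand-in for a learned continuous policy: returns a relative price
    offset in [-1, 1] (e.g., percentage deviation from the mid price)."""
    return float(np.tanh(state.mean()))


def discrete_agent(state: np.ndarray, candidates: np.ndarray) -> int:
    """Stand-in for a fine-grained discrete policy that scores each
    candidate tick; here it simply picks the middle candidate."""
    return len(candidates) // 2


def choose_limit_price(state, mid_price, max_offset_pct=0.002):
    # 1) Continuous control (generalization): percentage offset from mid.
    offset_pct = continuous_agent(state) * max_offset_pct
    center = mid_price * (1.0 + offset_pct)
    # 2) Scope a discrete action subset: the WINDOW ticks nearest `center`.
    base_tick = round(center / TICK_SIZE)
    candidates = (base_tick + np.arange(-(WINDOW // 2), WINDOW // 2 + 1)) * TICK_SIZE
    # 3) Discrete control (specialization): pick one concrete tick.
    return float(candidates[discrete_agent(state, candidates)])


if __name__ == "__main__":
    state = np.array([0.3, -0.1, 0.05])  # toy market features
    print(choose_limit_price(state, mid_price=10.00))
```

In this sketch both policies are placeholders; in the method described above they would be the trained continuous-control agent and the fine-grained discrete agent, respectively.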
Related papers
- Double-Bounded Optimal Transport for Advanced Clustering and
Classification [58.237576976486544]
We propose Doubly Bounded Optimal Transport (DB-OT), which assumes that the target distribution is restricted within two boundaries instead of a fixed one.
We show that our method can achieve good results with our improved inference scheme in the testing stage.
arXiv Detail & Related papers (2024-01-21T07:43:01Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Actor-Critic with variable time discretization via sustained actions [0.0]
SusACER is an off-policy reinforcement learning algorithm that combines the advantages of different time discretization settings.
We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D.
arXiv Detail & Related papers (2023-08-08T14:45:00Z) - Budgeting Counterfactual for Offline RL [25.918011878015136]
We propose an approach to explicitly bound the amount of out-of-distribution actions during training.
We show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely-used D4RL benchmarks.
arXiv Detail & Related papers (2023-07-12T17:47:35Z) - Learning Multi-Agent Intention-Aware Communication for Optimal
Multi-Order Execution in Finance [96.73189436721465]
We first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints.
We propose a learnable multi-round communication protocol for the agents to communicate their intended actions with each other.
Experiments on the data from two real-world markets have illustrated superior performance with significantly better collaboration effectiveness.
arXiv Detail & Related papers (2023-07-06T16:45:40Z) - STEEL: Singularity-aware Reinforcement Learning [14.424199399139804]
Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy.
We propose a new batch RL algorithm that allows for singularities in both the state and action spaces.
By leveraging the idea of pessimism and under some technical conditions, we derive the first finite-sample regret guarantee for our proposed algorithm.
arXiv Detail & Related papers (2023-01-30T18:29:35Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Online Bidding Algorithms for Return-on-Spend Constrained Advertisers [10.500109788348732]
This work explores efficient online algorithms for a single value-maximizing advertiser under an increasingly popular constraint: Return-on-Spend (RoS).
We contribute a simple online algorithm that achieves near-optimal regret in expectation while always respecting the specified RoS constraint.
arXiv Detail & Related papers (2022-08-29T16:49:24Z) - Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy constraint that reduces divergence from the policy that collected the data, and a value constraint that discourages overly optimistic estimates.
arXiv Detail & Related papers (2021-02-18T08:54:14Z) - Solving the Order Batching and Sequencing Problem using Deep
Reinforcement Learning [2.4565068569913384]
We present a Deep Reinforcement Learning (DRL) approach for deciding how and when orders should be batched and picked in a warehouse to minimize the number of tardy orders.
In particular, the technique facilitates making decisions on whether an order should be picked individually (pick-by-order) or picked in a batch with other orders (pick-by-batch) and if so with which other orders.
arXiv Detail & Related papers (2020-06-16T20:40:41Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)