Optimal Execution Using Reinforcement Learning
- URL: http://arxiv.org/abs/2306.17178v1
- Date: Mon, 19 Jun 2023 07:09:59 GMT
- Title: Optimal Execution Using Reinforcement Learning
- Authors: Cong Zheng and Jiafa He and Can Yang
- Abstract summary: This work is about optimal order execution, where a large order is split into several small orders to minimize the implementation shortfall.
Based on the diversity of cryptocurrency exchanges, we attempt to extract cross-exchange signals by aligning data from multiple exchanges for the first time.
- Score: 6.905391624417593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work is about optimal order execution, where a large order is split into
several small orders to minimize the implementation shortfall. Based on the
diversity of cryptocurrency exchanges, we attempt to extract cross-exchange
signals by aligning data from multiple exchanges for the first time. Unlike
most previous studies that focused on using single-exchange information, we
discuss the impact of cross-exchange signals on the agent's decision-making in
the optimal execution problem. Experimental results show that cross-exchange
signals provide additional information that facilitates the optimal execution
of cryptocurrency orders.
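The quantity the abstract optimizes, implementation shortfall, is the gap between the value of filling the whole order at the arrival price and the value actually realized by the child orders. A minimal sketch for a buy order, with illustrative function and variable names that are not taken from the paper:

```python
def implementation_shortfall(arrival_price, fills):
    """Implementation shortfall per share for a buy order split into child orders.

    arrival_price: mid-price observed when the parent order was submitted.
    fills: list of (quantity, price) pairs for the executed child orders.
    Returns the average overpay per share versus the arrival-price benchmark
    (lower is better; an optimal-execution agent tries to minimize this).
    """
    total_qty = sum(q for q, _ in fills)
    paid = sum(q * p for q, p in fills)
    benchmark = total_qty * arrival_price
    return (paid - benchmark) / total_qty

# Example: a 300-share parent buy order split into three child orders.
fills = [(100, 10.00), (100, 10.02), (100, 10.05)]
cost = implementation_shortfall(10.00, fills)
```

In an RL formulation, this cost (possibly combined with a risk term) typically serves as the negative reward for the execution agent.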
Related papers
- Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on red fluorescent proteins show that the proposed method matches the baseline performance with 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - Binary Classifier Optimization for Large Language Model Alignment [4.61411484523337]
We present theoretical foundations to explain the successful alignment achieved through binary signals.
We identify two techniques for effective alignment: reward shift and underlying distribution matching.
Our model consistently demonstrates effective and robust alignment across two base LLMs and three different binary signal datasets.
arXiv Detail & Related papers (2024-04-06T15:20:59Z) - Data-Efficient Interactive Multi-Objective Optimization Using ParEGO [6.042269506496206]
Multi-objective optimization seeks to identify a set of non-dominated solutions that provide optimal trade-offs among competing objectives.
In practical applications, decision-makers (DMs) will select a single solution that aligns with their preferences to be implemented.
We propose two novel algorithms that efficiently locate the most preferred region of the Pareto front in expensive-to-evaluate problems.
arXiv Detail & Related papers (2024-01-12T15:55:51Z) - Delegating Data Collection in Decentralized Machine Learning [67.0537668772372]
Motivated by the emergence of decentralized machine learning (ML) ecosystems, we study the delegation of data collection.
We design optimal and near-optimal contracts that deal with two fundamental information asymmetries.
We show that a principal can cope with such asymmetry via simple linear contracts that achieve 1-1/e fraction of the optimal utility.
arXiv Detail & Related papers (2023-09-04T22:16:35Z) - Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance [96.73189436721465]
We first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints.
We propose a learnable multi-round communication protocol for agents to communicate their intended actions with each other.
Experiments on data from two real-world markets demonstrate superior performance and significantly better collaboration effectiveness.
arXiv Detail & Related papers (2023-07-06T16:45:40Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of non-convex problems.
We show that for weakly-convex objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework can better guide the learning of the common policy towards practically optimal execution by an oracle teacher with perfect information.
arXiv Detail & Related papers (2021-01-28T05:52:18Z) - Extrapolation-based Prediction-Correction Methods for Time-varying Convex Optimization [5.768816587293478]
We discuss algorithms for online optimization based on the prediction-correction paradigm.
We propose a novel and tailored prediction strategy, which we call extrapolation-based.
We discuss the empirical performance of the algorithm when applied to signal processing, machine learning, and robotics problems.
arXiv Detail & Related papers (2020-04-24T12:48:13Z)
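The DiscoPOP entry above describes a loss that "adaptively blends logistic and exponential losses." One plausible way to realize such a blend is to gate between the two losses with a sigmoid of the preference margin; this is our own illustrative gating, not necessarily the paper's exact formula, and `tau` is a hypothetical gate scale:

```python
import math

def blended_preference_loss(log_ratio_diff, tau=0.05):
    """Illustrative adaptive blend of the logistic (DPO-style) and
    exponential preference losses.

    log_ratio_diff: margin between the chosen and rejected responses'
    policy/reference log-ratios. A sigmoid gate (sketch only, not the
    paper's formula) shifts weight between the two losses as the
    margin grows.
    """
    logistic = math.log(1 + math.exp(-log_ratio_diff))  # -log sigmoid(x)
    exponential = math.exp(-log_ratio_diff)
    gate = 1 / (1 + math.exp(-log_ratio_diff / tau))    # adaptive mixing weight
    return gate * logistic + (1 - gate) * exponential
```

Both component losses decrease as the margin grows, so any convex gate of them still rewards separating chosen from rejected responses; the blend only changes how sharply low-margin pairs are penalized.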
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.