Imitate then Transcend: Multi-Agent Optimal Execution with Dual-Window
Denoise PPO
- URL: http://arxiv.org/abs/2206.10736v1
- Date: Tue, 21 Jun 2022 21:25:30 GMT
- Title: Imitate then Transcend: Multi-Agent Optimal Execution with Dual-Window
Denoise PPO
- Authors: Jin Fang, Jiacheng Weng, Yi Xiang, Xinwen Zhang
- Abstract summary: A novel framework for solving the optimal execution and placement problems using reinforcement learning (RL) with imitation was proposed.
The RL agents trained from the proposed framework consistently outperformed the industry benchmark time-weighted average price (TWAP) strategy in execution cost.
- Score: 13.05016423016994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A novel framework for solving the optimal execution and placement problems
using reinforcement learning (RL) with imitation was proposed. The RL agents
trained from the proposed framework consistently outperformed the industry
benchmark time-weighted average price (TWAP) strategy in execution cost and
showed great generalization across out-of-sample trading dates and tickers. The
impressive performance was achieved from three aspects. First, our RL network
architecture called Dual-window Denoise PPO enabled efficient learning in a
noisy market environment. Second, a reward scheme with imitation learning was
designed, and a comprehensive set of market features was studied. Third, our
flexible action formulation allowed the RL agent to tackle optimal execution
and placement collectively resulting in better performance than solving
individual problems separately. The RL agent's performance was evaluated in our
multi-agent realistic historical limit order book simulator in which price
impact was accurately assessed. In addition, ablation studies were also
performed, confirming the superiority of our framework.
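The abstract's headline claim is lower execution cost than the TWAP benchmark. As a concrete reference point, the sketch below shows what that benchmark and cost metric typically look like: TWAP slices a parent order into near-equal child orders across equal time buckets, and execution cost can be measured as the average fill price relative to a benchmark price. This is a minimal illustration, not code from the paper; the function names and the shortfall definition are illustrative assumptions.

```python
# Illustrative sketch (not from the paper): the TWAP benchmark splits a
# parent order evenly over time, and an execution strategy is scored by
# its average fill price relative to a benchmark price.

def twap_schedule(total_shares: int, n_buckets: int) -> list[int]:
    """Split a parent order into n_buckets near-equal child orders."""
    base, rem = divmod(total_shares, n_buckets)
    # Spread the remainder over the first `rem` buckets so child-order
    # sizes differ by at most one share.
    return [base + (1 if i < rem else 0) for i in range(n_buckets)]

def execution_shortfall(fills: list[tuple[int, float]], benchmark_px: float) -> float:
    """Volume-weighted average fill price minus a benchmark price.

    For a buy order, a positive value means the strategy paid more than
    the benchmark (e.g. TWAP); an RL agent aims to drive this negative.
    """
    qty = sum(q for q, _ in fills)
    avg_px = sum(q * px for q, px in fills) / qty
    return avg_px - benchmark_px

# Example: 1000 shares over 6 buckets.
sched = twap_schedule(1000, 6)  # -> [167, 167, 167, 167, 166, 166]
```

Under this framing, the paper's RL agent replaces the fixed even schedule with state-dependent order sizes and placements, while TWAP's average fill price serves as the cost benchmark.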
Related papers
- CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing [70.25689961697523]
We propose a generalizable algorithm that enhances sequential reasoning by cross-task experience sharing and selection.
Our work bridges the gap between existing sequential reasoning paradigms and validates the effectiveness of leveraging cross-task experiences.
arXiv Detail & Related papers (2024-10-22T03:59:53Z)
- Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning [13.753960633998389]
Reinforcement learning (RL) has emerged as a pivotal technique for fine-tuning large language models (LLMs) on specific tasks.
In this paper, we propose CORY, extending the RL fine-tuning of LLMs to a sequential cooperative multi-agent reinforcement learning framework.
Results show that CORY outperforms PPO in terms of policy optimality, resistance to distribution collapse, and training robustness.
arXiv Detail & Related papers (2024-10-08T14:55:26Z)
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach to compute unbiased Monte Carlo-based estimates.
We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method [0.0]
This paper presents a novel reinforcement learning approach called HAAM-RL (Heuristic Algorithm-based Action Masking Reinforcement Learning).
The proposed approach exhibits superior performance and generalization capability, indicating its effectiveness in optimizing complex manufacturing processes.
arXiv Detail & Related papers (2024-03-21T03:42:39Z)
- Domain-adapted Learning and Imitation: DRL for Power Arbitrage [1.6874375111244329]
We propose a collaborative dual-agent reinforcement learning approach for this bi-level simulation and optimization of European power arbitrage trading.
We introduce two new implementations designed to incorporate domain-specific knowledge by imitating the trading behaviours of power traders.
Our study demonstrates that by leveraging domain expertise in a general learning problem, the performance can be improved substantially.
arXiv Detail & Related papers (2023-01-19T23:36:23Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that addresses this sample inefficiency, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Model-Free Reinforcement Learning for Asset Allocation [0.0]
This study investigated the performance of reinforcement learning when applied to portfolio management using model-free deep RL agents.
We trained several RL agents on real-world stock prices to learn how to perform asset allocation.
Four RL agents (A2C, SAC, PPO, and TRPO) outperformed the best baseline, MPT, overall.
arXiv Detail & Related papers (2022-09-21T16:00:24Z)
- Functional Optimization Reinforcement Learning for Real-Time Bidding [14.5826735379053]
Real-time bidding is the new paradigm of programmatic advertising.
Existing approaches are struggling to provide a satisfactory solution for bidding optimization.
This paper proposes a multi-agent reinforcement learning architecture for RTB with functional optimization.
arXiv Detail & Related papers (2022-06-25T06:12:17Z)
- Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and adapt well to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.