Reinforced Deep Markov Models With Applications in Automatic Trading
- URL: http://arxiv.org/abs/2011.04391v1
- Date: Mon, 9 Nov 2020 12:46:30 GMT
- Title: Reinforced Deep Markov Models With Applications in Automatic Trading
- Authors: Tadeu A. Ferreira
- Abstract summary: We propose a model-based RL approach, coined Reinforced Deep Markov Model (RDMM)
RDMM integrates desirable properties of a reinforcement learning algorithm acting as an automatic trading system.
Tests show that the RDMM is data-efficient and provides financial gains compared to the benchmarks in the optimal execution problem.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the developments in deep generative models, we propose a
model-based RL approach, coined Reinforced Deep Markov Model (RDMM), designed
to integrate desirable properties of a reinforcement learning algorithm acting
as an automatic trading system. The network architecture allows for the
possibility that market dynamics are partially visible and are potentially
modified by the agent's actions. The RDMM filters incomplete and noisy data, to
create better-behaved input data for RL planning. The policy search
optimisation also properly accounts for state uncertainty. Due to the
complexity of the RKDF model architecture, we performed ablation studies to
understand the contributions of individual components of the approach better.
To test the financial performance of the RDMM we implement policies using
variants of Q-Learning, DynaQ-ARIMA and DynaQ-LSTM algorithms. The experiments
show that the RDMM is data-efficient and provides financial gains compared to
the benchmarks in the optimal execution problem. The performance improvement
becomes more pronounced when price dynamics are more complex, and this has been
demonstrated using real data sets from the limit order book of Facebook, Intel,
Vodafone and Microsoft.
Related papers
- MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We present a novel immersion-aware model trading framework that incentivizes metaverse users (MUs) to contribute learning models for augmented reality (AR) services in the vehicular metaverse.
Considering dynamic network conditions and privacy concerns, we formulate the reward decisions of MSPs as a multi-agent Markov decision process.
Experimental results demonstrate that the proposed framework can effectively provide higher-value models for object detection and classification in AR services on real AR-related vehicle datasets.
arXiv Detail & Related papers (2024-10-25T16:20:46Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - Let's reward step by step: Step-Level reward model as the Navigators for
Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate step-level reward dataset for coding tasks and observed similar improved performance in the code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task has applications in safety-sensitive applications such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Robust Reinforcement Learning using Offline Data [23.260211453437055]
We propose a robust reinforcement learning algorithm called Robust Fitted Q-Iteration (RFQI)
RFQI uses only an offline dataset to learn the optimal robust policy.
We prove that RFQI learns a near-optimal robust policy under standard assumptions.
arXiv Detail & Related papers (2022-08-10T03:47:45Z) - Sample Complexity of Robust Reinforcement Learning with a Generative
Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $epsilon$-optimal robust policy.
We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence.
In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
arXiv Detail & Related papers (2021-12-02T18:55:51Z) - On Effective Scheduling of Model-based Reinforcement Learning [53.027698625496015]
We propose a framework named AutoMBPO to automatically schedule the real data ratio.
In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance.
arXiv Detail & Related papers (2021-11-16T15:24:59Z) - Blending MPC & Value Function Approximation for Efficient Reinforcement
Learning [42.429730406277315]
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems.
We present a framework for improving on MPC with model-free reinforcement learning (RL)
We show that our approach can obtain performance comparable with MPC with access to true dynamics.
arXiv Detail & Related papers (2020-12-10T11:32:01Z) - Variational Model-based Policy Optimization [34.80171122943031]
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL.
We propose an objective function as a variational lower-bound of a log-likelihood of a log-likelihood to jointly learn and improve model and policy.
Our experiments on a number of continuous control tasks show that despite being more complex, our model-based (E-step) algorithm, called emactoral model-based policy optimization (VMBPO), is more sample-efficient and
arXiv Detail & Related papers (2020-06-09T18:30:15Z) - Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.