Reinforced Deep Markov Models With Applications in Automatic Trading
- URL: http://arxiv.org/abs/2011.04391v1
- Date: Mon, 9 Nov 2020 12:46:30 GMT
- Title: Reinforced Deep Markov Models With Applications in Automatic Trading
- Authors: Tadeu A. Ferreira
- Abstract summary: We propose a model-based RL approach, coined the Reinforced Deep Markov Model (RDMM).
The RDMM integrates desirable properties of a reinforcement learning algorithm acting as an automatic trading system.
Tests show that the RDMM is data-efficient and provides financial gains compared to the benchmarks in the optimal execution problem.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the developments in deep generative models, we propose a
model-based RL approach, coined Reinforced Deep Markov Model (RDMM), designed
to integrate desirable properties of a reinforcement learning algorithm acting
as an automatic trading system. The network architecture allows for the
possibility that market dynamics are partially visible and are potentially
modified by the agent's actions. The RDMM filters incomplete and noisy data to
create better-behaved input data for RL planning. The policy search
optimisation also properly accounts for state uncertainty. Due to the
complexity of the RDMM architecture, we performed ablation studies to better
understand the contributions of the individual components of the approach.
To test the financial performance of the RDMM, we implement policies using
variants of the Q-Learning, DynaQ-ARIMA, and DynaQ-LSTM algorithms. The experiments
show that the RDMM is data-efficient and provides financial gains compared to
the benchmarks in the optimal execution problem. The performance improvement
becomes more pronounced when price dynamics are more complex, as demonstrated
using real data sets from the limit order books of Facebook, Intel, Vodafone,
and Microsoft.
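To make the DynaQ-style planning loop described in the abstract concrete, here is a minimal, hypothetical sketch: a plain tabular Dyna-Q agent on a toy optimal-execution task with a quadratic impact cost. It is not the authors' RDMM implementation (which filters a latent state with a deep Markov model rather than using a lookup-table model); action set, impact coefficient, and horizon are illustrative assumptions.

```python
# Hedged sketch (not the authors' code): tabular Dyna-Q on a toy execution task.
# State = (time step, remaining inventory); action = shares sold this step.
import random
from collections import defaultdict

GAMMA, ALPHA, EPS, PLAN_STEPS = 0.99, 0.1, 0.1, 20
ACTIONS = [0, 1, 2, 4]          # shares to sell (toy units)
Q = defaultdict(float)          # Q[(state, action)]
model = {}                      # model[(state, action)] = (reward, next_state, done)

def step_env(state, action):
    """Toy execution dynamics: selling more incurs a quadratic impact cost."""
    t, inv = state
    sold = min(action, inv)
    reward = sold - 0.01 * sold ** 2        # proceeds at unit price minus impact
    next_state = (t + 1, inv - sold)
    done = next_state[0] >= 10 or next_state[1] == 0
    return reward, next_state, done

def greedy(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(500):
    state, done = (0, 8), False
    while not done:
        action = random.choice(ACTIONS) if random.random() < EPS else greedy(state)
        reward, next_state, done = step_env(state, action)
        target = reward + (0 if done else GAMMA * max(Q[(next_state, a)] for a in ACTIONS))
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        model[(state, action)] = (reward, next_state, done)
        # Planning phase: replay synthetic transitions from the learned model.
        for _ in range(PLAN_STEPS):
            (s, a), (r, s2, d) = random.choice(list(model.items()))
            t2 = r + (0 if d else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
            Q[(s, a)] += ALPHA * (t2 - Q[(s, a)])
        state = next_state
```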
Related papers
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks.
We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge.
Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
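As an illustration of the sigmoid routing with straight-through estimators mentioned above, here is a minimal, hypothetical PyTorch sketch; the layer sizes, block count, and the 0.5 gating threshold are assumptions, not details from the paper.

```python
# Hedged sketch (not the DSMoE release): sigmoid gating over FFN blocks with a
# straight-through estimator for the hard block selection.
import torch
import torch.nn as nn

class BlockRoutedFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff // n_blocks), nn.GELU(),
                          nn.Linear(d_ff // n_blocks, d_model))
            for _ in range(n_blocks))
        self.router = nn.Linear(d_model, n_blocks)     # one gate per block

    def forward(self, x):                              # x: (batch, seq, d_model)
        gate_soft = torch.sigmoid(self.router(x))      # in (0, 1), differentiable
        gate_hard = (gate_soft > 0.5).float()          # binary block selection
        # Straight-through estimator: forward pass uses the hard gate,
        # backward pass routes gradients through the soft gate.
        gate = gate_hard + gate_soft - gate_soft.detach()
        out = torch.zeros_like(x)
        for i, block in enumerate(self.blocks):
            out = out + gate[..., i:i + 1] * block(x)
        return out

y = BlockRoutedFFN()(torch.randn(2, 16, 512))          # usage: output shape (2, 16, 512)
```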
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
- Exploratory Mean-Variance Portfolio Optimization with Regime-Switching Market Dynamics [3.6149777601911097]
We study a regime-switching market setting and apply reinforcement learning techniques to assist informed exploration within the control space.
In a real market data study, EMVRS with OC learning outperforms its counterparts with the highest mean and reasonably low volatility of the annualized portfolio returns.
arXiv Detail & Related papers (2025-01-28T02:48:41Z)
- Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning [10.117626902557927]
Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data.
This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments.
arXiv Detail & Related papers (2024-12-18T20:25:04Z)
- MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We present a novel immersion-aware model trading framework that incentivizes metaverse users (MUs) to contribute learning models for augmented reality (AR) services in the vehicular metaverse.
Considering dynamic network conditions and privacy concerns, we formulate the reward decisions of MSPs as a multi-agent Markov decision process.
Experimental results demonstrate that the proposed framework can effectively provide higher-value models for object detection and classification in AR services on real AR-related vehicle datasets.
arXiv Detail & Related papers (2024-10-25T16:20:46Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
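A minimal sketch of the DPO objective applied to such step-level preference pairs is shown below; the function name and tensor shapes are illustrative assumptions, and the paper's MCTS data-collection pipeline is not reproduced here.

```python
# Hedged sketch: standard DPO loss on step-level preference pairs.
import torch
import torch.nn.functional as F

def step_level_dpo_loss(logp_chosen, logp_rejected,
                        ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """All inputs: summed log-probabilities of a reasoning step, shape (batch,)."""
    chosen_margin = logp_chosen - ref_logp_chosen        # policy vs. reference model
    rejected_margin = logp_rejected - ref_logp_rejected
    # DPO pushes the chosen step's margin above the rejected step's margin.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```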
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
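A minimal, hypothetical sketch of greedy search guided by step-level PRM feedback follows; `generate_candidate_steps` and `prm_score` are assumed stand-ins for the generator and reward model, not the paper's actual interfaces.

```python
# Hedged sketch: greedy selection of reasoning steps by a step-level reward model.
def prm_guided_greedy_search(question, generate_candidate_steps, prm_score,
                             n_candidates=8, max_steps=10):
    steps = []
    for _ in range(max_steps):
        candidates = generate_candidate_steps(question, steps, n_candidates)
        if not candidates:
            break
        # Keep only the candidate step the PRM scores highest.
        best = max(candidates, key=lambda s: prm_score(question, steps + [s]))
        steps.append(best)
        if best.strip().startswith("Answer:"):   # stop once a final answer is emitted
            break
    return steps
```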
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Sample Complexity of Robust Reinforcement Learning with a Generative Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy.
We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence.
In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
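For reference, robust policies of this kind are usually defined through the robust Bellman operator below (a standard formulation with assumed notation, not necessarily the paper's), where $\mathcal{P}(s,a)$ is an uncertainty ball around the nominal kernel measured by total variation, chi-square, or KL divergence:

```latex
% Standard robust Bellman operator (notation assumed, not taken from the paper)
(T_{\mathrm{rob}} V)(s) = \max_{a \in \mathcal{A}} \Big\{ r(s,a)
  + \gamma \inf_{P \in \mathcal{P}(s,a)} \mathbb{E}_{s' \sim P}\big[ V(s') \big] \Big\},
\qquad
\mathcal{P}(s,a) = \big\{ P : D\big(P \,\|\, P^{o}(\cdot \mid s,a)\big) \le \rho \big\}.
```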
arXiv Detail & Related papers (2021-12-02T18:55:51Z)
- On Effective Scheduling of Model-based Reinforcement Learning [53.027698625496015]
In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance.
Based on this insight, we propose a framework named AutoMBPO to automatically schedule the real data ratio.
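A minimal sketch of the scheduling idea above, assuming a simple linear ramp of the real-data share (AutoMBPO itself learns this schedule automatically; the ramp, names, and buffer types here are assumptions):

```python
# Hedged sketch: mix real and model-generated transitions with a growing real ratio.
import random

def mixed_batch(real_buffer, model_buffer, batch_size, epoch, total_epochs):
    real_ratio = 0.1 + 0.9 * (epoch / max(1, total_epochs - 1))   # ramp 10% -> 100%
    n_real = int(round(batch_size * real_ratio))
    batch = random.sample(real_buffer, min(n_real, len(real_buffer)))
    batch += random.sample(model_buffer, batch_size - len(batch))
    return batch
```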
arXiv Detail & Related papers (2021-11-16T15:24:59Z)
- Blending MPC & Value Function Approximation for Efficient Reinforcement Learning [42.429730406277315]
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems.
We present a framework for improving on MPC with model-free reinforcement learning (RL).
We show that our approach can obtain performance comparable to that of MPC with access to the true dynamics.
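One common way to blend MPC with a learned value function, which may correspond to the idea above, is to use the value estimate as the terminal cost of a short planning horizon. The sketch below illustrates that general scheme, not the paper's algorithm; `dynamics`, `reward`, `value_fn`, and `sample_action` are hypothetical user-supplied callables.

```python
# Hedged sketch: sampling-based MPC with a learned terminal value function.
import numpy as np

def mpc_with_terminal_value(state, dynamics, reward, value_fn, sample_action,
                            horizon=5, n_rollouts=64, gamma=0.99):
    """Return the first action of the best sampled rollout under the approximate model."""
    best_action, best_return = None, -np.inf
    for _ in range(n_rollouts):
        s, total, discount, first_action = state, 0.0, 1.0, None
        for _ in range(horizon):
            a = sample_action(s)
            if first_action is None:
                first_action = a
            total += discount * reward(s, a)
            s = dynamics(s, a)
            discount *= gamma
        total += discount * value_fn(s)   # learned value closes the short horizon
        if total > best_return:
            best_return, best_action = total, first_action
    return best_action
```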
arXiv Detail & Related papers (2020-12-10T11:32:01Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
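The information-theoretic MPC update that this entropy-regularised view connects to is the MPPI-style exponentially weighted rollout average; a minimal sketch follows, with sample counts, noise scale, and temperature as illustrative assumptions rather than the paper's settings.

```python
# Hedged sketch: information-theoretic MPC (MPPI-style) action update.
import numpy as np

def mppi_action(nominal_actions, rollout_cost, n_samples=128, sigma=0.3, temperature=1.0):
    """Average noisy action sequences with exp(-cost / temperature) weights."""
    horizon, act_dim = nominal_actions.shape
    noise = sigma * np.random.randn(n_samples, horizon, act_dim)
    costs = np.array([rollout_cost(nominal_actions + noise[i]) for i in range(n_samples)])
    weights = np.exp(-(costs - costs.min()) / temperature)   # soft-min over rollouts
    weights /= weights.sum()
    # Exponentially weighted average of the sampled perturbations.
    return nominal_actions + np.tensordot(weights, noise, axes=1)
```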
arXiv Detail & Related papers (2019-12-31T00:29:22Z)