Related papers: From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information

URL: http://arxiv.org/abs/2310.00642v1
Date: Sun, 1 Oct 2023 11:25:20 GMT
Title: From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information
Authors: Zhendong Shi, Xiaoli Wei and Ercan E. Kuruoglu
Abstract summary: In this study, we use two methods to overcome the issue with contextual information. In order to investigate strategic trading in quantitative markets, we merged the earlier financial trading strategy known as constant proportion portfolio insurance (CPPI) into deep deterministic policy gradient (DDPG). The experimental results show that both methods can accelerate the progress of reinforcement learning to obtain the optimal solution.
Score: 4.42532447134568
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The problem of how to take the right actions to make profits in sequential processes continues to be difficult due to the quick dynamics and a significant amount of uncertainty in many application scenarios. In such complicated environments, reinforcement learning (RL), a reward-oriented strategy for optimum control, has emerged as a potential technique to address this strategic decision-making issue. However, reinforcement learning also has shortcomings, such as excessive resource consumption and an inability to quickly obtain optimal solutions, that make it unsuitable for many financial problems, including quantitative trading markets. In this study, we use two methods to overcome the issue with contextual information: contextual Thompson sampling and reinforcement learning under supervision, which can accelerate the iterations in search of the best answer. In order to investigate strategic trading in quantitative markets, we merged the earlier financial trading strategy known as constant proportion portfolio insurance (CPPI) into deep deterministic policy gradient (DDPG). The experimental results show that both methods can accelerate the progress of reinforcement learning to obtain the optimal solution.

Related papers

Market Making Strategies with Reinforcement Learning [0.0]
Market makers (MMs) play a fundamental role in providing liquidity, yet face significant challenges arising from inventory risk, competition, and non-stationary market dynamics. This research explores how Reinforcement Learning can be employed to develop autonomous, adaptive, and profitable market making strategies.
arXiv Detail & Related papers (2025-07-24T16:17:49Z)
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. EPO provides strategies in open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. Experiments across social and physical domains demonstrate EPO's ability of long-term goal alignment.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
Deep Reinforcement Learning for Online Optimal Execution Strategies [49.1574468325115]
This paper tackles the challenge of learning non-Markovian optimal execution strategies in dynamic financial markets. We introduce a novel actor-critic algorithm based on Deep Deterministic Policy Gradient (DDPG). We show that our algorithm successfully approximates the optimal execution strategy.
arXiv Detail & Related papers (2024-10-17T12:38:08Z)
Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and Algorithms [0.0]
This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth. We introduce a novel framework for decision-making in combining strategies, irrespective of market conditions. We show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios.
arXiv Detail & Related papers (2024-06-05T23:08:57Z)
Risk-reducing design and operations toolkit: 90 strategies for managing risk and uncertainty in decision problems [65.268245109828]
This paper develops a catalog of such strategies and develops a framework for them. It argues that they provide an efficient response to decision problems that are seemingly intractable due to high uncertainty. It then proposes a framework to incorporate them into decision theory using multi-objective optimization.
arXiv Detail & Related papers (2023-09-06T16:14:32Z)
On solving decision and risk management problems subject to uncertainty [91.3755431537592]
Uncertainty is a pervasive challenge in decision and risk management. This paper develops a systematic understanding of such strategies, determines their range of application, and develops a framework to better employ them.
arXiv Detail & Related papers (2023-01-18T19:16:23Z)
Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints. We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z)
Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution. We show that our framework can better guide the learning of the common policy towards practically optimal execution by an oracle teacher with perfect information.
arXiv Detail & Related papers (2021-01-28T05:52:18Z)
Time your hedge with Deep Reinforcement Learning [0.0]
Deep Reinforcement Learning (DRL) can tackle this challenge by creating a dynamic dependency between market information and hedging strategies allocation decisions. We present a realistic and augmented DRL framework that: (i) uses additional contextual information to decide an action, (ii) has a one period lag between observations and actions to account for one day lag turnover of common asset managers to rebalance their hedge, (iii) is fully tested in terms of stability and robustness thanks to a repetitive train test method called anchored walk forward training, similar in spirit to k fold cross validation for time series and (iv) allows managing leverage of our hedging
arXiv Detail & Related papers (2020-09-16T06:43:41Z)
Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments. We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
An Application of Deep Reinforcement Learning to Algorithmic Trading [4.523089386111081]
This scientific research paper presents an innovative approach based on deep reinforcement learning (DRL) to solve the algorithmic trading problem. It proposes a novel DRL trading strategy so as to maximise the resulting Sharpe ratio performance indicator on a broad range of stock markets. The training of the resulting reinforcement learning (RL) agent is entirely based on the generation of artificial trajectories from a limited set of stock market historical data.
arXiv Detail & Related papers (2020-04-07T14:57:23Z)
Deep Deterministic Portfolio Optimization [0.0]
This work is to test reinforcement learning algorithms on conceptually simple, but mathematically non-trivial, trading environments. We study the deep deterministic policy gradient algorithm and show that such a reinforcement learning agent can successfully recover the essential features of the optimal trading strategies.
arXiv Detail & Related papers (2020-03-13T22:20:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.