Deep Deterministic Portfolio Optimization
- URL: http://arxiv.org/abs/2003.06497v2
- Date: Thu, 9 Apr 2020 10:56:24 GMT
- Title: Deep Deterministic Portfolio Optimization
- Authors: Ayman Chaouki, Stephen Hardiman, Christian Schmidt, Emmanuel Sérié, and Joachim de Lataillade
- Abstract summary: This work tests reinforcement learning algorithms on conceptually simple, but mathematically non-trivial, trading environments.
We study the deep deterministic policy gradient algorithm and show that such a reinforcement learning agent can successfully recover the essential features of the optimal trading strategies.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can deep reinforcement learning algorithms be exploited as solvers for
optimal trading strategies? The aim of this work is to test reinforcement
learning algorithms on conceptually simple, but mathematically non-trivial,
trading environments. The environments are chosen such that an optimal or
close-to-optimal trading strategy is known. We study the deep deterministic
policy gradient algorithm and show that such a reinforcement learning agent can
successfully recover the essential features of the optimal trading strategies
and achieve close-to-optimal rewards.
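The setting described above, an agent recovering a known optimal strategy, can be illustrated with a minimal sketch, assuming a one-step toy environment and a linear deterministic policy in place of the paper's DDPG networks:

```python
import numpy as np

# Toy sketch (not the paper's DDPG setup): a linear deterministic
# policy a = theta * s is trained by gradient ascent on a reward
# r(s, a) = -(a - a_opt * s)**2 whose optimal strategy (slope a_opt)
# is known in advance -- mirroring the idea of testing an RL agent
# in an environment where the optimum is known.
rng = np.random.default_rng(0)
a_opt = 0.7      # slope of the known optimal strategy (assumed)
theta = 0.0      # policy parameter
lr = 0.05

for _ in range(200):
    s = rng.normal()                     # market signal
    a = theta * s                        # deterministic action (position)
    # deterministic policy gradient: dr/dtheta = dr/da * da/dtheta
    grad = -2.0 * (a - a_opt * s) * s
    theta += lr * grad
```

After training, `theta` sits close to the known optimal slope, which is the kind of recovery the paper checks for its DDPG agent.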
Related papers
- Satisficing Exploration for Deep Reinforcement Learning [26.73584163318647]
In complex environments that approach the vastness and scale of the real world, attaining optimal performance may in fact be an entirely intractable endeavor.
Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently-satisfying or satisficing solutions.
We extend an agent that directly represents uncertainty over the optimal value function, allowing it both to bypass the need for model-based planning and to learn satisficing policies.
arXiv Detail & Related papers (2024-07-16T21:28:03Z)
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
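A naive fixed-weight blend of the two loss families mentioned above could be sketched as follows (a hypothetical illustration only; DiscoPOP's adaptive blending is not reproduced, and `beta` is an assumed constant mixing weight):

```python
import math

# Hypothetical fixed blend of logistic and exponential losses on a
# preference margin (the gap between chosen and rejected responses).
# DiscoPOP blends these adaptively; a constant beta is assumed here.
def blended_preference_loss(margin: float, beta: float = 0.5) -> float:
    logistic = math.log(1.0 + math.exp(-margin))   # log-sigmoid loss
    exponential = math.exp(-margin)                # exponential loss
    return beta * logistic + (1.0 - beta) * exponential
```

Both components decay as the margin grows, so the blend rewards models that separate preferred from dispreferred outputs.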
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- From Learning to Optimize to Learning Optimization Algorithms [4.066869900592636]
We identify key principles that classical algorithms obey but that, up to now, have not been used for Learning to Optimize (L2O).
We provide a general design pipeline, taking into account data, architecture and learning strategy, and thereby enabling a synergy between classical optimization and L2O.
We demonstrate the success of these novel principles by designing a new learning-enhanced BFGS algorithm and provide numerical experiments evidencing its adaptation to many settings at test time.
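The baseline that such learning-enhanced variants build on is the classical BFGS inverse-Hessian update; a minimal sketch on a quadratic, assuming fixed unit step sizes rather than a line search:

```python
import numpy as np

# Classical BFGS inverse-Hessian update -- the baseline that
# learning-enhanced variants extend. Unit step sizes are assumed
# for brevity (a real implementation would use a line search).
def bfgs_minimize(grad, x, steps=50):
    n = len(x)
    H = np.eye(n)                 # inverse-Hessian approximation
    g = grad(x)
    for _ in range(steps):
        if np.linalg.norm(g) < 1e-10:
            break
        s = -H @ g                # quasi-Newton step
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        rho = 1.0 / (y @ s)
        I = np.eye(n)
        # secant update: H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# sanity check on a convex quadratic f(x) = 0.5 x^T Q x (minimum at 0)
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
xmin = bfgs_minimize(lambda x: Q @ x, np.array([1.0, -1.0]))
```

The learned components in the paper replace or augment pieces of exactly this kind of hand-designed update rule.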
arXiv Detail & Related papers (2024-05-28T14:30:07Z)
- Robust Utility Optimization via a GAN Approach [3.74142789780782]
We propose a generative adversarial network (GAN) approach to solve robust utility optimization problems.
In particular, we model both the investor and the market by neural networks (NN) and train them in a mini-max zero-sum game.
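A minimal sketch of such a mini-max zero-sum training loop, assuming a toy convex-concave quadratic payoff in place of the paper's neural investor and market:

```python
# Toy mini-max zero-sum game: the "investor" parameter w ascends the
# payoff while the "market" parameter m descends it, mimicking the
# adversarial training loop. The quadratic payoff is assumed purely
# for illustration; its saddle point is (0, 0).
def payoff(w: float, m: float) -> float:
    return w * m - 0.5 * w * w + 0.5 * m * m

w, m, lr = 1.0, 1.0, 0.1
for _ in range(500):
    grad_w = m - w          # d payoff / d w (investor ascends)
    grad_m = w + m          # d payoff / d m (market descends)
    w += lr * grad_w
    m -= lr * grad_m
```

Simultaneous gradient ascent-descent converges to the saddle point here because the payoff is strongly concave in `w` and strongly convex in `m`; with neural networks the same loop is run on sampled market scenarios.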
arXiv Detail & Related papers (2024-03-22T14:36:39Z) - From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement
Learning with Contextual Information [4.42532447134568]
In this study, we use two methods to overcome the issue with contextual information.
In order to investigate strategic trading in quantitative markets, we merged the earlier financial trading strategy known as constant proportion portfolio insurance (CPPI) into deep deterministic policy gradient (DDPG).
The experimental results show that both methods can accelerate the progress of reinforcement learning to obtain the optimal solution.
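For reference, the classic CPPI allocation rule mentioned above can be sketched as follows (a minimal illustration of the strategy itself; the paper's DDPG integration is not reproduced):

```python
# Classic CPPI rule: risky-asset exposure is a fixed multiple of the
# cushion -- the wealth in excess of a protected floor -- capped at
# total wealth. Parameter values below are illustrative only.
def cppi_exposure(wealth: float, floor: float, multiplier: float) -> float:
    cushion = max(wealth - floor, 0.0)          # capital at risk
    return min(multiplier * cushion, wealth)    # cap at total wealth

print(cppi_exposure(100.0, 80.0, 3.0))  # 60.0
```

Once wealth falls to the floor the cushion vanishes and the rule allocates nothing to the risky asset, which is the downside protection CPPI is designed to provide.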
arXiv Detail & Related papers (2023-10-01T11:25:20Z) - Reinforcement Learning for Credit Index Option Hedging [2.568904868787359]
In this paper, we focus on finding the optimal hedging strategy of a credit index option using reinforcement learning.
We take a practical approach, where the focus is on realism, i.e. discrete time and transaction costs; we even test our policy on real market data.
arXiv Detail & Related papers (2023-07-19T09:03:41Z) - Understanding the Effect of Stochasticity in Policy Optimization [86.7574122154668]
First, we show that the preferability of optimization methods depends critically on whether exact gradients are used.
Second, to explain these findings we introduce the concept of committal rate for policy optimization.
Third, we show that in the absence of external oracle information, there is an inherent trade-off between exploiting geometry to accelerate convergence versus achieving optimality almost surely.
arXiv Detail & Related papers (2021-10-29T06:35:44Z) - The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z) - Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework can better guide the learning of the common policy towards practically optimal execution by an oracle teacher with perfect information.
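A hedged sketch of the distillation idea, assuming a linear student, a toy softmax teacher, and cross-entropy imitation in place of the paper's actual framework:

```python
import numpy as np

# Toy oracle distillation: a student policy seeing only noisy
# observations is trained to imitate a teacher policy that sees the
# perfect-information state. The linear student and the softmax
# teacher below are assumptions made purely for illustration.
rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p, q):
    return float(-(p * np.log(q + 1e-12)).sum())

W = np.zeros((3, 2))    # student logits = W @ obs
lr = 0.1
for _ in range(5000):
    state = rng.normal(size=2)                  # perfect information
    obs = state + 0.1 * rng.normal(size=2)      # noisy market observation
    teacher = softmax(np.array([state[0], state[1], 0.0]))
    student = softmax(W @ obs)
    # gradient of cross-entropy w.r.t. the student's logits
    W -= lr * np.outer(student - teacher, obs)

# evaluate: the trained student imitates the teacher far better
# than an untrained (uniform) student does
probe = [rng.normal(size=2) for _ in range(200)]
def mean_ce(weights):
    total = 0.0
    for s in probe:
        total += cross_entropy(softmax(np.array([s[0], s[1], 0.0])),
                               softmax(weights @ s))
    return total / len(probe)
```

The teacher never acts at deployment time; it only supplies training targets, which is what lets the student policy approach oracle behavior from imperfect observations.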
arXiv Detail & Related papers (2021-01-28T05:52:18Z)
- Mixed Strategies for Robust Optimization of Unknown Objectives [93.8672371143881]
We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter.
We design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations.
GP-MRO seeks to discover a robust and randomized mixed strategy, that maximizes the worst-case expected objective value.
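The value of randomization that GP-MRO exploits can be illustrated on a tiny known payoff matrix (an assumed 2x2 example, not the Gaussian-process setting of the paper):

```python
import numpy as np

# Toy illustration (not GP-MRO itself): for a known 2x2 payoff matrix
# with no pure-strategy saddle point, a randomized mixed strategy
# achieves a strictly better worst-case expected value than any pure
# action -- the motivation for searching over mixed strategies.
A = np.array([[3.0, 0.0],
              [1.0, 2.0]])   # payoff[action, adversary_parameter] (assumed)

worst_pure = A.min(axis=1).max()     # best worst-case over pure actions
# equalizing mixture for a 2x2 game without a pure saddle point
p = (A[1, 1] - A[1, 0]) / ((A[0, 0] - A[1, 0]) - (A[0, 1] - A[1, 1]))
mixed = np.array([p, 1 - p])
worst_mixed = (mixed @ A).min()      # worst case under the mixture

print(worst_pure, worst_mixed)   # 1.0 1.5
```

The mixture equalizes the expected payoff across the adversary's choices, so the worst case rises from 1.0 to 1.5; GP-MRO searches for such mixtures when the objective is unknown and observed only through noisy evaluations.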
arXiv Detail & Related papers (2020-02-28T09:28:17Z)
- Provably Efficient Exploration in Policy Optimization [117.09887790160406]
This paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO).
OPPO achieves $\tilde{O}(\sqrt{d^2 H^3 T})$ regret.
To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.
arXiv Detail & Related papers (2019-12-12T08:40:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.