Bridging the gap between Markowitz planning and deep reinforcement
learning
- URL: http://arxiv.org/abs/2010.09108v1
- Date: Wed, 30 Sep 2020 04:03:27 GMT
- Title: Bridging the gap between Markowitz planning and deep reinforcement
learning
- Authors: Eric Benhamou, David Saltiel, Sandrine Ungari, Abhishek Mukhopadhyay
- Abstract summary: This paper shows how Deep Reinforcement Learning techniques can shed new light on portfolio allocation.
The advantages are numerous: (i) DRL maps market conditions directly to actions by design and hence should adapt to changing environments, (ii) DRL does not rely on traditional financial risk assumptions, such as risk being represented by variance, (iii) DRL can incorporate additional data and act as a multi-input method, as opposed to more traditional optimization methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While researchers in the asset management industry have mostly focused on
techniques based on financial and risk planning, such as the Markowitz
efficient frontier, minimum variance, maximum diversification, or equal risk
parity, another community in machine learning has, in parallel, started working
on reinforcement learning, and more particularly deep reinforcement learning, to
solve other decision-making problems for challenging tasks like autonomous
driving, robot learning, and, on a more conceptual side, game solving such as Go.
This paper aims to bridge the gap between these two approaches by showing that Deep
Reinforcement Learning (DRL) techniques can shed new light on portfolio
allocation thanks to a more general optimization setting that casts portfolio
allocation as an optimal control problem: not just a one-step
optimization, but rather a continuous control optimization with a delayed
reward. The advantages are numerous: (i) DRL maps market conditions directly to
actions by design and hence should adapt to changing environments, (ii) DRL does
not rely on traditional financial risk assumptions, such as risk being
represented by variance, (iii) DRL can incorporate additional data and act as a
multi-input method, as opposed to more traditional optimization methods. We
present encouraging experimental results using convolutional networks.
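To make the abstract's contrast concrete, below is a minimal, hypothetical sketch (not taken from the paper): a closed-form one-step minimum-variance allocation next to a gym-style environment that frames allocation as sequential control with a delayed, path-dependent reward. All names (markowitz_min_variance, PortfolioEnv) and the (T, n_assets) returns-matrix layout are illustrative assumptions.

```python
# Illustrative sketch only -- not the authors' implementation.
import numpy as np

def markowitz_min_variance(cov: np.ndarray) -> np.ndarray:
    """One-step planning: minimum-variance weights w = C^{-1} 1 / (1^T C^{-1} 1)."""
    inv_cov = np.linalg.inv(cov)
    ones = np.ones(cov.shape[0])
    w = inv_cov @ ones
    return w / w.sum()

class PortfolioEnv:
    """Toy sequential formulation: the state is a window of past returns
    (the 'market conditions'), the action is a weight vector, and the
    reward is the realized portfolio return at the next step."""

    def __init__(self, returns: np.ndarray, window: int = 60):
        self.returns = returns          # shape (T, n_assets), assumed layout
        self.window = window
        self.t = window

    def reset(self) -> np.ndarray:
        self.t = self.window
        return self.returns[self.t - self.window : self.t]

    def step(self, action: np.ndarray):
        w = np.clip(action, 0.0, None)
        w = w / max(w.sum(), 1e-12)                 # long-only, fully invested
        reward = float(w @ self.returns[self.t])    # delayed, path-dependent reward
        self.t += 1
        done = self.t >= len(self.returns)
        obs = None if done else self.returns[self.t - self.window : self.t]
        return obs, reward, done, {}
```

A DRL agent trained on PortfolioEnv would optimize cumulative reward over the whole trajectory, whereas markowitz_min_variance solves a single-period problem from the covariance estimate alone; this is the one-step versus continuous-control distinction the abstract draws.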
Related papers
- A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning [4.495144308458951]
We find that training the DRL agent using the actor-critic algorithm and deep function approximators may lead to scenarios where the improvement in the DRL agent's risk-adjusted profitability is not significant.
We propose a novel multi-agent Deep Reinforcement Learning (DRL) algorithmic framework in this research.
arXiv Detail & Related papers (2025-01-12T15:00:02Z) - Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization [22.67700436936984]
We introduce Direct Advantage Policy Optimization (DAPO), a novel step-level offline reinforcement learning algorithm.
DAPO employs a critic function to predict the reasoning accuracy at each step, thereby generating dense signals to refine the generation strategy.
Our results show that DAPO can effectively enhance the mathematical and code capabilities of both SFT models and RL models, demonstrating its effectiveness.
arXiv Detail & Related papers (2024-12-24T08:39:35Z) - Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can fully exploit transmission and computing resources to improve the data rate.
Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z) - MILLION: A General Multi-Objective Framework with Controllable Risk for Portfolio Management [16.797109778036862]
We propose a general Multi-objectIve framework with controLLable rIsk for pOrtfolio maNagement (MILLION).
In the risk control phase, we propose two methods, i.e., portfolio adaptation and portfolio improvement.
The results demonstrate the effectiveness and efficiency of the proposed framework.
arXiv Detail & Related papers (2024-12-04T05:19:34Z) - Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association in 6G satellite networks.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a 14.6% improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning [2.951820152291149]
In several decision problems, one faces the possibility of policy switching, which incurs a non-negligible cost.
We propose a novel strategy for balancing between the gain and the cost of switching in a flexible and principled way.
We establish fundamental properties and design a Net Actor-Critic algorithm for the proposed novel switching formulation.
arXiv Detail & Related papers (2024-07-01T22:24:31Z) - Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to jointly train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization [49.396692286192206]
We study the use of deep reinforcement learning for responsible portfolio optimization by incorporating ESG states and objectives.
Our results show that deep reinforcement learning policies can provide competitive performance against mean-variance approaches for responsible portfolio allocation.
arXiv Detail & Related papers (2024-03-25T12:04:03Z) - A Learnheuristic Approach to A Constrained Multi-Objective Portfolio
Optimisation Problem [0.0]
This paper studies multi-objective portfolio optimisation.
It aims to maximise the expected return while minimising the risk for a given target rate of return.
arXiv Detail & Related papers (2023-04-13T17:05:45Z) - Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is not independent and identically distributed, leading to inefficient meta-training.
We show that, although trained only on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z) - Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMcontrol and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.