Bridging the gap between Markowitz planning and deep reinforcement
learning
- URL: http://arxiv.org/abs/2010.09108v1
- Date: Wed, 30 Sep 2020 04:03:27 GMT
- Title: Bridging the gap between Markowitz planning and deep reinforcement
learning
- Authors: Eric Benhamou, David Saltiel, Sandrine Ungari, Abhishek Mukhopadhyay
- Abstract summary: This paper shows how Deep Reinforcement Learning techniques can shed new light on portfolio allocation.
The advantages are numerous: (i) DRL maps market conditions directly to actions by design and hence should adapt to changing environments, (ii) DRL does not rely on traditional financial risk assumptions, such as risk being represented by variance, (iii) DRL can incorporate additional data and act as a multi-input method, as opposed to more traditional optimization methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While researchers in the asset management industry have mostly focused on
techniques based on financial and risk planning, such as the Markowitz
efficient frontier, minimum variance, maximum diversification, or equal risk
parity, another community in machine learning has, in parallel, started working
on reinforcement learning, and more particularly deep reinforcement learning, to
solve other decision-making problems for challenging tasks like autonomous
driving, robot learning, and, on a more conceptual side, game solving such as Go.
This paper aims to bridge the gap between these two approaches by showing that Deep
Reinforcement Learning (DRL) techniques can shed new light on portfolio
allocation thanks to a more general optimization setting that casts portfolio
allocation as an optimal control problem: not just a one-step
optimization, but rather a continuous control optimization with a delayed
reward. The advantages are numerous: (i) DRL maps market conditions directly to
actions by design and hence should adapt to changing environments, (ii) DRL does
not rely on traditional financial risk assumptions, such as risk being
represented by variance, (iii) DRL can incorporate additional data and act as a
multi-input method, as opposed to more traditional optimization methods. We
present encouraging experimental results using convolutional networks.
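To make the abstract's contrast concrete, below is a minimal, hypothetical sketch (not taken from the paper): a closed-form one-step minimum-variance allocation next to a gym-style environment that frames allocation as sequential control with a delayed, path-dependent reward. All names (markowitz_min_variance, PortfolioEnv) and the (T, n_assets) returns-matrix layout are illustrative assumptions.

```python
# Illustrative sketch only -- not the authors' implementation.
import numpy as np

def markowitz_min_variance(cov: np.ndarray) -> np.ndarray:
    """One-step planning: minimum-variance weights w = C^{-1} 1 / (1^T C^{-1} 1)."""
    inv_cov = np.linalg.inv(cov)
    ones = np.ones(cov.shape[0])
    w = inv_cov @ ones
    return w / w.sum()

class PortfolioEnv:
    """Toy sequential formulation: the state is a window of past returns
    (the 'market conditions'), the action is a weight vector, and the
    reward is the realized portfolio return at the next step."""

    def __init__(self, returns: np.ndarray, window: int = 60):
        self.returns = returns          # shape (T, n_assets), assumed layout
        self.window = window
        self.t = window

    def reset(self) -> np.ndarray:
        self.t = self.window
        return self.returns[self.t - self.window : self.t]

    def step(self, action: np.ndarray):
        w = np.clip(action, 0.0, None)
        w = w / max(w.sum(), 1e-12)                 # long-only, fully invested
        reward = float(w @ self.returns[self.t])    # delayed, path-dependent reward
        self.t += 1
        done = self.t >= len(self.returns)
        obs = None if done else self.returns[self.t - self.window : self.t]
        return obs, reward, done, {}
```

A DRL agent trained on PortfolioEnv would optimize cumulative reward over the whole trajectory, whereas markowitz_min_variance solves a single-period problem from the covariance estimate alone; this is the one-step versus continuous-control distinction the abstract draws.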
Related papers
- A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning [4.495144308458951]
We find that training the DRL agent using the actor-critic algorithm and deep function approximators may lead to scenarios where the improvement in the DRL agent's risk-adjusted profitability is not significant.
We propose a novel multi-agent Deep Reinforcement Learning (DRL) algorithmic framework in this research.
arXiv Detail & Related papers (2025-01-12T15:00:02Z) - Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization [22.67700436936984]
We introduce Direct Advantage Policy Optimization (DAPO), a novel step-level offline reinforcement learning algorithm.
DAPO employs a critic function to predict the reasoning accuracy at each step, thereby generating dense signals to refine the generation strategy.
Our results show that DAPO can effectively enhance the mathematical and code capabilities of both SFT models and RL models, demonstrating its effectiveness.
arXiv Detail & Related papers (2024-12-24T08:39:35Z) - Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can fully exploit transmission and computing resources to improve the data rate.
Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z) - MILLION: A General Multi-Objective Framework with Controllable Risk for Portfolio Management [16.797109778036862]
We propose a general Multi-objectIve framework with controLLable rIsk for pOrtfolio maNagement (MILLION).
In the risk control phase, we propose two methods, i.e., portfolio adaptation and portfolio improvement.
The results demonstrate the effectiveness and efficiency of the proposed framework.
arXiv Detail & Related papers (2024-12-04T05:19:34Z) - Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association in 6G satellite networks.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a 14.6% improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning [2.951820152291149]
In several decision problems, one faces the possibility of policy switching, which incurs a non-negligible cost.
We propose a novel strategy for balancing between the gain and the cost of switching in a flexible and principled way.
We establish fundamental properties and design a Net Actor-Critic algorithm for the proposed novel switching formulation.
arXiv Detail & Related papers (2024-07-01T22:24:31Z) - Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to jointly train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization [49.396692286192206]
We study the use of deep reinforcement learning for responsible portfolio optimization by incorporating ESG states and objectives.
Our results show that deep reinforcement learning policies can provide competitive performance against mean-variance approaches for responsible portfolio allocation.
arXiv Detail & Related papers (2024-03-25T12:04:03Z) - A Learnheuristic Approach to A Constrained Multi-Objective Portfolio
Optimisation Problem [0.0]
This paper studies multi-objective portfolio optimisation.
It aims to maximise the expected return while minimising the risk for a given target rate of return.
arXiv Detail & Related papers (2023-04-13T17:05:45Z) - Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is not independent and identically distributed, leading to inefficient meta-training.
We show that, although trained only on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z) - Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMcontrol and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.