Balancing Profit, Risk, and Sustainability for Portfolio Management
- URL: http://arxiv.org/abs/2207.02134v1
- Date: Mon, 6 Jun 2022 08:38:30 GMT
- Title: Balancing Profit, Risk, and Sustainability for Portfolio Management
- Authors: Charl Maree and Christian W. Omlin
- Abstract summary: We develop a novel utility function with the Sharpe ratio representing risk and the environmental, social, and governance score (ESG) representing sustainability.
We show that our system outperforms MADDPG while improving on deep Q-learning approaches by allowing for continuous action spaces.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stock portfolio optimization is the process of continuous reallocation of
funds to a selection of stocks. This is a particularly well-suited problem for
reinforcement learning, as daily rewards are compounding and objective
functions may include more than just profit, e.g., risk and sustainability. We
developed a novel utility function with the Sharpe ratio representing risk and
the environmental, social, and governance score (ESG) representing
sustainability. We show that a state-of-the-art policy gradient method -
multi-agent deep deterministic policy gradients (MADDPG) - fails to find the
optimum policy due to flat policy gradients and we therefore replaced gradient
descent with a genetic algorithm for parameter optimization. We show that our
system outperforms MADDPG while improving on deep Q-learning approaches by
allowing for continuous action spaces. Crucially, by incorporating risk and
sustainability criteria in the utility function, we improve on the
state-of-the-art in reinforcement learning for portfolio optimization; risk and
sustainability are essential in any modern trading strategy and we propose a
system that does not merely report these metrics, but that actively optimizes
the portfolio to improve on them.
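The abstract names two concrete ingredients: a utility that blends the Sharpe ratio (risk-adjusted profit) with a portfolio-weighted ESG score (sustainability), and a genetic algorithm that searches policy parameters directly when policy gradients are flat. The Python sketch below is a minimal illustration under assumed forms; the function names, the trade-off weight `lambda_esg`, the annualization factor, and all genetic-algorithm hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: utility = Sharpe ratio + lambda_esg * portfolio ESG score,
# optimized by a simple elitist genetic search over flat policy parameters.
# All names and hyperparameters below are assumptions for illustration.
import numpy as np

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of daily portfolio returns."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_rate / periods_per_year
    if excess.std() == 0:
        return 0.0
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std())

def portfolio_esg(weights, esg_scores):
    """ESG score of the portfolio: the weight-average of per-stock ESG scores."""
    return float(np.dot(weights, esg_scores))

def utility(daily_returns, weights, esg_scores, lambda_esg=0.5):
    """Scalar objective balancing risk-adjusted profit and sustainability."""
    return sharpe_ratio(daily_returns) + lambda_esg * portfolio_esg(weights, esg_scores)

def genetic_search(fitness, dim, pop_size=50, generations=200,
                   elite_frac=0.2, mutation_scale=0.1, seed=0):
    """Elitist genetic search: keep the best parameter vectors, refill the
    population with mutated copies of them, and return the best found."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        elites = pop[np.argsort(scores)[-n_elite:]]                # highest utility
        parents = elites[rng.integers(n_elite, size=pop_size - n_elite)]
        children = parents + mutation_scale * rng.normal(size=parents.shape)
        pop = np.vstack([elites, children])
    scores = np.array([fitness(p) for p in pop])
    return pop[np.argmax(scores)]
```

Because the genetic search needs only utility evaluations (e.g., from backtesting a policy parameterized by each candidate vector), it sidesteps the flat policy gradients reported for MADDPG, at the cost of many more policy evaluations per generation.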
Related papers
- Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization [49.396692286192206]
We study the use of deep reinforcement learning for responsible portfolio optimization by incorporating ESG states and objectives.
Our results show that deep reinforcement learning policies can provide competitive performance against mean-variance approaches for responsible portfolio allocation.
arXiv Detail & Related papers (2024-03-25T12:04:03Z) - Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z) - Robust Risk-Aware Reinforcement Learning [0.0]
We present a reinforcement learning (RL) approach for robust optimisation of risk-aware performance criteria.
We assess the value of a policy using rank-dependent expected utility (RDEU).
To robustify optimal policies against model uncertainty, we assess a policy not by its distribution, but by the worst possible distribution that lies within a Wasserstein ball around it (see the RDEU sketch after this list).
arXiv Detail & Related papers (2021-08-23T20:56:34Z) - Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z) - On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk [32.97618081988295]
We present a tight upper bound on the suboptimality of the learned policy, characterizing its dependence on the nonlinearity of the objective and the degree of risk aversion.
We propose a practical implementation of PG that uses state distribution reweighting to overcome previous limitations.
arXiv Detail & Related papers (2021-03-04T04:11:09Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z) - Time your hedge with Deep Reinforcement Learning [0.0]
Deep Reinforcement Learning (DRL) can tackle this challenge by creating a dynamic dependency between market information and hedging-strategy allocation decisions.
We present a realistic and augmented DRL framework that: (i) uses additional contextual information to decide an action, (ii) has a one-period lag between observations and actions to account for the one-day turnover lag common asset managers face when rebalancing their hedge, (iii) is fully tested for stability and robustness using a repeated train-test method called anchored walk-forward training, similar in spirit to k-fold cross-validation for time series, and (iv) allows managing the leverage of our hedging strategy.
arXiv Detail & Related papers (2020-09-16T06:43:41Z) - Stable Policy Optimization via Off-Policy Divergence Regularization [50.98542111236381]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).
We propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distribution induced by consecutive policies to be close to one another.
Our proposed method can have a beneficial effect on stability and improve final performance in benchmark high-dimensional control tasks.
arXiv Detail & Related papers (2020-03-09T13:05:47Z)
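One related paper above, "Robust Risk-Aware Reinforcement Learning", scores policies by rank-dependent expected utility (RDEU) and robustifies against model uncertainty with a Wasserstein ball. As a rough, assumed illustration of the scoring criterion only (the Wasserstein robustification is not shown), the sketch below estimates RDEU from sampled outcomes using a utility u and a probability distortion g; the exponential utility and power distortion are example choices, not the paper's.

```python
# Assumed sketch of an empirical rank-dependent expected utility (RDEU) estimate.
import numpy as np

def empirical_rdeu(samples, u, g):
    """Estimate RDEU from i.i.d. outcome samples.

    Outcomes are sorted ascending and each is weighted by the increment of the
    distortion g applied to the empirical tail probabilities.
    With g(p) = p this reduces to the plain sample mean of u(X)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    tail = (n - np.arange(n + 1)) / n        # grid of tail probabilities: 1, (n-1)/n, ..., 1/n, 0
    weights = g(tail[:-1]) - g(tail[1:])     # probability mass each sorted outcome gets under g
    return float(np.sum(u(x) * weights))

# Example (assumed) choices: concave exponential utility, power distortion.
u = lambda x: 1.0 - np.exp(-0.5 * x)
g = lambda p: np.power(p, 1.5)

terminal_wealth = np.random.default_rng(1).normal(loc=1.0, scale=0.2, size=10_000)
print(empirical_rdeu(terminal_wealth, u, g))
```

Setting g to the identity recovers ordinary expected utility, which makes the distortion's risk-shaping effect easy to isolate.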
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.