Deep Reinforcement Learning for Optimal Portfolio Allocation: A Comparative Study with Mean-Variance Optimization
- URL: http://arxiv.org/abs/2602.17098v1
- Date: Thu, 19 Feb 2026 05:47:23 GMT
- Title: Deep Reinforcement Learning for Optimal Portfolio Allocation: A Comparative Study with Mean-Variance Optimization
- Authors: Srijan Sood, Kassiani Papasotiriou, Marius Vaiciulis, Tucker Balch
- Abstract summary: Deep Reinforcement Learning (DRL) has shown promising results when used to optimize portfolio allocation by training model-free agents on historical market data. Our work is a thorough comparison between model-free DRL and Mean-Variance Portfolio Optimization (MVO) for optimal portfolio allocation. Backtest results demonstrate strong performance of the DRL agent across many metrics, including Sharpe ratio, maximum drawdowns, and absolute returns.
- Score: 4.433030281282368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Portfolio Management is the process of overseeing a group of investments, referred to as a portfolio, with the objective of achieving predetermined investment goals. Portfolio optimization is a key component that involves allocating the portfolio assets so as to maximize returns while minimizing risk taken. It is typically carried out by financial professionals who use a combination of quantitative techniques and investment expertise to make decisions about the portfolio allocation. Recent applications of Deep Reinforcement Learning (DRL) have shown promising results when used to optimize portfolio allocation by training model-free agents on historical market data. Many of these methods compare their results against basic benchmarks or other state-of-the-art DRL agents but often fail to compare their performance against traditional methods used by financial professionals in practical settings. One of the most commonly used methods for this task is Mean-Variance Portfolio Optimization (MVO), which uses historical time series information to estimate expected asset returns and covariances, which are then used to optimize for an investment objective. Our work is a thorough comparison between model-free DRL and MVO for optimal portfolio allocation. We detail the specifics of how to make DRL for portfolio optimization work in practice, also noting the adjustments needed for MVO. Backtest results demonstrate strong performance of the DRL agent across many metrics, including Sharpe ratio, maximum drawdowns, and absolute returns.
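The abstract describes MVO as estimating expected asset returns and covariances from historical time series and then optimizing for an investment objective. A minimal sketch of that pipeline, using toy synthetic returns and a hypothetical risk-aversion value (real use would add constraints such as no short sales):

```python
import numpy as np

# Toy historical daily returns: 500 days x 4 hypothetical assets.
rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, size=(500, 4))

mu = returns.mean(axis=0)               # expected-return estimates
sigma = np.cov(returns, rowvar=False)   # covariance estimate

# Maximize mu @ w - (gamma/2) * w @ sigma @ w subject to sum(w) == 1.
# The Lagrange condition gives w = sigma^-1 (mu + lam * 1) / gamma.
gamma = 5.0                             # risk aversion (hypothetical)
inv = np.linalg.inv(sigma)
ones = np.ones(len(mu))
raw = inv @ mu / gamma
lam = (1.0 - ones @ raw) / (ones @ inv @ ones)  # enforces full investment
w = raw + lam * (inv @ ones)

print("weights:", np.round(w, 4), "sum:", round(w.sum(), 6))
```

The resulting weights sum to one by construction; the mean-variance objective at `w` is at least that of any other fully invested portfolio, e.g. equal weights.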
Related papers
- $φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models [58.217707070069885]
This paper presents a novel Fairness Direct Preference Optimization (FaiDPO, or $φ$-DPO) framework for continual learning in LMMs. We first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) to mitigate catastrophic forgetting by aligning learning with pairwise preference signals. Extensive experiments and ablation studies show the proposed $φ$-DPO achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2026-02-26T04:14:33Z) - GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization [133.27496265096445]
Group Relative Policy Optimization (GRPO) is often applied in multi-reward settings without examination of its suitability. We then introduce Group reward-Decoupled Normalization Policy Optimization (GDPO), a new policy optimization method to resolve the issues that arise. GDPO consistently outperforms GRPO, demonstrating its effectiveness and generalizability for multi-reward reinforcement learning optimization.
arXiv Detail & Related papers (2026-01-08T18:59:24Z) - Generative AI-enhanced Sector-based Investment Portfolio Construction [12.174346896225153]
This paper investigates how Large Language Models (LLMs) can be applied to quantitative sector-based portfolio construction. We use LLMs to identify investable universes of stocks within S&P 500 sector indices. We evaluate how their selections perform when combined with classical portfolio optimization methods.
arXiv Detail & Related papers (2025-12-31T00:19:41Z) - Accelerating RL for LLM Reasoning with Optimal Advantage Regression [52.0792918455501]
We propose a novel two-stage policy optimization framework that directly approximates the optimal advantage function. $A^*$-PO achieves competitive performance across a wide range of mathematical reasoning benchmarks. It reduces training time by up to $2\times$ and peak memory usage by over 30% compared to PPO, GRPO, and REBEL.
arXiv Detail & Related papers (2025-05-27T03:58:50Z) - Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization [82.03139922490796]
Reinforcement learning (RL) has shown significant promise for sequential portfolio optimization tasks, such as stock trading, where the objective is to maximize cumulative returns while minimizing risks using historical data. Traditional RL approaches often produce policies that merely memorize the optimal yet impractical buying and selling behaviors within the fixed dataset. Our approach frames portfolio optimization as a new type of partial-offline RL problem and makes two technical contributions.
arXiv Detail & Related papers (2025-05-19T06:37:25Z) - Deep Reinforcement Learning for Investor-Specific Portfolio Optimization: A Volatility-Guided Asset Selection Approach [2.2835610890984164]
This study proposes a volatility-guided portfolio optimization framework that dynamically constructs portfolios based on investors' risk profiles. The efficacy of the proposed methodology is established using stocks from the Dow 30 index.
arXiv Detail & Related papers (2025-04-20T10:17:37Z) - Optimizing Portfolio Performance through Clustering and Sharpe Ratio-Based Optimization: A Comparative Backtesting Approach [0.0]
This paper introduces a comparative backtesting approach that combines clustering-based portfolio segmentation and Sharpe ratio-based optimization to enhance investment decision-making. We segment a diverse set of financial assets into clusters based on their historical log-returns using K-Means clustering. For each cluster, we apply a Sharpe ratio-based optimization model to derive optimal weights that maximize risk-adjusted returns.
arXiv Detail & Related papers (2025-01-21T12:00:52Z) - VinePPO: Refining Credit Assignment in RL Training of LLMs [66.80143024475635]
We propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. Our method consistently outperforms PPO and other baselines across MATH and GSM8K datasets in less wall-clock time.
arXiv Detail & Related papers (2024-10-02T15:49:30Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization [49.396692286192206]
We study the use of deep reinforcement learning for responsible portfolio optimization by incorporating ESG states and objectives.
Our results show that deep reinforcement learning policies can provide competitive performance against mean-variance approaches for responsible portfolio allocation.
arXiv Detail & Related papers (2024-03-25T12:04:03Z) - Combining Transformer based Deep Reinforcement Learning with Black-Litterman Model for Portfolio Optimization [0.0]
As a model-free algorithm, deep reinforcement learning (DRL) agent learns and makes decisions by interacting with the environment in an unsupervised way.
We propose a hybrid portfolio optimization model combining the DRL agent and the Black-Litterman (BL) model.
Our DRL agent significantly outperforms various comparison portfolio choice strategies and alternative DRL frameworks by at least 42% in terms of accumulated return.
arXiv Detail & Related papers (2024-02-23T16:01:37Z) - Mean-Variance Efficient Collaborative Filtering for Stock Recommendation [33.04652311923051]
Traditional investment recommendations focus on high-return stocks or highly diversified portfolios, often neglecting user preferences. We propose mean-variance efficient collaborative filtering (MVECF) to optimally blend users' preferences with portfolio theory.
arXiv Detail & Related papers (2023-06-11T04:51:29Z) - Deep Reinforcement Learning for Long-Short Portfolio Optimization [7.131902599861306]
This paper constructs a Deep Reinforcement Learning (DRL) portfolio management framework with short-selling mechanisms that conform to actual trading rules. Key innovations include a comprehensive short-selling mechanism for continuous trading that accounts for the dynamic evolution of transactions across time periods. Compared to traditional approaches, this model delivers superior risk-adjusted returns while reducing maximum drawdown.
arXiv Detail & Related papers (2020-12-26T16:25:20Z)
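One of the entries above segments assets by K-Means clustering on historical log-returns before applying Sharpe ratio-based optimization within each cluster. A self-contained sketch of that segmentation step on toy data, with a plain Lloyd's-algorithm K-Means and a per-asset annualized Sharpe estimate (all data and parameters hypothetical):

```python
import numpy as np

# Toy log-return series: 12 hypothetical assets x 250 trading days.
rng = np.random.default_rng(1)
log_ret = rng.normal(0.0, 0.01, size=(12, 250))
log_ret[:6] += 0.03  # nudge half the assets so two groups exist

def kmeans(X, k, iters=50):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    # Deterministic init: k rows spread evenly through the data.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)       # nearest-center assignment
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(log_ret, k=2)
# Per-asset annualized Sharpe estimate; a cluster-level optimizer would
# then maximize this quantity at the portfolio level within each group.
sharpe = log_ret.mean(axis=1) / log_ret.std(axis=1) * np.sqrt(252)
print(labels, np.round(sharpe, 2))
```

On this toy data the two injected groups are well separated, so the assignment recovers them; the subsequent per-cluster weight optimization from the paper is not sketched here.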
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.