Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization
- URL: http://arxiv.org/abs/2511.17963v1
- Date: Sat, 22 Nov 2025 07:57:03 GMT
- Title: Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization
- Authors: Jun Kevin, Pujianto Yugopuspito
- Abstract summary: This paper introduces a hybrid framework for portfolio optimization that fuses Long Short-Term Memory (LSTM) forecasting with a Proximal Policy Optimization (PPO) reinforcement learning strategy. The proposed system leverages the predictive power of deep recurrent networks to capture temporal dependencies, while the PPO agent adaptively refines portfolio allocations in continuous action spaces. The framework's performance is benchmarked against equal-weighted, index-based, and single-model approaches (LSTM-only and PPO-only) using annualized return, volatility, Sharpe ratio, and maximum drawdown metrics.
- Score: 0.05475997486212839
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a hybrid framework for portfolio optimization that fuses Long Short-Term Memory (LSTM) forecasting with a Proximal Policy Optimization (PPO) reinforcement learning strategy. The proposed system leverages the predictive power of deep recurrent networks to capture temporal dependencies, while the PPO agent adaptively refines portfolio allocations in continuous action spaces, allowing the system to anticipate trends while adjusting dynamically to market shifts. Using multi-asset datasets covering U.S. and Indonesian equities, U.S. Treasuries, and major cryptocurrencies from January 2018 to December 2024, the model is evaluated against several baselines, including equal-weighted, index-based, and single-model variants (LSTM-only and PPO-only), using annualized return, volatility, Sharpe ratio, and maximum drawdown metrics, each adjusted for transaction costs. The results indicate that the hybrid architecture delivers higher returns and stronger resilience under non-stationary market regimes, suggesting its promise as a robust, AI-driven framework for dynamic portfolio optimization.
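The four evaluation metrics named in the abstract can be computed roughly as follows. This is a minimal sketch and not the authors' code: the function names `net_returns` and `performance_metrics`, the 252-day annualization factor, the proportional `cost_rate`, and the simplified turnover rule (weight drift between rebalances is ignored) are all assumptions made for illustration.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor for daily data


def net_returns(asset_returns, weights, cost_rate=0.001):
    """Daily portfolio returns after proportional transaction costs.

    asset_returns: (T, N) simple returns per asset.
    weights:       (T, N) portfolio weights held on each day.
    cost_rate:     hypothetical cost per unit of turnover.
    """
    asset_returns = np.asarray(asset_returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    gross = (weights * asset_returns).sum(axis=1)
    # Turnover is the absolute weight change versus the previous day,
    # starting from an all-cash position (assumption).
    prev = np.vstack([np.zeros((1, weights.shape[1])), weights[:-1]])
    turnover = np.abs(weights - prev).sum(axis=1)
    return gross - cost_rate * turnover


def performance_metrics(returns, risk_free=0.0):
    """Annualized return, volatility, Sharpe ratio, and maximum drawdown."""
    returns = np.asarray(returns, dtype=float)
    # Wealth curve starting at 1.0 so a loss on day one counts as drawdown.
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + returns)])
    years = len(returns) / TRADING_DAYS
    ann_return = wealth[-1] ** (1.0 / years) - 1.0
    ann_vol = returns.std(ddof=1) * np.sqrt(TRADING_DAYS)
    # One common Sharpe convention: annualized excess return over annualized vol.
    sharpe = (ann_return - risk_free) / ann_vol if ann_vol > 0 else float("nan")
    running_peak = np.maximum.accumulate(wealth)
    max_drawdown = ((wealth - running_peak) / running_peak).min()
    return {
        "annualized_return": ann_return,
        "annualized_volatility": ann_vol,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
    }
```

Passing the output of `net_returns` into `performance_metrics` gives cost-adjusted figures for any strategy, so the hybrid model and each baseline can be scored with the same function.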
Related papers
- Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization [60.87651283510059]
Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs. We propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation. To mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy.
arXiv Detail & Related papers (2026-03-04T14:48:53Z)
- Rethinking the Trust Region in LLM Reinforcement Learning [72.25890308541334]
Proximal Policy Optimization (PPO) serves as the de facto standard algorithm for Large Language Models (LLMs). We propose Divergence Proximal Policy Optimization (DPPO), which substitutes clipping with a more principled constraint. DPPO achieves superior training and efficiency compared to existing methods, offering a more robust foundation for RL-based fine-tuning.
arXiv Detail & Related papers (2026-02-04T18:59:04Z)
- A Novel approach to portfolio construction [0.0]
This paper proposes a machine learning-based framework for asset selection and portfolio construction. It is called the Best-Path Algorithm Sparse Graphical Model (BPASGM). Monte Carlo simulations show BPASGM-based portfolios achieve more stable risk-return profiles, lower realized volatility, and superior risk-adjusted performance.
arXiv Detail & Related papers (2026-02-03T09:52:06Z)
- MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z)
- Integrated Prediction and Multi-period Portfolio Optimization [29.582959310549594]
Multi-period portfolio optimization accounts for transaction costs, path-dependent risks, and the intertemporal structure of trading decisions. This paper introduces IPMO, a model for multi-period mean-variance portfolio optimization with turnover penalties. For scalability, we introduce a mirror-descent fixed-point (MDFP) differentiation scheme that avoids factorizing the Karush-Kuhn-Tucker (KKT) systems.
arXiv Detail & Related papers (2025-12-12T04:31:22Z)
- BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping [69.74252624161652]
We propose BAlanced Policy Optimization with Adaptive Clipping (BAPO). BAPO dynamically adjusts clipping bounds to adaptively re-balance positive and negative contributions, preserve entropy, and stabilize RL optimization. On AIME 2024 and AIME 2025 benchmarks, our 7B BAPO model surpasses open-source counterparts such as SkyWork-OR1-7B.
arXiv Detail & Related papers (2025-10-21T12:55:04Z)
- From Headlines to Holdings: Deep Learning for Smarter Portfolio Decisions [4.288926547930663]
We present an end-to-end framework that learns portfolio weights using deep learning. We evaluate the framework on nine U.S. stocks spanning six sectors, chosen to balance sector diversity and news coverage. Although the stock universe is limited, the results underscore the value of integrating price, relational, and sentiment signals for portfolio management.
arXiv Detail & Related papers (2025-09-29T00:42:24Z)
- Dependency Network-Based Portfolio Design with Forecasting and VaR Constraints [8.107171581224312]
This study proposes a novel portfolio optimization framework that integrates statistical social network analysis with time series forecasting and risk management. Using daily stock data from the S&P 500 (2020-2024), we construct dependency networks via Vector Autoregression (VAR) and Forecast Error Variance Decomposition (FEVD). FEVD breaks down the VAR's forecast error variance to quantify how much each stock's shocks contribute to another's uncertainty, information we invert to form influence-based edge weights in our network. A dynamic portfolio is constructed using the top-ranked stocks, with capital allocated based on Value at Risk (
arXiv Detail & Related papers (2025-07-26T18:53:39Z)
- Deep Learning Enhanced Multivariate GARCH [7.475786051454157]
Long Short-Term Memory enhanced BEKK (LSTM-BEKK) integrates deep learning into multivariate GARCH processes. Our approach is designed to better capture nonlinear, dynamic, and high-dimensional dependence structures in financial return data. Empirical results across multiple equity markets confirm that the LSTM-BEKK model achieves superior performance in terms of out-of-sample portfolio risk forecast.
arXiv Detail & Related papers (2025-06-03T12:22:57Z)
- Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis [89.60263788590893]
Post-training Quantization (PTQ) technique has been extensively adopted for large language models (LLMs) compression. Existing algorithms focus primarily on performance, overlooking the trade-off among model size, performance, and quantization bitwidth. We provide a novel benchmark for LLMs PTQ in this paper.
arXiv Detail & Related papers (2025-02-18T07:35:35Z)
- Regret-Optimized Portfolio Enhancement through Deep Reinforcement Learning and Future Looking Rewards [3.9795751586546766]
This paper introduces a novel agent-based approach for enhancing existing portfolio strategies using Proximal Policy Optimization (PPO). Rather than focusing solely on traditional portfolio construction, our approach aims to improve an already high-performing strategy through dynamic rebalancing driven by PPO and Oracle agents.
arXiv Detail & Related papers (2025-02-04T11:45:59Z)
- BreakGPT: Leveraging Large Language Models for Predicting Asset Price Surges [55.2480439325792]
This paper introduces BreakGPT, a novel large language model (LLM) architecture adapted specifically for time series forecasting and the prediction of sharp upward movements in asset prices.
We showcase BreakGPT as a promising solution for financial forecasting with minimal training and as a strong competitor for capturing both local and global temporal dependencies.
arXiv Detail & Related papers (2024-11-09T05:40:32Z)
- AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization [45.46582930202524]
$\alpha$-DPO is an adaptive preference optimization algorithm for large language models. It balances the policy model and the reference model to achieve personalized reward margins. It consistently outperforms DPO and SimPO across various model settings.
arXiv Detail & Related papers (2024-10-14T04:29:57Z)
- Long Short-Term Memory Neural Network for Financial Time Series [0.0]
We present an ensemble of independent and parallel long short-term memory neural networks for the prediction of stock price movement.
With a straightforward trading strategy, comparisons with a randomly chosen portfolio and a portfolio containing all the stocks in the index show that the portfolio resulting from the LSTM ensemble provides better average daily returns and higher cumulative returns over time.
arXiv Detail & Related papers (2022-01-20T15:17:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.