SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2512.22895v1
- Date: Sun, 28 Dec 2025 11:56:39 GMT
- Title: SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning
- Authors: Xiaotian Ren, Nuerxiati Abudurexiti, Zhengyong Jiang, Angelos Stefanidis, Hongbin Liu, Jionglong Su,
- Abstract summary: We propose a Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management.<n>The framework integrates dynamic asset grouping to partition the market into high-quality and ordinary subsets.<n>Our method achieves at least 5% higher Return, 5% higher Sortino ratio, and 2% higher Omega ratio, with substantially larger gains.
- Score: 4.743963988265057
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Portfolio optimization in non-stationary markets is challenging due to regime shifts, dynamic correlations, and the limited interpretability of deep reinforcement learning (DRL) policies. We propose a Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning (SAMP-HDRL). The framework first applies dynamic asset grouping to partition the market into high-quality and ordinary subsets. An upper-level agent extracts global market signals, while lower-level agents perform intra-group allocation under mask constraints. A utility-based capital allocation mechanism integrates risky and risk-free assets, ensuring coherent coordination between global and local decisions. backtests across three market regimes (2019--2021) demonstrate that SAMP-HDRL consistently outperforms nine traditional baselines and nine DRL benchmarks under volatile and oscillating conditions. Compared with the strongest baseline, our method achieves at least 5\% higher Return, 5\% higher Sharpe ratio, 5\% higher Sortino ratio, and 2\% higher Omega ratio, with substantially larger gains observed in turbulent markets. Ablation studies confirm that upper--lower coordination, dynamic clustering, and capital allocation are indispensable to robustness. SHAP-based interpretability further reveals a complementary ``diversified + concentrated'' mechanism across agents, providing transparent insights into decision-making. Overall, SAMP-HDRL embeds structural market constraints directly into the DRL pipeline, offering improved adaptability, robustness, and interpretability in complex financial environments.
Related papers
- Heterogeneous Agent Collaborative Reinforcement Learning [52.99813668995983]
Heterogeneous Agent Collaborative Reinforcement Learning (HACRL)<n>Building on this paradigm, we propose HACPO, a collaborative RL algorithm that enables principled rollout sharing to maximize sample utilization and cross-agent knowledge transfer.<n>Experiments across diverse heterogeneous model combinations and reasoning benchmarks show that HACPO consistently improves all participating agents, outperforming GSPO by an average of 3.3% while using only half the rollout cost.
arXiv Detail & Related papers (2026-03-03T05:09:49Z) - Rethinking the Trust Region in LLM Reinforcement Learning [72.25890308541334]
Proximal Policy Optimization (PPO) serves as the de facto standard algorithm for Large Language Models (LLMs)<n>We propose Divergence Proximal Policy Optimization (DPPO), which substitutes clipping with a more principled constraint.<n>DPPO achieves superior training and efficiency compared to existing methods, offering a more robust foundation for RL-based fine-tuning.
arXiv Detail & Related papers (2026-02-04T18:59:04Z) - Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents [90.45197506653341]
Large language model (LLM) agents increasingly rely on external tools such as search engines to solve complex, multi-step problems.<n>The trajectories of search agents are structurally heterogeneous, where variations in the number, placement, and outcomes of search calls lead to fundamentally different answer directions and reward distributions.<n>Standard policy gradient methods, which use a single global baseline, suffer from what we identify and formalize as cross-stratum bias.<n>We propose Stratified GRPO, whose central component, Stratified Advantage Normalization (SAN), partitions trajectories into homogeneous strata based on their structural properties and computes advantages locally within each
arXiv Detail & Related papers (2025-10-07T17:59:13Z) - Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning [49.31650627835956]
Partial agent failure becomes inevitable when systems scale up, making it crucial to identify the subset of agents whose compromise would most severely degrade overall performance.<n>In this paper, we study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL)<n> Experiments show our method effectively identifies more vulnerable agents in large-scale MARL and the rule-based system, fooling system into worse failures, and learning a value function that reveals the vulnerability of each agent.
arXiv Detail & Related papers (2025-09-18T16:03:50Z) - Multi-Agent Trust Region Policy Optimisation: A Joint Constraint Approach [17.48210470289556]
Heterogeneous-Agent Trust Region Policy Optimization (HATRPO) enforces per-agent trust region constraints using Kullback-Leibler (KL) divergence to stabilize training.<n> assigning each agent the same KL threshold can lead to slow and locally optimal updates, especially in heterogeneous settings.<n>We propose two approaches for allocating the KL divergence threshold across agents: HATRPO-W, a Karush-Kuhn-Tucker-based (KKT-based) method that optimize threshold assignment under global KL constraints, and HATRPO-G, a greedy algorithm that prioritizes agents based on improvement-to
arXiv Detail & Related papers (2025-08-14T04:48:46Z) - MARS: A Meta-Adaptive Reinforcement Learning Framework for Risk-Aware Multi-Agent Portfolio Management [7.740995234462868]
Reinforcement Learning has shown significant promise in automated portfolio management.<n>We propose Meta-controlled Agents for a Risk-aware System (MARS)<n>MARS employs a Heterogeneous Agent Ensemble where each agent possesses a unique, intrinsic risk profile.
arXiv Detail & Related papers (2025-08-02T03:23:41Z) - DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data [65.09939942413651]
We propose a principled extension to GRPO that addresses inter-group imbalance with two key innovations.<n> Domain-aware reward scaling counteracts frequency bias by reweighting optimization based on domain prevalence.<n>Difficulty-aware reward scaling leverages prompt-level self-consistency to identify and prioritize uncertain prompts that offer greater learning value.
arXiv Detail & Related papers (2025-05-21T03:43:29Z) - Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts.<n>This work is the first work to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z) - Hierarchical Reinforced Trader (HRT): A Bi-Level Approach for Optimizing Stock Selection and Execution [0.9553307596675155]
We introduce the Hierarchical Reinforced Trader (HRT), a novel trading strategy employing a bi-level Hierarchical Reinforcement Learning framework.
HRT integrates a Proximal Policy Optimization (PPO)-based High-Level Controller (HLC) for strategic stock selection with a Deep Deterministic Policy Gradient (DDPG)-based Low-Level Controller (LLC) tasked with optimizing trade executions to enhance portfolio value.
arXiv Detail & Related papers (2024-10-19T01:29:38Z) - Optimizing Portfolio with Two-Sided Transactions and Lending: A Reinforcement Learning Framework [0.0]
This study presents a Reinforcement Learning-based portfolio management model tailored for high-risk environments.
We implement the model using the Soft Actor-Critic (SAC) agent with a Convolutional Neural Network with Multi-Head Attention.
Tested over two 16-month periods of varying market volatility, the model significantly outperformed benchmarks.
arXiv Detail & Related papers (2024-08-09T23:36:58Z) - Developing A Multi-Agent and Self-Adaptive Framework with Deep Reinforcement Learning for Dynamic Portfolio Risk Management [1.2016264781280588]
A multi-agent reinforcement learning (RL) approach is proposed to balance the trade-off between the overall portfolio returns and their potential risks.
The obtained empirical results clearly reveal the potential strengths of our proposed MASA framework.
arXiv Detail & Related papers (2024-02-01T11:31:26Z) - Monotonic Improvement Guarantees under Non-stationarity for
Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL)
We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
arXiv Detail & Related papers (2022-01-31T20:39:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.