Related papers: SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning

SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning

URL: http://arxiv.org/abs/2512.22895v1
Date: Sun, 28 Dec 2025 11:56:39 GMT
Title: SAMP-HDRL: Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning
Authors: Xiaotian Ren, Nuerxiati Abudurexiti, Zhengyong Jiang, Angelos Stefanidis, Hongbin Liu, Jionglong Su,
Abstract summary: We propose a Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management.<n>The framework integrates dynamic asset grouping to partition the market into high-quality and ordinary subsets.<n>Our method achieves at least 5% higher Return, 5% higher Sortino ratio, and 2% higher Omega ratio, with substantially larger gains.
Score: 4.743963988265057
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Portfolio optimization in non-stationary markets is challenging due to regime shifts, dynamic correlations, and the limited interpretability of deep reinforcement learning (DRL) policies. We propose a Segmented Allocation with Momentum-Adjusted Utility for Multi-agent Portfolio Management via Hierarchical Deep Reinforcement Learning (SAMP-HDRL). The framework first applies dynamic asset grouping to partition the market into high-quality and ordinary subsets. An upper-level agent extracts global market signals, while lower-level agents perform intra-group allocation under mask constraints. A utility-based capital allocation mechanism integrates risky and risk-free assets, ensuring coherent coordination between global and local decisions. backtests across three market regimes (2019--2021) demonstrate that SAMP-HDRL consistently outperforms nine traditional baselines and nine DRL benchmarks under volatile and oscillating conditions. Compared with the strongest baseline, our method achieves at least 5\% higher Return, 5\% higher Sharpe ratio, 5\% higher Sortino ratio, and 2\% higher Omega ratio, with substantially larger gains observed in turbulent markets. Ablation studies confirm that upper--lower coordination, dynamic clustering, and capital allocation are indispensable to robustness. SHAP-based interpretability further reveals a complementary ``diversified + concentrated'' mechanism across agents, providing transparent insights into decision-making. Overall, SAMP-HDRL embeds structural market constraints directly into the DRL pipeline, offering improved adaptability, robustness, and interpretability in complex financial environments.

Related papers

Heterogeneous Agent Collaborative Reinforcement Learning [52.99813668995983]
Heterogeneous Agent Collaborative Reinforcement Learning (HACRL)<n>Building on this paradigm, we propose HACPO, a collaborative RL algorithm that enables principled rollout sharing to maximize sample utilization and cross-agent knowledge transfer.<n>Experiments across diverse heterogeneous model combinations and reasoning benchmarks show that HACPO consistently improves all participating agents, outperforming GSPO by an average of 3.3% while using only half the rollout cost.
arXiv Detail & Related papers (2026-03-03T05:09:49Z)
Rethinking the Trust Region in LLM Reinforcement Learning [72.25890308541334]
Proximal Policy Optimization (PPO) serves as the de facto standard algorithm for Large Language Models (LLMs)<n>We propose Divergence Proximal Policy Optimization (DPPO), which substitutes clipping with a more principled constraint.<n>DPPO achieves superior training and efficiency compared to existing methods, offering a more robust foundation for RL-based fine-tuning.
arXiv Detail & Related papers (2026-02-04T18:59:04Z)
Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents [90.45197506653341]
Large language model (LLM) agents increasingly rely on external tools such as search engines to solve complex, multi-step problems.<n>The trajectories of search agents are structurally heterogeneous, where variations in the number, placement, and outcomes of search calls lead to fundamentally different answer directions and reward distributions.<n>Standard policy gradient methods, which use a single global baseline, suffer from what we identify and formalize as cross-stratum bias.<n>We propose Stratified GRPO, whose central component, Stratified Advantage Normalization (SAN), partitions trajectories into homogeneous strata based on their structural properties and computes advantages locally within each
arXiv Detail & Related papers (2025-10-07T17:59:13Z)
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning [49.31650627835956]
Partial agent failure becomes inevitable when systems scale up, making it crucial to identify the subset of agents whose compromise would most severely degrade overall performance.<n>In this paper, we study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL)<n> Experiments show our method effectively identifies more vulnerable agents in large-scale MARL and the rule-based system, fooling system into worse failures, and learning a value function that reveals the vulnerability of each agent.
arXiv Detail & Related papers (2025-09-18T16:03:50Z)
Multi-Agent Trust Region Policy Optimisation: A Joint Constraint Approach [17.48210470289556]
Heterogeneous-Agent Trust Region Policy Optimization (HATRPO) enforces per-agent trust region constraints using Kullback-Leibler (KL) divergence to stabilize training.<n> assigning each agent the same KL threshold can lead to slow and locally optimal updates, especially in heterogeneous settings.<n>We propose two approaches for allocating the KL divergence threshold across agents: HATRPO-W, a Karush-Kuhn-Tucker-based (KKT-based) method that optimize threshold assignment under global KL constraints, and HATRPO-G, a greedy algorithm that prioritizes agents based on improvement-to
arXiv Detail & Related papers (2025-08-14T04:48:46Z)
MARS: A Meta-Adaptive Reinforcement Learning Framework for Risk-Aware Multi-Agent Portfolio Management [7.740995234462868]
Reinforcement Learning has shown significant promise in automated portfolio management.<n>We propose Meta-controlled Agents for a Risk-aware System (MARS)<n>MARS employs a Heterogeneous Agent Ensemble where each agent possesses a unique, intrinsic risk profile.
arXiv Detail & Related papers (2025-08-02T03:23:41Z)
DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data [65.09939942413651]
We propose a principled extension to GRPO that addresses inter-group imbalance with two key innovations.<n> Domain-aware reward scaling counteracts frequency bias by reweighting optimization based on domain prevalence.<n>Difficulty-aware reward scaling leverages prompt-level self-consistency to identify and prioritize uncertain prompts that offer greater learning value.
arXiv Detail & Related papers (2025-05-21T03:43:29Z)
Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts.<n>This work is the first work to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z)
Hierarchical Reinforced Trader (HRT): A Bi-Level Approach for Optimizing Stock Selection and Execution [0.9553307596675155]
We introduce the Hierarchical Reinforced Trader (HRT), a novel trading strategy employing a bi-level Hierarchical Reinforcement Learning framework. HRT integrates a Proximal Policy Optimization (PPO)-based High-Level Controller (HLC) for strategic stock selection with a Deep Deterministic Policy Gradient (DDPG)-based Low-Level Controller (LLC) tasked with optimizing trade executions to enhance portfolio value.
arXiv Detail & Related papers (2024-10-19T01:29:38Z)
Optimizing Portfolio with Two-Sided Transactions and Lending: A Reinforcement Learning Framework [0.0]
This study presents a Reinforcement Learning-based portfolio management model tailored for high-risk environments. We implement the model using the Soft Actor-Critic (SAC) agent with a Convolutional Neural Network with Multi-Head Attention. Tested over two 16-month periods of varying market volatility, the model significantly outperformed benchmarks.
arXiv Detail & Related papers (2024-08-09T23:36:58Z)
Developing A Multi-Agent and Self-Adaptive Framework with Deep Reinforcement Learning for Dynamic Portfolio Risk Management [1.2016264781280588]
A multi-agent reinforcement learning (RL) approach is proposed to balance the trade-off between the overall portfolio returns and their potential risks. The obtained empirical results clearly reveal the potential strengths of our proposed MASA framework.
arXiv Detail & Related papers (2024-02-01T11:31:26Z)
Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL) We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
arXiv Detail & Related papers (2022-01-31T20:39:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.