Related papers: Task Specific Sharpness Aware O-RAN Resource Management using Multi Agent Reinforcement Learning

Task Specific Sharpness Aware O-RAN Resource Management using Multi Agent Reinforcement Learning

URL: http://arxiv.org/abs/2511.15002v1
Date: Wed, 19 Nov 2025 00:55:24 GMT
Title: Task Specific Sharpness Aware O-RAN Resource Management using Multi Agent Reinforcement Learning
Authors: Fatemeh Lotfi, Hossein Rajoli, Fatemeh Afghah,
Abstract summary: Next-generation networks utilize the Open Radio Access Network (O-RAN) architecture to enable dynamic resource management.<n>Deep reinforcement learning models often struggle with robustness and generalizability in dynamic environments.<n>This paper introduces a novel resource management approach that enhances the Soft Actor Critic (SAC) algorithm with Sharpness-Aware Minimization (SAM) in a distributed Multi-Agent RL (MARL) framework.
Score: 8.26664397566735
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Next-generation networks utilize the Open Radio Access Network (O-RAN) architecture to enable dynamic resource management, facilitated by the RAN Intelligent Controller (RIC). While deep reinforcement learning (DRL) models show promise in optimizing network resources, they often struggle with robustness and generalizability in dynamic environments. This paper introduces a novel resource management approach that enhances the Soft Actor Critic (SAC) algorithm with Sharpness-Aware Minimization (SAM) in a distributed Multi-Agent RL (MARL) framework. Our method introduces an adaptive and selective SAM mechanism, where regularization is explicitly driven by temporal-difference (TD)-error variance, ensuring that only agents facing high environmental complexity are regularized. This targeted strategy reduces unnecessary overhead, improves training stability, and enhances generalization without sacrificing learning efficiency. We further incorporate a dynamic $ρ$ scheduling scheme to refine the exploration-exploitation trade-off across agents. Experimental results show our method significantly outperforms conventional DRL approaches, yielding up to a $22\%$ improvement in resource allocation efficiency and ensuring superior QoS satisfaction across diverse O-RAN slices.

Related papers

Meta Hierarchical Reinforcement Learning for Scalable Resource Management in O-RAN [9.290879387995401]
This paper proposes an adaptive Meta Hierarchical Reinforcement Learning framework, inspired by Model Agnostic Meta Learning (MAML)<n>The framework integrates hierarchical control with meta learning to enable both global and local adaptation.<n>It achieves up to 40% faster adaptation and consistent fairness, latency, and throughput performance as network scale increases.
arXiv Detail & Related papers (2025-12-08T08:16:27Z)
Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS [62.22644307952087]
We introduce AIRL-S, the first natural unification of RL-based and search-based TTS.<n>We leverage adversarial inverse reinforcement learning (AIRL) combined with group relative policy optimization (GRPO) to learn a dense, dynamic PRM directly from correct reasoning traces.<n>Our results show that our unified approach improves performance by 9 % on average over the base model, matching GPT-4o.
arXiv Detail & Related papers (2025-08-19T23:41:15Z)
Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks.<n>Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions.<n>We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
Prompt-Tuned LLM-Augmented DRL for Dynamic O-RAN Network Slicing [5.62872273155603]
Large Language Models (LLMs) structure unorganized network feedback into meaningful latent representations.<n>In O-RAN slicing, concepts like SNR, power levels and throughput are semantically related.<n>We introduce a contextualization-based adaptation method that integrates learnable prompts into an LLM-augmented DRL framework.
arXiv Detail & Related papers (2025-05-31T14:12:56Z)
Network Resource Optimization for ML-Based UAV Condition Monitoring with Vibration Analysis [54.550658461477106]
Condition Monitoring (CM) uses Machine Learning (ML) models to identify abnormal and adverse conditions.<n>This work explores the optimization of network resources for ML-based UAV CM frameworks.<n>By leveraging dimensionality reduction techniques, there is a 99.9% reduction in network resource consumption.
arXiv Detail & Related papers (2025-02-21T14:36:12Z)
Cluster-Based Multi-Agent Task Scheduling for Space-Air-Ground Integrated Networks [60.085771314013044]
Low-altitude economy holds significant potential for development in areas such as communication and sensing.<n>We propose a Clustering-based Multi-agent Deep Deterministic Policy Gradient (CMADDPG) algorithm to address the multi-UAV cooperative task scheduling challenges in SAGIN.
arXiv Detail & Related papers (2024-12-14T06:17:33Z)
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.<n>We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
Meta Reinforcement Learning Approach for Adaptive Resource Optimization in O-RAN [6.326120268549892]
Open Radio Access Network (O-RAN) addresses the variable demands of modern networks with unprecedented efficiency and adaptability. This paper proposes a novel Meta Deep Reinforcement Learning (Meta-DRL) strategy, inspired by Model-Agnostic Meta-Learning (MAML) to advance resource block and downlink power allocation in O-RAN.
arXiv Detail & Related papers (2024-09-30T23:04:30Z)
Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins. We employ inverse RL (IRL) to automatically learn reward functions without manual tuning. We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z)
Generative AI for O-RAN Slicing: A Semi-Supervised Approach with VAE and Contrastive Learning [5.1435595246496595]
This paper introduces a novel generative AI (GAI)-driven, unified semi-supervised learning architecture for optimizing resource allocation and network slicing in O-RAN.<n>Termed Generative Semi-Supervised VAE-Contrastive Learning, our approach maximizes the weighted user equipment (UE) throughput and allocates physical resource blocks (PRBs) to enhance the quality of service for eMBB and URLLC services.
arXiv Detail & Related papers (2024-01-16T22:23:27Z)
Attention-based Open RAN Slice Management using Deep Reinforcement Learning [6.177038245239758]
This paper introduces an innovative attention-based deep RL (ADRL) technique that leverages the O-RAN disaggregated modules and distributed agent cooperation. Simulation results demonstrate significant improvements in network performance compared to other DRL baseline methods.
arXiv Detail & Related papers (2023-06-15T20:37:19Z)
Evolutionary Deep Reinforcement Learning for Dynamic Slice Management in O-RAN [11.464582983164991]
New open radio access network (O-RAN) with distinguishing features such as flexible design, disaggregated virtual and programmable components, and intelligent closed-loop control was developed. O-RAN slicing is being investigated as a critical strategy for ensuring network quality of service (QoS) in the face of changing circumstances. This paper introduces a novel framework able to manage the network slices through provisioned resources intelligently.
arXiv Detail & Related papers (2022-08-30T17:00:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.