Related papers: Handoff Design in User-Centric Cell-Free Massive MIMO Networks Using DRL

URL: http://arxiv.org/abs/2507.20966v2
Date: Sat, 02 Aug 2025 03:14:14 GMT
Title: Handoff Design in User-Centric Cell-Free Massive MIMO Networks Using DRL
Authors: Hussein A. Ammar, Raviraj Adve, Shahram Shahbazpanahi, Gary Boudreau, Israfil Bahceci
Abstract summary: This paper presents a deep reinforcement learning-based solution to predict and manage connections for mobile users. Our solution employs the Soft Actor-Critic algorithm, with continuous action space representation, to train a deep neural network to serve as the HO policy. We present a novel proposition for a reward function that integrates a HO penalty in order to balance the attainable rate and the associated overhead related to HOs.
Score: 26.772811966031746
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the user-centric cell-free massive MIMO (UC-mMIMO) network scheme, user mobility necessitates updating the set of serving access points to maintain the user-centric clustering. Such updates are typically performed through handoff (HO) operations; however, frequent HOs lead to overheads associated with the allocation and release of resources. This paper presents a deep reinforcement learning (DRL)-based solution to predict and manage these connections for mobile users. Our solution employs the Soft Actor-Critic algorithm, with continuous action space representation, to train a deep neural network to serve as the HO policy. We present a novel proposition for a reward function that integrates a HO penalty in order to balance the attainable rate and the associated overhead related to HOs. We develop two variants of our system: the first uses mobility direction-assisted (DA) observations that are based on the user movement pattern, while the second uses history-assisted (HA) observations that are based on the history of the large-scale fading (LSF). Simulation results show that our DRL-based continuous action space approach is more scalable than its discrete action space counterpart, and that our derived HO policy automatically learns to gather HOs in specific time slots to minimize the overhead of initiating HOs. Our solution can also operate in real time with a response time of less than 0.4 ms.
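
The abstract does not give the reward formula, so the following is only a minimal sketch of one way a HO penalty could be folded into a per-slot reward. The function name `ho_penalized_reward`, the `ho_penalty` weight, and the choice to charge a fixed cost once per time slot in which the serving set changes are all assumptions made for illustration, not the authors' definition.

```python
from typing import Set


def ho_penalized_reward(
    rate: float,
    prev_aps: Set[int],
    new_aps: Set[int],
    ho_penalty: float = 0.1,  # hypothetical weight, not taken from the paper
) -> float:
    """Per-slot reward: attainable rate minus a fixed cost if any HO occurs.

    Assumption: the overhead is charged once per slot in which the set of
    serving access points changes, regardless of how many APs change.
    """
    added = new_aps - prev_aps      # APs newly allocated to the user
    released = prev_aps - new_aps   # APs released by the user
    ho_occurred = bool(added or released)
    return rate - ho_penalty * ho_occurred


# Two AP changes made in one slot versus spread over two slots.
batched = (ho_penalized_reward(2.0, {1, 2}, {3, 4}, 0.5)
           + ho_penalized_reward(2.0, {3, 4}, {3, 4}, 0.5))
spread = (ho_penalized_reward(2.0, {1, 2}, {3, 2}, 0.5)
          + ho_penalized_reward(2.0, {3, 2}, {3, 4}, 0.5))
print(batched, spread)
```

Under this assumed per-slot cost, grouping several AP changes into one slot yields a higher cumulative reward than spreading them out, which is consistent with the abstract's remark that the learned policy gathers HOs in specific time slots.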

Related papers

RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility [9.200793414310182]
We introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a unified framework for predicting human mobility. We use large language models (LLMs) as general-purpose predictors and reasoners. RHYTHM achieves a 2.4% improvement in overall accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time.
arXiv Detail & Related papers (2025-09-27T04:55:56Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs). We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers. Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy. We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world. Recent methods aim to mitigate misalignment by learning reward functions from human preferences. We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
Mobility-Aware Joint User Scheduling and Resource Allocation for Low Latency Federated Learning [14.343345846105255]
We propose a practical model for user mobility in federated learning systems. We develop a user scheduling and resource allocation method to minimize the training delay with constrained communication resources. Specifically, we first formulate an optimization problem with user mobility that jointly considers user selection, BS assignment to users, and bandwidth allocation.
arXiv Detail & Related papers (2023-07-18T13:48:05Z)
Sparsity-Aware Intelligent Massive Random Access Control in Open RAN: A Reinforcement Learning Based Approach [61.74489383629319]
Massive random access of devices in the emerging Open Radio Access Network (O-RAN) poses a great challenge to access control and management. A reinforcement-learning (RL)-assisted scheme of closed-loop access control is proposed to preserve sparsity of access requests. Deep-RL-assisted SAUD is proposed to resolve highly complex environments with continuous and high-dimensional state and action spaces.
arXiv Detail & Related papers (2023-03-05T12:25:49Z)
Decentralized Federated Reinforcement Learning for User-Centric Dynamic TFDD Control [37.54493447920386]
We propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme to meet asymmetric and heterogeneous traffic demands. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP). In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named Wolpertinger deep deterministic policy gradient (FWDDPG).
arXiv Detail & Related papers (2022-11-04T07:39:21Z)
Temporal Memory Relation Network for Workflow Recognition from Surgical Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns. We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z)
Smart Scheduling based on Deep Reinforcement Learning for Cellular Networks [18.04856086228028]
We propose a smart scheduling scheme based on deep reinforcement learning (DRL). We provide implementation-friendly designs, i.e., a scalable neural network design for the agent and a virtual environment training framework. We show that the DRL-based smart scheduling outperforms the conventional scheduling method and can be adopted in practical systems.
arXiv Detail & Related papers (2021-03-22T02:09:16Z)
Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
Deep-Reinforcement-Learning-Based Scheduling with Contiguous Resource Allocation for Next-Generation Cellular Systems [4.227387975627387]
We propose a novel scheduling algorithm with contiguous frequency-domain resource allocation (FDRA) based on deep reinforcement learning (DRL). The proposed DRL-based scheduling algorithm outperforms other representative baseline schemes while having lower online computational complexity.
arXiv Detail & Related papers (2020-10-11T05:41:40Z)
MDLdroid: a ChainSGD-reduce Approach to Mobile Deep Learning for Personal Mobile Sensing [14.574274428615666]
Running deep learning on devices offers several advantages including data privacy preservation and low-latency response for both model robustness and update. Personal mobile sensing applications are mostly user-specific and highly affected by environment. We present MDLdroid, a novel decentralized mobile deep learning framework to enable resource-aware on-device collaborative learning.
arXiv Detail & Related papers (2020-02-07T16:55:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.