Related papers: Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling

Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling

URL: http://arxiv.org/abs/2408.07205v2
Date: Tue, 20 Aug 2024 07:20:10 GMT
Title: Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling
Authors: Nida Zamir, I-Hong Hou,
Abstract summary: We discuss a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems. We introduce a new Deep Index Policy (DIP), an online learning algorithm tailored for MR-RMB. Our simulation results show that DIP indeed learns the partial indexes efficiently.
Score: 6.648181286553698
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scheduling in multi-channel wireless communication system presents formidable challenges in effectively allocating resources. To address these challenges, we investigate a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems with an objective of maximizing long-term discounted total rewards while respecting resource constraints. We have also generalized to applications beyond multi-channel wireless. We discuss the Max-Weight Index Matching algorithm, which optimizes resource allocation based on learned partial indexes. We have derived the policy gradient theorem for index learning. Our main contribution is the introduction of a new Deep Index Policy (DIP), an online learning algorithm tailored for MR-RMB. DIP learns the partial index by leveraging the policy gradient theorem for restless arms with convoluted and unknown transition kernels of heterogeneous resources. We demonstrate the utility of DIP by evaluating its performance for three different MR-RMB problems. Our simulation results show that DIP indeed learns the partial indexes efficiently.

Related papers

Generative Diffusion Models for Resource Allocation in Wireless Networks [77.36145730415045]
We train a policy to imitate an expert and generate new samples from the optimal distribution. We achieve near-optimal performance through sequential execution of the generated samples. We present numerical results in a case study of power control in multi-user interference networks.
arXiv Detail & Related papers (2025-04-28T21:44:31Z)
Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes [7.028778922533688]
Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty. We study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning.
arXiv Detail & Related papers (2024-10-14T14:52:23Z)
Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning [69.00997996453842]
We propose a deep Reinforcement Learning approach to learn a joint Admission Control and Resource Allocation policy for virtual network embedding. We show that HRL-ACRA outperforms state-of-the-art baselines in terms of both the acceptance ratio and long-term average revenue.
arXiv Detail & Related papers (2024-06-25T07:42:30Z)
Intelligent Hybrid Resource Allocation in MEC-assisted RAN Slicing Network [72.2456220035229]
We aim to maximize the SSR for heterogeneous service demands in the cooperative MEC-assisted RAN slicing system. We propose a recurrent graph reinforcement learning (RGRL) algorithm to intelligently learn the optimal hybrid RA policy.
arXiv Detail & Related papers (2024-05-02T01:36:13Z)
Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study [51.33182775762785]
This paper presents an empirical study to build relation extraction systems in low-resource settings. We investigate three schemes to evaluate the performance in low-resource settings: (i) different types of prompt-based methods with few-shot labeled data; (ii) diverse balancing methods to address the long-tailed distribution issue; and (iii) data augmentation technologies and self-training to generate more labeled in-domain data.
arXiv Detail & Related papers (2022-10-19T15:46:37Z)
MIX-MAB: Reinforcement Learning-based Resource Allocation Algorithm for LoRaWAN [6.22984202194369]
This paper focuses on improving the resource allocation algorithm in terms of packet delivery ratio (PDR) We propose a resource allocation algorithm that enables the EDs to configure their transmission parameters in a distributed manner. Numerical results show that the proposed solution performs better than the existing schemes in terms of convergence time and PDR.
arXiv Detail & Related papers (2022-06-07T15:50:05Z)
Distributional Reinforcement Learning for Multi-Dimensional Reward Functions [91.88969237680669]
We introduce Multi-Dimensional Distributional DQN (MD3QN) to model the joint return distribution from multiple reward sources. As a by-product of joint distribution modeling, MD3QN can capture the randomness in returns for each source of reward. In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions.
arXiv Detail & Related papers (2021-10-26T11:24:23Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
Deep Reinforcement Learning for Resource Constrained Multiclass Scheduling in Wireless Networks [0.0]
In our setup, the available limited bandwidth resources are allocated in order to serve randomly arriving service demands. We propose a distributional Deep Deterministic Policy Gradient (DDPG) algorithm combined with Deep Sets to tackle the problem. Our proposed algorithm is tested on both synthetic and real data, showing consistent gains against state-of-the-art conventional methods.
arXiv Detail & Related papers (2020-11-27T09:49:38Z)
Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications. Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.