A Deep Learning Based Resource Allocator for Communication Systems with
Dynamic User Utility Demands
- URL: http://arxiv.org/abs/2311.04600v1
- Date: Wed, 8 Nov 2023 11:02:51 GMT
- Authors: Pourya Behmandpoor, Panagiotis Patrinos, Marc Moonen
- Abstract summary: A DL based resource allocator (ALCOR) is introduced, which allows users to freely adjust their utility demands.
ALCOR employs deep neural networks (DNNs) as the policy in an iterative optimization algorithm.
The policy performs unconstrained RA (URA) -- RA without taking into account user utility demands -- among active users to maximize the sum utility (SU) at each time instant.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) based resource allocation (RA) has recently gained a lot
of attention due to its performance efficiency. However, most of the related
studies assume an ideal case where the number of users and their utility
demands, e.g., data rate constraints, are fixed and the designed DL based RA
scheme exploits a policy trained only for these fixed parameters. A
computationally complex policy retraining is required whenever these parameters
change. Therefore, in this paper, a DL based resource allocator (ALCOR) is
introduced, which allows users to freely adjust their utility demands based on,
e.g., their application layer. ALCOR employs deep neural networks (DNNs) as
the policy in an iterative optimization algorithm. The optimization algorithm
aims to optimize the on-off status of users in a time-sharing problem to
satisfy their utility demands in expectation. The policy performs unconstrained
RA (URA) -- RA without taking into account user utility demands -- among active
users to maximize the sum utility (SU) at each time instant. Based on the
chosen URA scheme, ALCOR can perform RA in a model-based or model-free manner
and in a centralized or distributed scenario. Derived convergence analyses
provide guarantees for the convergence of ALCOR, and numerical experiments
corroborate its effectiveness.
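A rough, numpy-only sketch of the time-sharing idea described above (not the authors' actual algorithm): maintain per-user on-probabilities, sample on-off statuses at each time instant, run a stand-in URA among the active users, and nudge the probabilities so that average utilities approach the demands in expectation. The equal-split URA, the demand values, and the step size are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users = 4
demands = np.array([0.2, 0.3, 0.1, 0.25])  # hypothetical utility demands
p = np.full(n_users, 0.5)   # per-user on-probabilities (time-sharing variables)
avg_util = np.zeros(n_users)  # running average of achieved utilities
step = 0.05

def ura(active):
    """Stand-in for the unconstrained RA policy: equal split of a unit
    resource among active users (ALCOR would use a trained DNN here)."""
    u = np.zeros(n_users)
    if active.any():
        u[active] = 1.0 / active.sum()
    return u

for t in range(1, 5001):
    active = rng.random(n_users) < p   # sample on-off status of each user
    util = ura(active)                 # URA among active users only
    avg_util += (util - avg_util) / t  # update running average utility
    # Push on-probabilities toward satisfying demands in expectation.
    p = np.clip(p + step * (demands - avg_util), 0.05, 1.0)
```

Because the single unit resource is split among whoever is active, the time-averaged utilities always sum to at most one, which is why the demands here are chosen to be jointly feasible.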
Related papers
- Design Optimization of NOMA Aided Multi-STAR-RIS for Indoor Environments: A Convex Approximation Imitated Reinforcement Learning Approach [51.63921041249406]
Sixth-generation (6G) networks leverage simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) to overcome the limitations of traditional RISs.
However, deploying STAR-RISs indoors presents challenges in interference mitigation, power consumption, and real-time configuration.
A novel network architecture utilizing multiple access points (APs) and STAR-RISs is proposed for indoor communication.
arXiv Detail & Related papers (2024-06-19T07:17:04Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
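The "preference optimization loss plus supervised learning loss" objective can be sketched numerically. The following is an illustrative stand-in (a DPO-style preference term plus a negative log-likelihood term on the chosen response, with a hypothetical weight lam), not the paper's exact formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combined_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected,
                  beta=0.1, lam=1.0):
    """DPO-style preference loss plus an SFT (negative log-likelihood)
    term on the chosen response; beta and lam are hypothetical weights."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    pref_loss = -math.log(sigmoid(margin))  # preference optimization term
    sft_loss = -logp_chosen                 # supervised learning term
    return pref_loss + lam * sft_loss

# Example with hypothetical log-probabilities of chosen/rejected responses.
loss = combined_loss(-5.0, -9.0, -6.0, -8.0)
```

The SFT term anchors the policy to the supervised data, which is how it acts as a regularizer against reward overoptimization.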
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models [7.926974917872204]
LoRA-SP is a novel approach utilizing randomized half-selective parameter freezing.
LoRA-SP significantly reduces computational and memory requirements without compromising model performance.
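A toy sketch of "randomized half-selective parameter freezing" with hypothetical shapes, using numpy as a stand-in for a real fine-tuning loop: a random half of each LoRA factor's entries is masked out of the gradient update.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4                       # hypothetical model dimension and LoRA rank
A = rng.normal(0.0, 0.02, (d, r))  # LoRA down-projection factor
B = np.zeros((r, d))               # LoRA up-projection factor (zero-init)

# Randomized half-selective freezing: pick roughly half of the adapter
# parameters at random and exclude them from updates.
mask_A = rng.random(A.shape) < 0.5
mask_B = rng.random(B.shape) < 0.5

def sgd_step(param, grad, mask, lr=1e-2):
    # Update only the unfrozen (mask == True) entries.
    return param - lr * grad * mask

# Hypothetical gradients from one training step.
grad_A = rng.normal(size=A.shape)
grad_B = rng.normal(size=B.shape)
A2 = sgd_step(A, grad_A, mask_A)
B2 = sgd_step(B, grad_B, mask_B)
```

Frozen entries need no optimizer state, which is where the memory savings would come from.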
arXiv Detail & Related papers (2024-02-28T06:50:10Z)
- PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching [21.835846173630717]
Low-Rank Adaptation (LoRA) has become a promising alternative to instruction fine-tuning.
PILLOW aims to improve LoRA's performance by exploiting the discrimination ability of LLMs.
PILLOW exhibits commensurate performance on various evaluation metrics compared with typical instruction fine-tuning methods.
arXiv Detail & Related papers (2023-12-09T17:38:39Z)
- Joint User Association, Interference Cancellation and Power Control for Multi-IRS Assisted UAV Communications [80.35959154762381]
Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) communications are expected to alleviate the load of ground base stations in a cost-effective way.
Existing studies mainly focus on the deployment and resource allocation of a single IRS instead of multiple IRSs.
We propose a new optimization algorithm for joint IRS-user association, trajectory optimization of UAVs, successive interference cancellation (SIC) decoding order scheduling and power allocation.
arXiv Detail & Related papers (2023-12-08T01:57:10Z)
- Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow Models [2.7648976108201815]
Key to solving real-world problems is to exploit sparse dependency structures between agents.
In wind farm control a trade-off exists between maximising power and minimising stress on the systems components.
We model such sparse dependencies between agents as a multi-objective coordination graph (MO-CoG).
arXiv Detail & Related papers (2022-07-01T12:10:15Z)
- Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm [16.115903198836694]
Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL).
This is known as "off-policy control" in RL, where an agent's objective is to compute an optimal policy based on the data obtained from the given policy (known as the behavior policy).
This work proposes an off-policy natural actor-critic algorithm that utilizes state-action distribution correction for handling the off-policy behavior and the natural policy gradient for sample efficiency.
arXiv Detail & Related papers (2021-10-19T14:36:45Z)
- Model-Free Learning of Optimal Deterministic Resource Allocations in Wireless Systems via Action-Space Exploration [4.721069729610892]
We propose a technically grounded and scalable deterministic-dual gradient policy method for efficiently learning optimal parameterized resource allocation policies.
Our method not only efficiently exploits gradient availability of popular universal representations such as deep networks, but is also truly model-free, as it relies on consistent zeroth-order gradient approximations of associated random network services constructed via low-dimensional perturbations in action space.
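A minimal sketch of the zeroth-order idea (model-free gradient estimation via random perturbations in action space), with a hypothetical quadratic stand-in for the service function and hypothetical step sizes; the actual method uses parameterized policies and random network services.

```python
import numpy as np

rng = np.random.default_rng(0)

def service(a):
    """Hypothetical stand-in for an expected network service utility;
    in the paper's setting it is only observable, not differentiable."""
    return -np.sum((a - 1.0) ** 2)

a = np.zeros(3)          # resource-allocation action
delta, lr = 1e-3, 0.02   # perturbation size and step size (hypothetical)

for _ in range(800):
    u = rng.normal(size=a.shape)  # random direction in action space
    # Two-point zeroth-order gradient estimate from function values only.
    g = (service(a + delta * u) - service(a)) / delta * u
    a = a + lr * g                # ascend the estimated gradient
```

Only function evaluations are used, so the same loop applies when the service is a black box, which is what "truly model-free" refers to.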
arXiv Detail & Related papers (2021-08-23T18:26:16Z)
- Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)
- Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver.
We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming.
We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.