Communication Load Balancing via Efficient Inverse Reinforcement
Learning
- URL: http://arxiv.org/abs/2303.16686v1
- Date: Wed, 22 Mar 2023 22:23:23 GMT
- Title: Communication Load Balancing via Efficient Inverse Reinforcement
Learning
- Authors: Abhisek Konar, Di Wu, Yi Tian Xu, Seowoo Jang, Steve Liu, Gregory
Dudek
- Abstract summary: In this work, we tackle the communication load balancing problem from an inverse reinforcement learning (IRL) approach.
We infer a reward function from a set of demonstrations, and then learn a load balancing policy by reinforcement learning with the inferred reward function.
Compared to classical RL-based solutions, the proposed solution is more general and better suited to real-world scenarios.
- Score: 13.052338083552863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication load balancing aims to balance the load between different
available resources, and thus improve the quality of service for network
systems. After the load balancing (LB) problem is formulated as a Markov
decision process, reinforcement learning (RL) has recently proven effective in
addressing it. To leverage the benefits of classical RL for load balancing,
however, we need an explicit reward definition. Engineering this reward
function is challenging because it requires expert knowledge and there is no
general consensus on the form of an optimal reward function. In this work, we
tackle the communication load balancing problem using an inverse reinforcement
learning (IRL) approach. To the best of our knowledge,
this is the first time IRL has been successfully applied in the field of
communication load balancing. Specifically, we first infer a reward function
from a set of demonstrations, and then learn a load balancing policy by
reinforcement learning with the inferred reward function. Compared to classical
RL-based solutions, the proposed solution is more general and better suited to
real-world scenarios. Experimental evaluations on different simulated traffic
scenarios show that our method is effective and outperforms other baselines by
a considerable margin.
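The abstract describes a two-stage pipeline: first infer a reward function from demonstrations, then train an RL policy against the inferred reward. The sketch below is a minimal, self-contained illustration of that pipeline on a deliberately toy two-cell load balancing problem; the environment dynamics, the hand-picked features, the linear feature-matching reward estimate, and the tabular Q-learning step are all illustrative assumptions, not the simulator or IRL algorithm used in the paper.

```python
# Minimal IRL-then-RL sketch (illustrative assumptions throughout; not the paper's method).
import numpy as np

rng = np.random.default_rng(0)

# Toy dynamics: the state is the load on two cells; an action shifts traffic.
def step(state, action):
    s = state.copy()
    if action == 0:            # shift some load from cell 0 to cell 1
        delta = min(s[0], 0.1); s[0] -= delta; s[1] += delta
    elif action == 1:          # shift some load from cell 1 to cell 0
        delta = min(s[1], 0.1); s[1] -= delta; s[0] += delta
    s += rng.uniform(0.0, 0.05, size=2)   # new traffic arrives unevenly
    return np.clip(s, 0.0, 1.0)

def features(state):
    # Hand-picked features (assumed): total load and load imbalance.
    return np.array([state.sum(), abs(state[0] - state[1])])

# Stage 1: infer a linear reward w . features(s) from demonstrations.
def expert_action(state):                 # stand-in "expert" that balances load
    return 0 if state[0] > state[1] else 1

def mean_features(policy, T=50):
    s, feats = np.array([0.8, 0.2]), []
    for _ in range(T):
        s = step(s, policy(s)); feats.append(features(s))
    return np.mean(feats, axis=0)

expert_mu = mean_features(expert_action)
random_mu = mean_features(lambda s: int(rng.integers(3)))
w = expert_mu - random_mu                 # crude feature-matching direction
w /= np.linalg.norm(w) + 1e-8

def inferred_reward(state):
    return float(w @ features(state))

# Stage 2: tabular Q-learning against the inferred reward, on a coarse
# discretisation of the load imbalance (again purely illustrative).
def bucket(state, n=10):
    return min(int(abs(state[0] - state[1]) * n), n - 1)

Q = np.zeros((10, 3))
for _ in range(200):
    s = rng.uniform(0.0, 1.0, size=2)
    for _ in range(50):
        b = bucket(s)
        a = int(rng.integers(3)) if rng.random() < 0.1 else int(Q[b].argmax())
        s2 = step(s, a)
        Q[b, a] += 0.1 * (inferred_reward(s2) + 0.95 * Q[bucket(s2)].max() - Q[b, a])
        s = s2

print("greedy action per imbalance bucket:", Q.argmax(axis=1))
```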
Related papers
- Interpretable Reinforcement Learning for Load Balancing using Kolmogorov-Arnold Networks [6.373998211961586]
Reinforcement learning (RL) has been increasingly applied to network control problems, such as load balancing. Existing RL approaches often suffer from a lack of interpretability and difficulty in extracting controller equations. We propose the use of Kolmogorov-Arnold Networks (KAN) for interpretable RL in network control.
arXiv Detail & Related papers (2025-05-20T14:56:31Z)
- Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training [71.16258800411696]
Reinforcement learning (RL) is a critical component of large language model (LLM) post-training.
Existing on-policy algorithms used for post-training are inherently incompatible with the use of experience replay buffers.
We propose to efficiently obtain this benefit of replay buffers via Trajectory Balance with Asynchrony (TBA).
arXiv Detail & Related papers (2025-03-24T17:51:39Z)
- Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a 14.6% improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z)
- Reinforcement Learning-Based Adaptive Load Balancing for Dynamic Cloud Environments [0.0]
We propose a novel adaptive load balancing framework using Reinforcement Learning (RL) to address these challenges.
Our framework is designed to dynamically reallocate tasks to minimize latency and ensure balanced resource usage across servers.
Experimental results show that the proposed RL-based load balancer outperforms traditional algorithms in terms of response time, resource utilization, and adaptability to changing workloads.
arXiv Detail & Related papers (2024-09-07T19:40:48Z)
- Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation [37.36913210031282]
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering.
We propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.
arXiv Detail & Related papers (2024-05-29T01:49:20Z)
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use these attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping [42.09724642733125]
This work motivates the use of potential-based reward shaping to reduce the computational burden of each RL sub-problem.
This work serves as a proof of concept, and we hope it will inspire future developments toward computationally efficient IRL.
arXiv Detail & Related papers (2023-12-15T17:50:18Z)
- Language Reward Modulation for Pretraining Reinforcement Learning [61.76572261146311]
We propose leveraging the capabilities of LRFs as a pretraining signal for reinforcement learning.
Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks.
arXiv Detail & Related papers (2023-08-23T17:37:51Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center [4.141301293112916]
This paper presents the network load balancing problem, a challenging real-world task for reinforcement learning methods.
The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally motivates the use of multi-agent reinforcement learning (MARL) methods.
To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system.
arXiv Detail & Related papers (2022-01-27T18:47:59Z)
- Value Penalized Q-Learning for Recommender Systems [30.704083806571074]
Scaling reinforcement learning to recommender systems (RS) is promising since maximizing the expected cumulative rewards for RL agents meets the objective of RS.
A key approach to this goal is offline RL, which aims to learn policies from logged data.
We propose Value Penalized Q-learning (VPQ), an uncertainty-based offline RL algorithm.
arXiv Detail & Related papers (2021-10-15T08:08:28Z)
- Model-Augmented Q-learning [112.86795579978802]
We propose a model-free RL (MFRL) framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution identical to the solution obtained by learning with the true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z)
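The Model-Augmented Q-learning entry above describes a single shared network that estimates the Q-values together with the transition and the reward. The PyTorch sketch below (assuming PyTorch is available) illustrates one way such shared heads and a combined loss could look; the architecture sizes, the equal loss weighting, and the random smoke-test batch are assumptions made here for illustration, not the paper's implementation.

```python
# Shared-trunk sketch: Q-values, next-state, and reward heads trained jointly
# (illustrative assumptions; not the paper's implementation).
import torch
import torch.nn as nn

class SharedQModel(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, n_actions)                       # Q(s, .)
        self.next_state_head = nn.Linear(hidden, state_dim * n_actions)  # s' per action
        self.reward_head = nn.Linear(hidden, n_actions)                  # r(s, .)

    def forward(self, s):
        h = self.trunk(s)
        q = self.q_head(h)
        next_s = self.next_state_head(h).view(-1, q.shape[1], s.shape[1])
        r = self.reward_head(h)
        return q, next_s, r

def loss_on_batch(model, s, a, r, s2, done, gamma=0.99):
    q, pred_s2, pred_r = model(s)
    q_sa = q.gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * model(s2)[0].max(dim=1).values
    td_loss = nn.functional.mse_loss(q_sa, target)
    # Auxiliary model-based losses on the heads selected by the taken action.
    idx = a.view(-1, 1, 1).expand(-1, 1, s.shape[1])
    trans_loss = nn.functional.mse_loss(pred_s2.gather(1, idx).squeeze(1), s2)
    reward_loss = nn.functional.mse_loss(pred_r.gather(1, a.unsqueeze(1)).squeeze(1), r)
    return td_loss + trans_loss + reward_loss

# Smoke test on a random batch (shapes only; no real environment here).
model = SharedQModel(state_dim=4, n_actions=3)
s  = torch.randn(8, 4); a = torch.randint(0, 3, (8,))
r  = torch.randn(8);    s2 = torch.randn(8, 4); done = torch.zeros(8)
loss = loss_on_batch(model, s, a, r, s2, done)
loss.backward()
print("combined loss:", float(loss))
```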
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.