Dynamic Preference Multi-Objective Reinforcement Learning for Internet Network Management
- URL: http://arxiv.org/abs/2506.13153v1
- Date: Mon, 16 Jun 2025 07:03:58 GMT
- Title: Dynamic Preference Multi-Objective Reinforcement Learning for Internet Network Management
- Authors: DongNyeong Heo, Daniela Noemi Rim, Heeyoul Choi
- Abstract summary: We propose new RL-based network management agents that can select actions based on both states and preferences. We also propose a numerical method that can estimate the distribution of preferences that is advantageous for unbiased training.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: An internet network service provider manages its network with multiple objectives, such as high quality of service (QoS) and minimum computing resource usage. To achieve these objectives, reinforcement learning (RL) algorithms have been proposed to train network management agents. Usually, these algorithms optimize their agents with respect to a single static reward formulation consisting of multiple objectives with fixed importance factors, which we call preferences. In practice, however, the preference can vary according to network status, external concerns, and so on. For example, when a server shutdown can overload other servers' traffic and lead to additional shutdowns, it is plausible to reduce the preference for QoS while increasing the preference for minimum computing resource usage. In this paper, we propose new RL-based network management agents that can select actions based on both states and preferences. With our proposed approach, we expect a single agent to generalize over various states and preferences. Furthermore, we propose a numerical method that can estimate the distribution of preferences that is advantageous for unbiased training. Our experimental results show that RL agents trained with our proposed approach generalize significantly better over various preferences than previous RL approaches, which assume a static preference during training. Moreover, we present several analyses that show the advantages of our numerical estimation method.
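The core idea in the abstract (an agent that conditions on a preference vector and optimizes a preference-weighted, i.e. scalarized, reward) can be sketched as follows. The Dirichlet preference distribution, the function names, and the toy feature values here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_preference(n_objectives):
    """Sample a preference vector from the simplex.

    The paper estimates a training-advantageous preference distribution;
    a uniform Dirichlet is just a stand-in here.
    """
    return rng.dirichlet(np.ones(n_objectives))

def scalarize(rewards, preference):
    """Combine per-objective rewards into one scalar training signal."""
    return float(np.dot(rewards, preference))

def policy_input(state, preference):
    """Condition the agent on both state and preference by concatenation."""
    return np.concatenate([state, preference])

# Two objectives: QoS and computing-resource saving.
state = np.array([0.7, 0.2, 0.5])   # hypothetical network-status features
w = sample_preference(2)            # weights for [QoS, resource saving]
r = np.array([0.9, -0.3])           # hypothetical per-objective rewards
print(policy_input(state, w).shape) # policy sees state + preference together
print(scalarize(r, w))              # single scalar reward for training
```

Because the preference is an input rather than a fixed constant baked into the reward, one trained agent can be queried with different preference vectors at deployment time.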
Related papers
- Percentile-Based Deep Reinforcement Learning and Reward Based Personalization For Delay Aware RAN Slicing in O-RAN [0.0]
We tackle the challenge of radio access network slicing within an open RAN (O-RAN) architecture. Our focus centers on a network that includes multiple mobile virtual network operators (MVNOs) competing for physical resource blocks. We introduce a reward-based personalization method where each agent prioritizes other agents' model weights based on their performance.
arXiv Detail & Related papers (2025-07-24T05:45:41Z) - Active Learning for Direct Preference Optimization [59.84525302418018]
Direct preference optimization (DPO) is a form of reinforcement learning from human feedback. We propose an active learning framework for DPO, which can be applied to collect human feedback online or to choose the most informative subset of already collected feedback offline.
arXiv Detail & Related papers (2025-03-03T00:36:31Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems [3.2826250607043796]
Multi-Objective Reinforcement Learning (MORL) techniques exist but they have mostly been applied in RL benchmarks rather than real-world AS systems.
In this work, we use a MORL technique called Deep W-Learning (DWN) to find the optimal configuration for runtime performance optimization.
We compare DWN to two single-objective optimization implementations: epsilon-greedy algorithm and Deep Q-Networks.
arXiv Detail & Related papers (2024-08-02T11:16:09Z) - Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents. Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z) - MaxMin-RLHF: Alignment with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences. Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - Asynchronous Message-Passing and Zeroth-Order Optimization Based Distributed Learning with a Use-Case in Resource Allocation in Communication Networks [11.182443036683225]
Distributed learning and adaptation have received significant interest and found wide-ranging applications in machine learning and signal processing. This paper specifically focuses on a scenario where agents collaborate towards a common task. Agents, acting as transmitters, collaboratively train their individual policies to maximize a global reward.
arXiv Detail & Related papers (2023-11-08T11:12:27Z) - Rethinking Value Function Learning for Generalization in Reinforcement Learning [11.516147824168732]
We focus on the problem of training RL agents on multiple training environments to improve observational generalization performance.
We identify that the value network in the multiple-environment setting is more challenging to optimize and prone to overfitting training data than in the conventional single-environment setting.
We propose Delayed-Critic Policy Gradient (DCPG), which implicitly penalizes the value estimates by optimizing the value network less frequently with more training data than the policy network.
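The DCPG schedule described above (optimizing the value network less frequently than the policy, on more accumulated data) can be sketched with a toy training loop. The delay factor, counters, and stand-in buffer are illustrative assumptions, not the paper's code:

```python
# Delayed-critic schedule: the policy trains every iteration, while the value
# network trains once every `value_freq` iterations on the larger batch of
# data accumulated since its last update.

policy_updates = 0
value_updates = 0
value_freq = 4          # hypothetical delay factor
buffer = []             # transitions gathered since the last value update

for step in range(12):
    buffer.append(step)  # stand-in for collected rollout data
    policy_updates += 1  # policy network trains every iteration
    if policy_updates % value_freq == 0:
        value_updates += 1  # value network trains on the accumulated batch
        buffer.clear()

print(policy_updates, value_updates)  # 12 3
```

Training the critic less often, but on more data per update, is what implicitly regularizes its value estimates against overfitting.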
arXiv Detail & Related papers (2022-10-18T16:17:47Z) - An Expectation-Maximization Perspective on Federated Learning [75.67515842938299]
Federated learning describes the distributed training of models across multiple clients while keeping the data private on-device.
In this work, we view the server-orchestrated federated learning process as a hierarchical latent variable model where the server provides the parameters of a prior distribution over the client-specific model parameters.
We show that with simple Gaussian priors and a hard version of the well known Expectation-Maximization (EM) algorithm, learning in such a model corresponds to FedAvg, the most popular algorithm for the federated learning setting.
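The correspondence noted above, that the server-side M-step under simple Gaussian priors reduces to FedAvg's parameter averaging, can be illustrated with a minimal sketch; the client parameter vectors and data sizes are made up:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Server update: data-size-weighted mean of client parameter vectors.

    Under the EM view with Gaussian priors, this weighted average is the
    closed-form M-step for the server's prior mean.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_params)                    # (num_clients, dim)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

# Toy example: three clients with different amounts of local data.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 20, 30]
print(fedavg(clients, sizes))  # pulled toward clients with more data
```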
arXiv Detail & Related papers (2021-11-19T12:58:59Z) - Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z) - Deep Reinforcement Learning for QoS-Constrained Resource Allocation in Multiservice Networks [0.3324986723090368]
This article focuses on a non-convex optimization problem whose main aim is to maximize the spectral efficiency subject to satisfaction guarantees in multiservice wireless systems.
We propose a solution based on a Reinforcement Learning (RL) framework, where each agent makes its decisions to find a policy by interacting with the local environment.
We show near-optimal performance of the proposed solution in terms of throughput and outage rate.
arXiv Detail & Related papers (2020-03-03T19:32:15Z) - Resource Management in Wireless Networks via Multi-Agent Deep Reinforcement Learning [15.091308167639815]
We propose a mechanism for distributed resource management and interference mitigation in wireless networks using multi-agent deep reinforcement learning (RL).
We equip each transmitter in the network with a deep RL agent that receives delayed observations from its associated users, while also exchanging observations with its neighboring agents.
Our proposed framework enables agents to make decisions simultaneously and in a distributed manner, unaware of the concurrent decisions of other agents.
arXiv Detail & Related papers (2020-02-14T19:01:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.