Related papers: Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem

Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem

URL: http://arxiv.org/abs/2512.05207v1
Date: Thu, 04 Dec 2025 19:22:40 GMT
Title: Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem
Authors: Ali Al Housseini, Cristina Rottondi, Omran Ayoub,
Abstract summary: VNE with Alternative topologies (VNEAP) was introduced to capture malleable VNRs.<n>This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals.
Score: 0.818198392834469
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative topologies (VNEAP) was introduced to capture malleable VNRs, where each request can be instantiated using one of several functionally equivalent topologies that trade resources differently. While this flexibility enlarges the feasible space, it also introduces an additional decision layer, making dynamic embedding more challenging. This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals. A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network. Experiments on realistic substrate topologies under multiple traffic loads show that naive exploitation strategies provide only modest gains, whereas HRL-VNEAP consistently achieves the best performance across all metrics. Compared to the strongest tested baselines, HRL-VNEAP improves acceptance ratio by up to \textbf{20.7\%}, total revenue by up to \textbf{36.2\%}, and revenue-over-cost by up to \textbf{22.1\%}. Finally, we benchmark against an MILP formulation on tractable instances to quantify the remaining gap to optimality and motivate future work on learning- and optimization-based VNEAP solutions.

Related papers

MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs)<n>We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck.<n>We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z)
Multi-Task Vehicle Routing Solver via Mixture of Specialized Experts under State-Decomposable MDP [57.28979643999352]
We propose a framework that enables unified solvers to perceive the shared-component nature across VRP variants.<n>We introduce a State-Decomposable MDP (SDMDP) that reformulates VRPs by expressing the state space as the Cartesian product of basis state spaces.<n>A Latent Space-based SDMDP extension is developed by incorporating both the optimal basis policies and a learnable mixture function.
arXiv Detail & Related papers (2025-10-24T13:31:31Z)
Towards Scalable O-RAN Resource Management: Graph-Augmented Proximal Policy Optimization [0.9143713488498514]
Open Radio Access Network (O-RAN) architectures enable flexible, scalable and cost-efficient mobile networks by disaggregating and virtualizing baseband functions.<n>This flexibility introduces significant challenges for resource management, requiring joint functional split selection and unit placement optimization under dynamic demands and complex topologies.<n>We propose a novel Graph-mented Proximal Policy Optimization framework that leverages Graph Graph Networks (GNNs) for topology-aware feature extraction and masking action.
arXiv Detail & Related papers (2025-09-01T08:53:04Z)
Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning [69.00997996453842]
We propose a deep Reinforcement Learning approach to learn a joint Admission Control and Resource Allocation policy for virtual network embedding. We show that HRL-ACRA outperforms state-of-the-art baselines in terms of both the acceptance ratio and long-term average revenue.
arXiv Detail & Related papers (2024-06-25T07:42:30Z)
FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation [31.74480390058691]
We propose a FLexible And Generalizable RL framework for VNE, named FlagVNE. To tackle the expansive and dynamic action space, we design a hierarchical decoder to generate adaptive action probability distributions. To overcome the generalization issue for varying VNR sizes, we propose a meta-RL-based training method.
arXiv Detail & Related papers (2024-04-19T05:24:24Z)
Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [55.08287089554127]
Open Radio Access Network systems, with their base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.<n>We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging'' environments.<n>We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z)
Deep Reinforcement Learning for Inventory Networks: Toward Reliable Policy Optimization [2.9016349714298157]
We argue that inventory management presents unique opportunities for the reliable application of deep reinforcement learning (DRL)<n>The first is Hindsight Differentiable Policy Optimization (HDPO), which uses pathwise gradients from offline counterfactual simulations to directly and efficiently optimize policy performance.<n>We exploit Graph Neural Networks (GNNs) as a natural inductive bias for encoding supply chain structure, demonstrate that they can represent optimal and near-optimal policies in two theoretical settings, and empirically show that they reduce data requirements across six diverse inventory problems.
arXiv Detail & Related papers (2023-06-20T02:58:25Z)
Multilevel-in-Layer Training for Deep Neural Network Regression [1.6185544531149159]
We present a multilevel regularization strategy that constructs and trains a hierarchy of neural networks. We experimentally show with PDE regression problems that our multilevel training approach is an effective regularizer.
arXiv Detail & Related papers (2022-11-11T23:53:46Z)
Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications. Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)
Deep Adaptive Inference Networks for Single Image Super-Resolution [72.7304455761067]
Single image super-resolution (SISR) has witnessed tremendous progress in recent years owing to the deployment of deep convolutional neural networks (CNNs) In this paper, we take a step forward to address this issue by leveraging the adaptive inference networks for deep SISR (AdaDSR) Our AdaDSR involves an SISR model as backbone and a lightweight adapter module which takes image features and resource constraint as input and predicts a map of local network depth.
arXiv Detail & Related papers (2020-04-08T10:08:20Z)
Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a smooth policy that behaves smoothly with respect to states. We develop a new framework -- textbfSmooth textbfRegularized textbfReinforcement textbfLearning ($textbfSR2textbfL$), where the policy is trained with smoothness-inducing regularization. Such regularization effectively constrains the search space, and enforces smoothness in the learned policy.
arXiv Detail & Related papers (2020-03-21T00:10:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.