SafeSlice: Enabling SLA-Compliant O-RAN Slicing via Safe Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2503.12753v1
- Date: Mon, 17 Mar 2025 02:41:49 GMT
- Title: SafeSlice: Enabling SLA-Compliant O-RAN Slicing via Safe Deep Reinforcement Learning
- Authors: Ahmad M. Nagib, Hatem Abou-Zeid, Hossam S. Hassanein,
- Abstract summary: Deep reinforcement learning (DRL)-based slicing policies face challenges in physical systems such as open radio access networks (O-RANs)<n>These policies often lack safety guarantees to ensure compliance with service level agreements (SLAs)<n>We propose SafeSlice to address both the cumulative (trajectory-wise) and instantaneous (state-wise) latency constraints of O-RAN slices.
- Score: 20.344810727033327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL)-based slicing policies have shown significant success in simulated environments but face challenges in physical systems such as open radio access networks (O-RANs) due to simulation-to-reality gaps. These policies often lack safety guarantees to ensure compliance with service level agreements (SLAs), such as the strict latency requirements of immersive applications. As a result, a deployed DRL slicing agent may make resource allocation (RA) decisions that degrade system performance, particularly in previously unseen scenarios. Real-world immersive applications require maintaining SLA constraints throughout deployment to prevent risky DRL exploration. In this paper, we propose SafeSlice to address both the cumulative (trajectory-wise) and instantaneous (state-wise) latency constraints of O-RAN slices. We incorporate the cumulative constraints by designing a sigmoid-based risk-sensitive reward function that reflects the slices' latency requirements. Moreover, we build a supervised learning cost model as part of a safety layer that projects the slicing agent's RA actions to the nearest safe actions, fulfilling instantaneous constraints. We conduct an exhaustive experiment that supports multiple services, including real virtual reality (VR) gaming traffic, to investigate the performance of SafeSlice under extreme and changing deployment conditions. SafeSlice achieves reductions of up to 83.23% in average cumulative latency, 93.24% in instantaneous latency violations, and 22.13% in resource consumption compared to the baselines. The results also indicate SafeSlice's robustness to changing the threshold configurations of latency constraints, a vital deployment scenario that will be realized by the O-RAN paradigm to empower mobile network operators (MNOs).
Related papers
- LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning [61.594212398272184]
Low-Rank Extrapolation (LoX) improves robustness against benign and malicious fine-tuning attacks.<n>LoX leads to 11% to 54% absolute reductions in attack success rates.
arXiv Detail & Related papers (2025-06-18T16:30:02Z) - Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL [14.767273209148545]
We propose a novel framework Feasibility-Aware offline Safe Reinforcement Learning with CVAE-based Pessimism (FASP)<n>We employ Hamilton-Jacobi (H-J) reachability analysis to generate reliable safety labels.<n>We also utilize pessimistic estimation methods to estimate the Q-value of reward and cost.
arXiv Detail & Related papers (2025-05-13T02:32:49Z) - Reward-Safety Balance in Offline Safe RL via Diffusion Regularization [16.5825143820431]
Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints.<n>We propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL)<n>DRCORL first uses a diffusion model to capture the behavioral policy from offline data and then extracts a simplified policy to enable efficient inference.
arXiv Detail & Related papers (2025-02-18T00:00:03Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion
Model [23.93820548551533]
We propose FISOR (FeasIbility-guided Safe Offline RL), which allows safety constraint adherence, reward, and offline policy learning.
In FISOR, the translated optimal policy for the translated optimization problem can be derived in a special form of weighted behavior cloning.
We show that FISOR is the only method that can guarantee safety satisfaction in all tasks, while achieving top returns in most tasks.
arXiv Detail & Related papers (2024-01-19T14:05:09Z) - How Does Forecasting Affect the Convergence of DRL Techniques in O-RAN
Slicing? [20.344810727033327]
We propose a novel forecasting-aided DRL approach and its respective O-RAN practical deployment workflow to enhance DRL convergence.
Our approach shows up to 22.8%, 86.3%, and 300% improvements in the average initial reward value, convergence rate, and number of converged scenarios respectively.
arXiv Detail & Related papers (2023-09-01T14:30:04Z) - Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Online Safety Property Collection and Refinement for Safe Deep
Reinforcement Learning in Mapless Navigation [79.89605349842569]
We introduce the Collection and Refinement of Online Properties (CROP) framework to design properties at training time.
CROP employs a cost signal to identify unsafe interactions and use them to shape safety properties.
We evaluate our approach in several robotic mapless navigation tasks and demonstrate that the violation metric computed with CROP allows higher returns and lower violations over previous Safe DRL approaches.
arXiv Detail & Related papers (2023-02-13T21:19:36Z) - FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training a RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z) - Safe Exploration Incurs Nearly No Additional Sample Complexity for
Reward-free RL [43.672794342894946]
Reward-free reinforcement learning (RF-RL) relies on random action-taking to explore the unknown environment without any reward feedback information.
It remains unclear how such safe exploration requirement would affect the corresponding sample complexity in order to achieve the desired optimality of the obtained policy in planning.
We propose a unified Safe reWard-frEe ExploraTion (SWEET) framework, and develop algorithms coined Tabular-SWEET and Low-rank-SWEET, respectively.
arXiv Detail & Related papers (2022-06-28T15:00:45Z) - SAUTE RL: Almost Surely Safe Reinforcement Learning Using State
Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented Markov Decision Processes (MDPs)
We show that Saute MDP allows to view Safe augmentation problem from a different perspective enabling new features.
arXiv Detail & Related papers (2022-02-14T08:57:01Z) - Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
InReinforcement learning (RL) has shown a promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
arXiv Detail & Related papers (2021-07-29T13:08:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.