Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic
- URL: http://arxiv.org/abs/2503.09391v1
- Date: Wed, 12 Mar 2025 13:37:19 GMT
- Title: Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic
- Authors: Kexuan Wang, An Liu
- Abstract summary: In XR downlink transmission, energy-efficient power scheduling (EEPS) is essential for conserving power resources while delivering data packets within hard-latency constraints. Traditional algorithms show promise in EEPS but struggle with non-stationary data traffic and non-convex stochastic constraints. To overcome these challenges, this paper models EEPS as a dynamic parameter-constrained Markov decision process and solves it with a proposed context-aware constrained reinforcement learning algorithm.
- Score: 8.526578240549794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In XR downlink transmission, energy-efficient power scheduling (EEPS) is essential for conserving power resources while delivering large data packets within hard-latency constraints. Traditional constrained reinforcement learning (CRL) algorithms show promise in EEPS but still struggle with non-convex stochastic constraints, non-stationary data traffic, and sparse delayed packet dropout feedback (rewards) in XR. To overcome these challenges, this paper models the EEPS in XR as a dynamic parameter-constrained Markov decision process (DP-CMDP) with a varying transition function linked to the non-stationary data traffic and solves it by a proposed context-aware constrained reinforcement learning (CACRL) algorithm, which consists of a context inference (CI) module and a CRL module. The CI module trains an encoder and multiple potential networks to characterize the current transition function and reshape the packet dropout rewards according to the context, transforming the original DP-CMDP into a general CMDP with immediate dense rewards. The CRL module employs a policy network to make EEPS decisions under this CMDP and optimizes the policy using a constrained stochastic successive convex approximation (CSSCA) method, which is better suited for non-convex stochastic constraints. Finally, theoretical analyses provide deep insights into the CACRL algorithm, while extensive simulations demonstrate that it outperforms advanced baselines in both power conservation and satisfying packet dropout constraints.
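The abstract's central idea, reshaping sparse packet-dropout rewards into dense per-step rewards via context-dependent potential functions, can be sketched in miniature. Everything below is our own hypothetical construction (function names, the sliding-window context estimate, and all numbers are assumptions, not the paper's learned encoder or potential networks); it only illustrates the standard potential-based shaping rule r' = r + γ·φ(s') − φ(s), which preserves the optimal policy while densifying the signal.

```python
# Hypothetical sketch of context-conditioned reward shaping (our toy
# construction, not the paper's CI module): a crude context estimate from
# recent traffic, and potential-based shaping of the sparse dropout reward.
from collections import deque

GAMMA = 0.99  # discount factor (assumed)

def infer_context(arrivals):
    """Stand-in for the learned encoder: mean recent packet size."""
    return sum(arrivals) / len(arrivals)

def potential(backlog, context):
    """Toy potential: more backlog under heavier traffic -> lower potential."""
    return -backlog * (1.0 + context)

def shaped_reward(sparse_r, backlog, next_backlog, context):
    """Potential-based shaping: r' = r + gamma*phi(s') - phi(s)."""
    return sparse_r + GAMMA * potential(next_backlog, context) - potential(backlog, context)

window = deque([3.0, 5.0, 2.0, 8.0], maxlen=8)  # recent packet sizes (made up)
ctx = infer_context(window)
dense_r = shaped_reward(sparse_r=0.0, backlog=10.0, next_backlog=7.0, context=ctx)
print(dense_r > 0)  # draining the queue earns a positive dense reward
```

Because the shaping term telescopes under discounting, the agent can learn from every scheduling step instead of waiting for a delayed dropout penalty.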
Related papers
- Safe Multi-Agent Deep Reinforcement Learning for Privacy-Aware Edge-Device Collaborative DNN Inference [8.14391361533752]
This paper proposes a privacy-aware collaborative inference framework, in which adaptive model partitioning is performed across edge devices and servers. We formulate the joint problem as a Constrained Markov Decision Process (CMDP) that integrates model deployment, user-server association, model partitioning, and resource allocation. We show that HC-MAPPO-L consistently satisfies stringent delay constraints while achieving a superior balance between energy consumption and privacy cost.
arXiv Detail & Related papers (2026-02-23T11:33:52Z) - Multi-Objective Reward and Preference Optimization: Theory and Algorithms [3.316593788543852]
This thesis develops theoretical frameworks and algorithms that advance constrained reinforcement learning (RL) across control, preference learning, and alignment of large language models. ACPO, e-COP, warmPref-PS, PSPL, and MOPO advance RL across average-cost, episodic, and preference-driven paradigms. Collectively, the thesis unifies RL across average-cost, episodic, and preference-driven paradigms, delivering theoretical advances and practical tools for safe and aligned decision-making.
arXiv Detail & Related papers (2025-12-11T12:51:21Z) - A Flexible Multi-Agent Deep Reinforcement Learning Framework for Dynamic Routing and Scheduling of Latency-Critical Services [18.675072317045466]
Most existing network control solutions target only average delay performance, falling short of providing strict End-to-End (E2E) peak latency guarantees. This paper addresses the challenge of reliably delivering packets within application-imposed deadlines by leveraging recent advancements in Multi-Agent Deep Reinforcement Learning (MA-DRL). We present a novel MA-DRL network control framework that leverages a centralized routing and distributed scheduling architecture.
arXiv Detail & Related papers (2025-10-13T15:38:10Z) - Generative Sequential Notification Optimization via Multi-Objective Decision Transformers [9.542285455613927]
We present a Decision Transformer based framework that reframes policy learning as return-conditioned supervised learning. Our contributions include a real-world comparison with CQL, a multi-reward design suitable for non-episodic tasks, and a quantile regression approach to return-to-go conditioning.
arXiv Detail & Related papers (2025-09-02T16:09:02Z) - RCCDA: Adaptive Model Updates in the Presence of Concept Drift under a Constrained Resource Budget [19.391900930310253]
Real-time machine learning algorithms are often faced with the challenge of adapting models to concept drift. Existing solutions often depend on drift-detection methods that incur high computational overhead in resource-constrained environments. We propose RCCDA: a dynamic model update policy that optimizes ML training dynamics while ensuring strict compliance with predefined resource constraints.
arXiv Detail & Related papers (2025-05-30T02:49:42Z) - PLS-Assisted Offloading for Edge Computing-Enabled Post-Quantum Security in Resource-Constrained Devices [13.649969611527746]
Post-quantum cryptography (PQC) standards have become imperative for resource-constrained devices (RCDs) in the Internet of Things (IoT).
We propose an edge computing-enabled PQC framework that leverages a physical-layer security (PLS)-assisted offloading strategy.
Our framework integrates two PLS techniques: offloading RCDs employ wiretap coding to secure data transmission, while non-offloading RCDs serve as friendly jammers by broadcasting artificial noise.
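The interplay the snippet describes, wiretap coding for offloading devices and friendly jamming by the rest, rests on the standard wiretap-channel secrecy rate, the clipped gap between the legitimate and eavesdropper channel capacities. The sketch below is our own illustration with made-up SNR values and an assumed jamming interference ratio; it is not the paper's system model.

```python
# Illustrative sketch (toy numbers, not the paper's model): secrecy rate
# [log2(1+SNR_m) - log2(1+SNR_e)]^+ improves when artificial noise from
# friendly jammers raises the eavesdropper's noise floor.
import math

def secrecy_rate(snr_main, snr_eve):
    """Wiretap-channel secrecy rate in bits/s/Hz, clipped at zero."""
    return max(0.0, math.log2(1.0 + snr_main) - math.log2(1.0 + snr_eve))

no_jamming = secrecy_rate(snr_main=10.0, snr_eve=8.0)
# Assumed jamming-to-noise ratio of 4.0 at the eavesdropper.
with_jamming = secrecy_rate(snr_main=10.0, snr_eve=8.0 / (1.0 + 4.0))
print(with_jamming > no_jamming)
```

The legitimate link is unchanged, so any degradation of the eavesdropper's SNR translates directly into a larger secrecy rate.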
arXiv Detail & Related papers (2025-04-13T05:14:17Z) - Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model [84.00480999255628]
Reinforcement Learning algorithms for safety alignment of Large Language Models (LLMs) encounter the challenge of distribution shift.
Current approaches typically address this issue through online sampling from the target policy.
We propose a new framework that leverages the model's intrinsic safety judgment capability to extract reward signals.
arXiv Detail & Related papers (2025-03-13T06:40:34Z) - Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality [55.06411438416805]
Constrained Markov Decision Processes (CMDPs) are critical in many high-stakes applications. This paper introduces a novel approach, Two-Stage Deep Decision Rules (TS-DDR), to efficiently train parametric actor policies. It is shown to enhance solution quality and to reduce computation times by several orders of magnitude when compared to current state-of-the-art methods.
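The Lagrangian duality named in this entry's title is a textbook route to CMDPs: fold the constraint E[cost] ≤ budget into the objective with a multiplier and raise the multiplier by dual ascent while the constraint is violated. The sketch below is a generic illustration of that technique only (the toy cost model and step sizes are our assumptions; TS-DDR's actual training procedure is not reproduced here).

```python
# Generic primal-dual sketch for a CMDP constraint E[cost] <= budget:
# dual ascent on the multiplier lam, projected to stay non-negative.
def dual_ascent(avg_cost_fn, budget, lr=0.05, steps=200):
    lam = 0.0
    for _ in range(steps):
        violation = avg_cost_fn(lam) - budget
        lam = max(0.0, lam + lr * violation)  # projected subgradient step
    return lam

# Toy "policy response": average cost shrinks as the penalty lam grows,
# so the fixed point 2/(1+lam) = 1 sits at lam = 1.
avg_cost = lambda lam: 2.0 / (1.0 + lam)
lam_star = dual_ascent(avg_cost, budget=1.0)
```

At convergence the multiplier settles where the constraint holds with equality, which is exactly the complementary-slackness condition of the dual problem.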
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Closed-form congestion control via deep symbolic regression [1.5961908901525192]
Reinforcement Learning (RL) algorithms can handle challenges in ultra-low-latency and high throughput scenarios.
The adoption of neural network models in real deployments still poses some challenges regarding real-time inference and interpretability.
This paper proposes a methodology to deal with such challenges while maintaining the performance and generalization capabilities.
arXiv Detail & Related papers (2024-03-28T14:31:37Z) - Lyapunov-Driven Deep Reinforcement Learning for Edge Inference Empowered by Reconfigurable Intelligent Surfaces [30.1512069754603]
We propose a novel algorithm for energy-efficient, low-latency, accurate inference at the wireless edge.
We consider a scenario where new data are continuously generated/collected by a set of devices and are handled through a dynamic queueing system.
arXiv Detail & Related papers (2023-05-18T12:46:42Z) - Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet.
To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems.
In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
arXiv Detail & Related papers (2022-12-15T17:01:56Z) - Fair and Efficient Distributed Edge Learning with Hybrid Multipath TCP [62.81300791178381]
The bottleneck of distributed edge learning (DEL) over wireless networks has shifted from computing to communication.
Existing TCP-based data networking schemes for DEL are application-agnostic and fail to adapt to application-layer requirements.
We develop a hybrid multipath TCP (MP TCP) scheme for DEL by combining model-based and deep reinforcement learning (DRL)-based MP TCP.
arXiv Detail & Related papers (2022-11-03T09:08:30Z) - Deep Reinforcement Learning for Wireless Scheduling in Distributed Networked Control [37.10638636086814]
We consider a joint uplink and downlink scheduling problem of a fully distributed wireless networked control system (WNCS) with a limited number of frequency channels.
We develop a deep reinforcement learning (DRL) based framework for solving it.
To tackle the challenges of a large action space in DRL, we propose novel action space reduction and action embedding methods.
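A common way to see why action-space reduction matters for joint scheduling is factorization: a joint choice over uplink and downlink channels has |U|·|D| combinations, while choosing each dimension from its own value table needs only |U| + |D| evaluations. The sketch below is our own toy construction (the paper's exact reduction and embedding methods are not reproduced); it shows that when the joint value is additive across dimensions, per-dimension argmax recovers the exhaustive joint argmax.

```python
# Toy illustration (our construction): factored action selection matches
# exhaustive joint search when the joint Q-value decomposes additively.
import random

random.seed(0)
U, D = 16, 16
q_up = [random.random() for _ in range(U)]    # per-uplink-channel values
q_down = [random.random() for _ in range(D)]  # per-downlink-channel values

# Factored selection: argmax per dimension, U + D evaluations.
a_up = max(range(U), key=q_up.__getitem__)
a_down = max(range(D), key=q_down.__getitem__)

# Exhaustive search over all U*D joint actions, assuming additivity.
best_joint = max(((i, j) for i in range(U) for j in range(D)),
                 key=lambda a: q_up[a[0]] + q_down[a[1]])
print(best_joint == (a_up, a_down))
```

When the joint value is not additive, factorization becomes an approximation, which is where learned action embeddings earn their keep.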
arXiv Detail & Related papers (2021-09-26T11:27:12Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Proactive and AoI-aware Failure Recovery for Stateful NFV-enabled Zero-Touch 6G Networks: Model-Free DRL Approach [0.0]
We propose a model-free deep reinforcement learning (DRL)-based proactive failure recovery framework called zero-touch PFR (ZT-PFR).
ZT-PFR is for the embedded stateful virtual network functions (VNFs) in network function virtualization (NFV) enabled networks.
arXiv Detail & Related papers (2021-02-02T21:40:35Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in wireless networks.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.