Goal-oriented Transmission Scheduling: Structure-guided DRL with a Unified Dual On-policy and Off-policy Approach
- URL: http://arxiv.org/abs/2501.11921v1
- Date: Tue, 21 Jan 2025 06:49:06 GMT
- Title: Goal-oriented Transmission Scheduling: Structure-guided DRL with a Unified Dual On-policy and Off-policy Approach
- Authors: Jiazheng Chen, Wanchun Liu
- Abstract summary: We derive structural properties of the optimal solution to the goal-oriented scheduling problem, incorporating Age of Information (AoI) and channel states.
We propose the structure-guided unified dual on-off policy DRL (SUDO-DRL), a hybrid algorithm that combines the stability of on-policy training with the sample efficiency of off-policy methods.
Numerical results show SUDO-DRL improves system performance by up to 45% and reduces convergence time by 40% compared to state-of-the-art methods.
- Score: 3.6509749032112753
- Abstract: Goal-oriented communications prioritize application-driven objectives over data accuracy, enabling intelligent next-generation wireless systems. Efficient scheduling in multi-device, multi-channel systems poses significant challenges due to high-dimensional state and action spaces. We address these challenges by deriving key structural properties of the optimal solution to the goal-oriented scheduling problem, incorporating Age of Information (AoI) and channel states. Specifically, we establish the monotonicity of the optimal state value function (a measure of long-term system performance) w.r.t. channel states and prove its asymptotic convexity w.r.t. AoI states. Additionally, we derive the monotonicity of the optimal policy w.r.t. channel states, advancing the theoretical framework for optimal scheduling. Leveraging these insights, we propose the structure-guided unified dual on-off policy DRL (SUDO-DRL), a hybrid algorithm that combines the stability of on-policy training with the sample efficiency of off-policy methods. Through a novel structural property evaluation framework, SUDO-DRL enables effective and scalable training, addressing the complexities of large-scale systems. Numerical results show SUDO-DRL improves system performance by up to 45% and reduces convergence time by 40% compared to state-of-the-art methods. It also effectively handles scheduling in much larger systems, where off-policy DRL fails and on-policy benchmarks exhibit significant performance loss, demonstrating its scalability and efficacy in goal-oriented communications.
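The structural results above suggest a simple diagnostic. The sketch below is a hypothetical illustration, not the authors' SUDO-DRL code: it scores how strongly a learned value estimate violates the proven monotonicity in channel state, and such a score could plausibly serve as the kind of structural property evaluation the abstract describes. The toy value_fn, the state grids, and the penalty form are all assumptions.

```python
import numpy as np

# Hypothetical sketch of a structural-property check inspired by the paper:
# the optimal value function is monotone in channel state (for a fixed AoI),
# so violations of monotonicity by a learned value estimate can be penalized.
# The value function below is a stand-in, not the authors' model.

def value_fn(aoi: np.ndarray, channel: np.ndarray) -> np.ndarray:
    """Toy value estimate: better channels and fresher AoI => higher value."""
    return -0.5 * aoi + 2.0 * np.log1p(channel)

def monotonicity_violation(aoi_grid, channel_grid, v=value_fn):
    """Average violation of 'V non-decreasing in channel state' over a grid.

    Returns 0 when the structural property holds everywhere; larger values
    indicate stronger violations and could be added to a DRL loss as a
    structure-guided penalty (an assumption, not the paper's exact recipe).
    """
    total = 0.0
    for aoi in aoi_grid:
        vals = v(np.full_like(channel_grid, aoi, dtype=float), channel_grid)
        diffs = np.diff(vals)              # should be >= 0 along channel axis
        total += np.clip(-diffs, 0.0, None).sum()
    return total / len(aoi_grid)

aoi_grid = np.arange(1, 11)                # AoI states 1..10
channel_grid = np.linspace(0.1, 1.0, 16)   # channel qualities, sorted ascending
print("monotonicity violation:", monotonicity_violation(aoi_grid, channel_grid))
```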
Related papers
- Latent feedback control of distributed systems in multiple scenarios through deep learning-based reduced order models [3.5161229331588095]
Continuous monitoring and real-time control of high-dimensional distributed systems are crucial in applications to ensure a desired physical behavior.
Traditional feedback control design that relies on full-order models fails to meet these requirements due to the delay in the control computation.
We propose a real-time closed-loop control strategy enhanced by nonlinear, non-intrusive Deep Learning-based Reduced Order Models (DL-ROMs); a minimal sketch of such a latent control loop follows this entry.
arXiv Detail & Related papers (2024-12-13T08:04:21Z)
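A minimal sketch of the latent feedback-control idea in this entry: encode the full-order state into a low-dimensional latent, compute the control law there, and apply it to the plant. The encoder, feedback gain, and dynamics below are random linear stand-ins for the paper's trained DL-ROMs, chosen only to make the loop runnable.

```python
import numpy as np

# Hypothetical latent feedback loop in the spirit of DL-ROM control:
# full state -> latent code -> control computed in latent space. All maps
# here are illustrative stand-ins for the trained networks in the paper.
rng = np.random.default_rng(0)
n_full, n_latent, n_ctrl = 200, 4, 2

E = rng.standard_normal((n_latent, n_full)) / np.sqrt(n_full)  # encoder
K = rng.standard_normal((n_ctrl, n_latent))                    # latent feedback gain
A = np.eye(n_full) * 0.99                                      # toy full-order dynamics
B = rng.standard_normal((n_full, n_ctrl)) * 0.05               # input map

x = rng.standard_normal(n_full)          # initial full-order state
for t in range(50):
    z = E @ x                            # reduce: cheap low-dimensional code
    u = -K @ z                           # feedback computed in latent space
    x = A @ x + B @ u                    # plant update (full order)
print("final state norm:", np.linalg.norm(x))
```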
- Event-Triggered Reinforcement Learning Based Joint Resource Allocation for Ultra-Reliable Low-Latency V2X Communications [10.914558012458425]
6G-enabled vehicular networks face the challenge of ensuring ultra-reliable low-latency communication (URLLC) for delivering safety-critical information in a timely manner.
Traditional resource allocation schemes for vehicle-to-everything (V2X) communication systems rely on decoding-based algorithms.
arXiv Detail & Related papers (2024-07-18T23:55:07Z)
- Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to jointly train reward models and the policy; a sketch of such a combined objective follows this entry.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
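One plausible form of a single-stage objective mixing demonstrations and preferences, as described in the entry above. This is a guess at the general shape of such a loss, not the AIHF formulation itself; the reward inputs, Bradley-Terry preference term, and weighting lam are illustrative assumptions.

```python
import numpy as np

# Hypothetical single-stage objective in the spirit of AIHF: one loss that
# mixes a demonstration term and a preference (Bradley-Terry) term over a
# shared reward model r(.). The exact terms and weighting are assumptions.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_loss(r_demo, r_pref_chosen, r_pref_rejected, lam=0.5):
    """r_demo: rewards of demonstrated actions (pushed up);
    r_pref_*: rewards of preferred/rejected responses in comparison pairs."""
    demo_term = -np.mean(r_demo)                     # fit demonstrations
    pref_term = -np.mean(np.log(sigmoid(r_pref_chosen - r_pref_rejected)))
    return demo_term + lam * pref_term               # one stage, one objective

# Toy usage with made-up reward values:
print(joint_loss(np.array([1.2, 0.8]), np.array([0.9, 1.1]), np.array([0.2, -0.3])))
```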
- Optimization Theory Based Deep Reinforcement Learning for Resource Allocation in Ultra-Reliable Wireless Networked Control Systems [10.177917426690701]
This paper introduces a novel optimization theory based deep reinforcement learning (DRL) framework for the joint design of controller and communication systems.
The objective of minimum power consumption is targeted while satisfying the schedulability and rate constraints of the communication system.
arXiv Detail & Related papers (2023-11-28T15:49:29Z)
- Structure-Enhanced DRL for Optimal Transmission Scheduling [43.801422320012286]
This paper focuses on the transmission scheduling problem of a remote estimation system.
We develop a structure-enhanced deep reinforcement learning framework for optimal scheduling of the system.
In particular, we propose a structure-enhanced action selection method, which tends to select actions that obey the policy structure; a minimal sketch of this idea follows this entry.
arXiv Detail & Related papers (2022-12-24T10:18:38Z)
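A minimal sketch of structure-enhanced action selection as described in the entry above: during exploration, bias the agent toward actions consistent with a known monotone policy structure instead of exploring uniformly. The scoring rule (AoI-weighted channel quality) and the mixing probability are illustrative assumptions, not the papers' exact mechanism.

```python
import numpy as np

# Hypothetical structure-enhanced action selection: with some probability,
# pick the action a monotone policy structure would prescribe (here: the
# device with the largest AoI-weighted channel quality); otherwise fall
# back to the greedy action from the learned Q-estimates.
rng = np.random.default_rng(1)

def structure_guided_action(q_values, aoi, channel, p_structure=0.7):
    """Return a device index, biased toward the structure-respecting choice."""
    structured = int(np.argmax(aoi * channel))   # monotone-in-(AoI, channel)
    greedy = int(np.argmax(q_values))
    return structured if rng.random() < p_structure else greedy

q = np.array([0.3, 0.1, 0.4])       # learned action values (toy)
aoi = np.array([5.0, 1.0, 2.0])     # per-device AoI
ch = np.array([0.9, 0.8, 0.3])      # per-device channel quality
print("selected device:", structure_guided_action(q, aoi, ch))
```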
- Structure-Enhanced Deep Reinforcement Learning for Optimal Transmission Scheduling [47.29474858956844]
We develop a structure-enhanced deep reinforcement learning framework for optimal scheduling of a multi-sensor remote estimation system.
In particular, we propose a structure-enhanced action selection method, which tends to select actions that obey the policy structure.
Our numerical results show that the proposed structure-enhanced DRL algorithms reduce training time by 50% and the remote estimation MSE by 10% to 25%.
arXiv Detail & Related papers (2022-11-20T00:13:35Z)
- Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning [53.18060442931179]
We propose the age of semantics (AoS) for measuring semantics freshness of status updates in a cooperative relay communication system.
We derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework.
We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset; a sketch contrasting the online and offline updates follows this entry.
arXiv Detail & Related papers (2022-09-19T11:55:28Z)
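To make the online/offline distinction in the entry above concrete, here is a tabular TD(0) critic updated in two ways: from fresh on-policy transitions versus from repeated sweeps over a fixed dataset. The tabular critic, transitions, and hyperparameters are toy assumptions, not the paper's DAC networks.

```python
import numpy as np

# Hypothetical contrast between an online on-policy TD update and an
# offline update over a previously collected dataset, in the spirit of
# the online/offline DAC schemes above. All numbers are illustrative.
gamma, alpha = 0.9, 0.1
V = np.zeros(5)                                      # tabular value estimates

def td_update(s, r, s_next):
    V[s] += alpha * (r + gamma * V[s_next] - V[s])   # one-step TD(0)

# Online: update from transitions as the current policy generates them.
td_update(s=0, r=1.0, s_next=1)

# Offline: sweep a fixed dataset of transitions instead of interacting
# with the environment.
dataset = [(1, 0.5, 2), (2, 0.0, 3), (3, 1.0, 4)]
for _ in range(100):                                 # repeated offline sweeps
    for s, r, s_next in dataset:
        td_update(s, r, s_next)
print(V)
```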
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven [80.94390916562179]
Time-driven learning refers to the machine learning method that updates parameters in a prediction model continuously as new data arrives.
It is desirable to prevent the time-driven dHDP from updating due to insignificant system events such as noise; a sketch of such an event-triggered update rule follows this entry.
We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
arXiv Detail & Related papers (2020-06-16T05:51:25Z)
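A minimal sketch of the event-driven idea in the entry above: apply a learning update only when the state change exceeds a threshold, so that insignificant events such as noise do not trigger computation. The trigger condition, threshold, and gradient step are illustrative assumptions, not the dHDP update itself.

```python
import numpy as np

# Hypothetical event-trigger for learning updates, in the spirit of
# event-driven dHDP: skip the parameter update when the observed state
# change is too small to be a meaningful system event.

def maybe_update(weights, x_prev, x_now, grad, lr=0.01, threshold=0.1):
    """Apply a gradient step only when the triggering condition fires."""
    if np.linalg.norm(x_now - x_prev) > threshold:   # event condition
        weights -= lr * grad
        return weights, True
    return weights, False                            # skip: treat as noise

w = np.zeros(3)
w, fired = maybe_update(w, np.array([0.0, 0.0]), np.array([0.02, 0.01]),
                        grad=np.ones(3))
print("updated:", fired)   # small state change -> no update
```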
- Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver.
We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming.
We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.