Accelerating Deep Reinforcement Learning With the Aid of Partial Model:
Energy-Efficient Predictive Video Streaming
- URL: http://arxiv.org/abs/2003.09708v2
- Date: Thu, 5 Nov 2020 01:30:00 GMT
- Title: Accelerating Deep Reinforcement Learning With the Aid of Partial Model:
Energy-Efficient Predictive Video Streaming
- Authors: Dong Liu, Jianyu Zhao, Chenyang Yang, Lajos Hanzo
- Abstract summary: Predictive power allocation is conceived for energy-efficient video streaming over mobile networks using deep reinforcement learning.
To handle the continuous state and action spaces, we resort to the deep deterministic policy gradient (DDPG) algorithm.
Our simulation results show that the proposed policies converge to the optimal policy derived using perfect large-scale channel prediction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predictive power allocation is conceived for energy-efficient video streaming
over mobile networks using deep reinforcement learning. The goal is to minimize
the accumulated energy consumption of each base station over a complete video
streaming session under the constraint of avoiding video playback
interruptions. To handle the continuous state and action spaces, we resort to
the deep deterministic policy gradient (DDPG) algorithm to solve the formulated
problem. In contrast to previous predictive power allocation policies that
first predict future information with historical data and then optimize the
power allocation based on the predicted information, the proposed policy
operates in an on-line and end-to-end manner. By judiciously designing the
action and state that only depend on slowly-varying average channel gains, we
reduce the signaling overhead between the edge server and the base stations,
and make it easier to learn a good policy. To further avoid playback
interruption throughout the learning process and improve the convergence speed,
we exploit the partially known model of the system dynamics by integrating the
concepts of safety layer, post-decision state, and virtual experiences into the
basic DDPG algorithm. Our simulation results show that the proposed policies
converge to the optimal policy derived using perfect large-scale
channel prediction and outperform the first-predict-then-optimize policy in the
presence of prediction errors. By harnessing the partially known model, the
convergence speed can be dramatically improved.
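To make the safety-layer and virtual-experience ideas concrete, the snippet below is a minimal sketch (not the authors' code). It assumes an illustrative Shannon-style rate model driven only by the slowly varying average channel gain, and the constants and names (BANDWIDTH_HZ, NOISE_W, rate_bps, min_safe_power, safety_layer, virtual_transitions) are assumptions introduced here for illustration.

import numpy as np

# Illustrative constants (assumptions, not values from the paper).
BANDWIDTH_HZ = 1e6      # per-user bandwidth
NOISE_W = 1e-9          # receiver noise power
P_MAX_W = 1.0           # maximum transmit power per slot

def rate_bps(power_w, avg_gain):
    """Achievable rate computed from the slowly varying average channel gain
    only, mirroring the paper's choice of state/action (assumed rate model)."""
    return BANDWIDTH_HZ * np.log2(1.0 + power_w * avg_gain / NOISE_W)

def min_safe_power(deficit_bits, avg_gain, slot_s):
    """Smallest power whose rate clears the playback deficit within one slot
    (analytic inverse of rate_bps); the closed form is what keeps a
    safety layer cheap to evaluate."""
    required_bps = deficit_bits / slot_s
    return (2.0 ** (required_bps / BANDWIDTH_HZ) - 1.0) * NOISE_W / avg_gain

def safety_layer(raw_power_w, avg_gain, buffer_bits, playback_bits, slot_s):
    """Project the raw actor output onto the 'no rebuffering' action set."""
    deficit = playback_bits - buffer_bits  # bits the buffer cannot yet cover
    lower = min_safe_power(deficit, avg_gain, slot_s) if deficit > 0 else 0.0
    # If lower > P_MAX_W the constraint is infeasible this slot; the clip
    # then simply returns the maximum power.
    return float(np.clip(raw_power_w, lower, P_MAX_W))

def virtual_transitions(buffer_bits, playback_bits, avg_gain, slot_s, powers):
    """Because the buffer recursion is known, one observed channel gain can be
    replayed under many hypothetical powers ('virtual experiences')."""
    experiences = []
    for p in powers:
        delivered = rate_bps(p, avg_gain) * slot_s
        next_buffer = buffer_bits + delivered - playback_bits
        experiences.append((buffer_bits, p, -p * slot_s, next_buffer))
    return experiences  # (state, action, energy-based reward, next state)

In a full DDPG loop under these assumptions, safety_layer would wrap the actor's output before execution, virtual_transitions would be appended to the replay buffer alongside real transitions, and the post-decision state would correspond to the buffer level after the known playback drain has been applied.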
Related papers
- Causal Context Adjustment Loss for Learned Image Compression
In recent years, learned image compression (LIC) technologies have surpassed conventional methods notably in terms of rate-distortion (RD) performance.
Most existing techniques are VAE-based with an autoregressive entropy model, which improves RD performance by exploiting the decoded causal context.
In this paper, we make the first attempt to explicitly adjust the causal context with our proposed Causal Context Adjustment loss.
arXiv Detail & Related papers (2024-10-07T09:08:32Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm also provides more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Acceleration in Policy Optimization
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight into the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, made adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Learning Robust Policies for Generalized Debris Capture with an Automated Tether-Net System
This paper presents a reinforcement learning framework that integrates a policy optimization approach with net dynamics simulations.
A state transition model is used to incorporate synthetic uncertainties in state estimation and launch actuation.
The trained policy demonstrates capture performance close to that obtained with reliability-based optimization run over an individual scenario.
arXiv Detail & Related papers (2022-01-11T20:09:05Z)
- Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching
A privacy-preserving distributed deep policy gradient (P2D3PG) method is proposed to maximize the cache hit rates of devices in MEC networks.
We convert the distributed optimizations into model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction.
arXiv Detail & Related papers (2021-10-20T02:48:27Z)
- A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline
Video analytics pipelines are energy-intensive due to high data rates and reliance on complex inference algorithms.
We propose an adaptive-resolution optimization framework to minimize the energy use of multi-task video analytics pipelines.
Our framework significantly surpasses all baseline methods of similar accuracy on the YouTube-VIS dataset.
arXiv Detail & Related papers (2021-04-09T15:44:06Z)
- Hybrid Policy Learning for Energy-Latency Tradeoff in MEC-Assisted VR Video Service
We consider delivering the wireless multi-tile VR video service over a mobile edge computing network.
We first cast the time-varying view popularity as a model-free Markov chain.
A hybrid policy is then implemented to coordinate the dynamic caching replacement and the deterministic offloading.
arXiv Detail & Related papers (2021-04-02T13:17:11Z)
- Recurrent Model Predictive Control
We propose an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems.
Our algorithm employs a recurrent function to approximate the optimal policy, mapping the system states and reference values directly to the control inputs.
arXiv Detail & Related papers (2021-02-23T15:01:36Z)
- Iterative Amortized Policy Optimization
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
- TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation
Video action anticipation aims to predict future action categories from observed frames.
Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states.
This paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction framework.
arXiv Detail & Related papers (2020-03-07T07:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.