MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction
- URL: http://arxiv.org/abs/2405.07759v2
- Date: Fri, 17 May 2024 23:21:14 GMT
- Title: MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction
- Authors: Haopeng Wang, Zijian Long, Haiwei Dong, Abdulmotaleb El Saddik,
- Abstract summary: A key challenge of 360deg video playback is ensuring a high quality of experience (QoE) with limited network bandwidth.
Currently, most studies focus on tile-based adaptive (ABR) streaming based on single viewport prediction to reduce bandwidth consumption.
This paper first presents a multimodal spatial-temporal attention transformer to generate multiple viewpoint trajectories with their probabilities given a historical trajectory.
After that, a multi-agent deep reinforcement learning (MADRL)-based ABR algorithm utilizing multi-viewpoint prediction for 360deg video streaming is proposed.
- Score: 3.8611070161950916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the last few years, 360{\deg} video traffic on the network has grown significantly. A key challenge of 360{\deg} video playback is ensuring a high quality of experience (QoE) with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single viewport prediction to reduce bandwidth consumption. However, the performance of models for single-viewpoint prediction is severely limited by the inherent uncertainty in head movement, which can not cope with the sudden movement of users very well. This paper first presents a multimodal spatial-temporal attention transformer to generate multiple viewpoint trajectories with their probabilities given a historical trajectory. The proposed method models viewpoint prediction as a classification problem and uses attention mechanisms to capture the spatial and temporal characteristics of input video frames and viewpoint trajectories for multi-viewpoint prediction. After that, a multi-agent deep reinforcement learning (MADRL)-based ABR algorithm utilizing multi-viewpoint prediction for 360{\deg} video streaming is proposed for maximizing different QoE objectives under various network conditions. We formulate the ABR problem as a decentralized partially observable Markov decision process (Dec-POMDP) problem and present a MAPPO algorithm based on centralized training and decentralized execution (CTDE) framework to solve the problem. The experimental results show that our proposed method improves the defined QoE metric by up to 85.5% compared to existing ABR methods.
Related papers
- Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z) - Assessor360: Multi-sequence Network for Blind Omnidirectional Image
Quality Assessment [50.82681686110528]
Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs)
The quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process.
We propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure.
arXiv Detail & Related papers (2023-05-18T13:55:28Z) - Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP
Benchmark [23.872360763782037]
ASAP is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.
We propose an annotation-extending pipeline to generate high-frame-rate labels for the 12Hz raw images.
In the ASAP benchmark, comprehensive experiment results reveal that the model rank alters under different constraints.
arXiv Detail & Related papers (2022-12-17T16:32:15Z) - Spherical Convolution empowered FoV Prediction in 360-degree Video
Multicast with Limited FoV Feedback [16.716422953229088]
Field of view (FoV) prediction is critical in 360-degree video multicast.
This paper proposes a spherical convolution-empowered FoV prediction method.
The experimental results show that the performance of the proposed method is better than other prediction methods.
arXiv Detail & Related papers (2022-01-29T08:32:19Z) - Crowd Counting via Perspective-Guided Fractional-Dilation Convolution [75.36662947203192]
This paper proposes a novel convolution neural network-based crowd counting method, termed Perspective-guided Fractional-Dilation Network (PFDNet)
By modeling the continuous scale variations, the proposed PFDNet is able to select the proper fractional dilation kernels for adapting to different spatial locations.
It significantly improves the flexibility of the state-of-the-arts that only consider the discrete representative scales.
arXiv Detail & Related papers (2021-07-08T07:57:00Z) - Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame
Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z) - Dynamic Inference: A New Approach Toward Efficient Video Action
Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z) - Non-Cooperative Game Theory Based Rate Adaptation for Dynamic Video
Streaming over HTTP [89.30855958779425]
Dynamic Adaptive Streaming over HTTP (DASH) has demonstrated to be an emerging and promising multimedia streaming technique.
We propose a novel algorithm to optimally allocate the limited export bandwidth of the server to multi-users to maximize their Quality of Experience (QoE) with fairness guaranteed.
arXiv Detail & Related papers (2019-12-27T01:19:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.