Heterogeneous 360 Degree Videos in Metaverse: Differentiated
Reinforcement Learning Approaches
- URL: http://arxiv.org/abs/2308.04083v1
- Date: Tue, 8 Aug 2023 06:47:16 GMT
- Title: Heterogeneous 360 Degree Videos in Metaverse: Differentiated
Reinforcement Learning Approaches
- Authors: Wenhan Yu and Jun Zhao
- Abstract summary: This paper presents a novel Quality of Service model for heterogeneous 360-degree videos with different requirements for frame rates and cybersickness.
We propose a frame-slotted structure and conduct frame-wise optimization using self-designed differentiated deep reinforcement learning algorithms.
- Score: 10.0580903923777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advanced video technologies are driving the development of the futuristic
Metaverse, which aims to connect users from anywhere and anytime. As such, the
use cases for users will be much more diverse, leading to a mix of two types of
360-degree videos: non-VR and VR. This paper presents a
novel Quality of Service model for heterogeneous 360-degree videos with
different requirements for frame rates and cybersickness. We propose a
frame-slotted structure and conduct frame-wise optimization using self-designed
differentiated deep reinforcement learning algorithms. Specifically, we design
two structures, Separate Input Differentiated Output (SIDO) and Merged Input
Differentiated Output (MIDO), for this heterogeneous scenario. We also conduct
comprehensive experiments to demonstrate their effectiveness.
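As a rough illustration of the two proposed structures, the sketch below contrasts a SIDO-style pair of separate policy networks with a MIDO-style shared trunk feeding differentiated output heads. All dimensions, the one-layer linear stand-in networks, and the method names are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim, seed):
    # One-layer linear map standing in for a policy network.
    r = np.random.default_rng(seed)
    W = r.standard_normal((in_dim, out_dim)) * 0.1
    return lambda x: x @ W

STATE_DIM = 8  # per-user channel/buffer features (assumed)
ACT_DIM = 3    # e.g. frame rate, resolution, power (assumed)

class SIDO:
    """Separate Input Differentiated Output: one sub-network per video type."""
    def __init__(self):
        self.vr_net = mlp(STATE_DIM, ACT_DIM, seed=1)
        self.nonvr_net = mlp(STATE_DIM, ACT_DIM, seed=2)

    def act(self, vr_state, nonvr_state):
        # Each video class is handled by its own, fully separate network.
        return self.vr_net(vr_state), self.nonvr_net(nonvr_state)

class MIDO:
    """Merged Input Differentiated Output: shared trunk, two output heads."""
    def __init__(self):
        self.trunk = mlp(2 * STATE_DIM, 16, seed=3)
        self.vr_head = mlp(16, ACT_DIM, seed=4)
        self.nonvr_head = mlp(16, ACT_DIM, seed=5)

    def act(self, vr_state, nonvr_state):
        # Both classes share one trunk; only the output heads differ.
        h = self.trunk(np.concatenate([vr_state, nonvr_state]))
        return self.vr_head(h), self.nonvr_head(h)

s_vr = rng.standard_normal(STATE_DIM)
s_nvr = rng.standard_normal(STATE_DIM)
a_vr, a_nvr = SIDO().act(s_vr, s_nvr)
b_vr, b_nvr = MIDO().act(s_vr, s_nvr)
print(a_vr.shape, b_vr.shape)  # both (3,)
```

The trade-off this sketch makes visible: SIDO keeps the two video classes fully independent, while MIDO lets them share representation capacity and only differentiates at the output.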
Related papers
- Omnidirectional Video Super-Resolution using Deep Learning [3.281128493853064]
The limited spatial resolution in 360° videos does not allow each degree of view to be represented with adequate pixels. This paper proposes a novel deep learning model for 360° Video Super-Resolution (360° VSR) called Spherical Signal Super-resolution with a Proportioned Optimisation (S3PO). S3PO adopts recurrent modelling with an attention mechanism, unbound from conventional VSR techniques like alignment.
arXiv Detail & Related papers (2025-06-03T05:59:21Z) - Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos [64.10180665546237]
360° videos offer a more complete perspective of our surroundings.
Existing video models excel at producing standard videos, but their ability to generate full panoramic videos remains elusive.
We develop a high-quality data filtering pipeline to curate pairwise training data and improve the quality of 360° video generation.
Experimental results demonstrate that our model can generate realistic and coherent 360° videos from in-the-wild perspective videos.
arXiv Detail & Related papers (2025-04-10T17:51:38Z) - Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval [24.764393859378544]
We introduce Modality Auxiliary Concepts for Video Retrieval (MAC-VR).
We propose to align modalities in a latent space, along with learning and aligning auxiliary latent concepts.
We conduct extensive experiments on five diverse datasets.
arXiv Detail & Related papers (2025-04-02T10:56:01Z) - Optical-Flow Guided Prompt Optimization for Coherent Video Generation [51.430833518070145]
We propose a framework called MotionPrompt that guides the video generation process via optical flow.
We optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs.
This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content.
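The core loop, reduced to a toy: a frozen discriminator scores the realism of frames produced from a learnable embedding, and the embedding is updated by ascending that score. The linear generator `G`, discriminator `D`, and finite-difference gradient below are hypothetical stand-ins for the paper's networks and backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (not the paper's actual networks): a frozen
# "discriminator" D scoring motion realism of a generated frame pair, and a
# toy linear "generator" G mapping a prompt embedding to frame-pair features.
G = rng.standard_normal((2, 4))   # embedding -> frame-pair features
D = rng.standard_normal(4)        # discriminator weights

def realism(e):
    # Discriminator score of the frames produced from embedding e.
    return float(D @ (e @ G))

def num_grad(f, e, eps=1e-5):
    # Central-difference gradient, standing in for backprop through D.
    g = np.zeros_like(e)
    for i in range(e.size):
        d = np.zeros_like(e)
        d[i] = eps
        g[i] = (f(e + d) - f(e - d)) / (2 * eps)
    return g

e = rng.standard_normal(2)        # learnable token embedding
before = realism(e)
for _ in range(20):               # simplified stand-in for reverse sampling
    e = e + 0.1 * num_grad(realism, e)   # ascend the realism score
after = realism(e)
print(after > before)  # True
```

The key design point the abstract describes is that only the token embedding is optimized; the generator and discriminator stay fixed, which is why the content fidelity is preserved.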
arXiv Detail & Related papers (2024-11-23T12:26:52Z) - 360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation [13.122586587748218]
This paper introduces the benchmark dataset, 360VFI, for Omnidirectional Video Frame Interpolation.
We present a practical implementation that introduces a distortion prior from omnidirectional video into the network to modulate distortions.
arXiv Detail & Related papers (2024-07-19T06:50:24Z) - Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features [21.583246378475856]
We introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet).
We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos.
DuB3D distinguishes between real and generated video content with 96.77% accuracy and shows strong generalization even to unseen types.
arXiv Detail & Related papers (2024-05-24T08:26:04Z) - Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap [4.922783970210658]
We categorize the assessment of AIGC video quality into three dimensions: visual harmony, video-text consistency, and domain distribution gap.
For each dimension, we design specific modules to provide a comprehensive quality assessment of AIGC videos.
Our research identifies significant variations in visual quality, fluidity, and style among videos generated by different text-to-video models.
arXiv Detail & Related papers (2024-04-21T08:27:20Z) - Panoramic Vision Transformer for Saliency Detection in 360° Videos [48.54829780502176]
We present a new framework named Panoramic Vision Transformer (PAVER).
We design the encoder using Vision Transformer with deformable convolution, which enables us to plug pretrained models from normal videos into our architecture without additional modules or finetuning.
We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision.
arXiv Detail & Related papers (2022-09-19T12:23:34Z) - Condensing a Sequence to One Informative Frame for Video Recognition [113.3056598548736]
This paper studies a two-step alternative that first condenses the video sequence to an informative "frame".
The key question is how to define "useful information" and then distill a sequence down to one synthetic frame.
IFS consistently yields clear improvements on both image-based 2D networks and clip-based 3D networks.
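A minimal sketch of the condensation idea: weight each frame by an informativeness score and blend the clip into one synthetic frame. The variance-based score and softmax weighting are simple stand-ins, not the paper's actual IFS module:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy clip of T frames; condense to one synthetic frame by a per-frame
# weighting (hypothetical stand-in for a learned informativeness measure).
T, H, W = 8, 4, 4
clip = rng.standard_normal((T, H, W))

scores = clip.reshape(T, -1).var(axis=1)       # "informativeness" proxy (assumed)
w = np.exp(scores) / np.exp(scores).sum()      # softmax over frames
frame = (w[:, None, None] * clip).sum(axis=0)  # one synthetic frame
print(frame.shape)  # (4, 4)
```

The condensed frame can then be fed to an ordinary image network, which is the efficiency argument the abstract makes.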
arXiv Detail & Related papers (2022-01-11T16:13:43Z) - Blind VQA on 360° Video via Progressively Learning from Pixels,
Frames and Video [66.57045901742922]
Blind visual quality assessment (BVQA) on 360° video plays a key role in optimizing immersive multimedia systems.
In this paper, we take into account the progressive paradigm of human perception towards spherical video quality.
We propose a novel BVQA approach (namely ProVQA) for 360° video via progressively learning from pixels, frames and video.
arXiv Detail & Related papers (2021-11-18T03:45:13Z) - Is Space-Time Attention All You Need for Video Understanding? [50.78676438502343]
We present a convolution-free approach to video understanding, built exclusively on self-attention over space and time.
"TimeSformer" adapts the standard Transformer architecture to video by enabling feature learning from a sequence of frame-level patches.
TimeSformer achieves state-of-the-art results on several major action recognition benchmarks.
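The factorized space-time attention can be sketched as attending over time at each patch location, then over space within each frame. The single-head, projection-free attention below is a deliberate simplification of the actual architecture, with all shapes assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

def attend(q, kv):
    # Plain scaled dot-product self-attention (single head, no projections).
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

# Toy clip: T frames, each split into P patches of dimension D.
T, P, D = 4, 6, 8
x = rng.standard_normal((T, P, D))

# "Divided" space-time attention: first attend over time for each patch
# location, then over space within each frame.
xt = np.stack([attend(x[:, p], x[:, p]) for p in range(P)], axis=1)
xs = np.stack([attend(xt[t], xt[t]) for t in range(T)], axis=0)
print(xs.shape)  # (4, 6, 8)
```

The factorization keeps each attention step at O(T²) or O(P²) rather than the O((TP)²) cost of joint space-time attention, which is the scalability point behind the design.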
arXiv Detail & Related papers (2021-02-09T19:49:33Z) - ATSal: An Attention Based Architecture for Saliency Prediction in 360°
Videos [5.831115928056554]
This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360° videos.
We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking.
Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
arXiv Detail & Related papers (2020-11-20T19:19:48Z) - Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning.
Our approach generates higher-quality videos than existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-21T02:57:33Z)
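A toy rendering of the non-adversarial idea: jointly fit per-frame latent codes and a generator to a target clip by plain reconstruction loss instead of a GAN objective. The linear generator and all shapes are assumptions, and the recurrent network is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

T, D, K = 5, 6, 3                 # frames, frame dim, latent dim (assumed)
target = rng.standard_normal((T, D))
z = rng.standard_normal((T, K))   # per-frame latent noise vectors
W = rng.standard_normal((K, D)) * 0.1  # linear "generator"

lr = 0.05
losses = []
for _ in range(200):
    err = z @ W - target          # reconstruction residual
    losses.append(float((err ** 2).mean()))
    gW = z.T @ err / (T * D)      # gradient w.r.t. generator weights
    gz = err @ W.T / (T * D)      # gradient w.r.t. latent codes
    W -= lr * gW                  # update both jointly, no discriminator
    z -= lr * gz
print(losses[-1] < losses[0])  # loss decreases
```

Updating the latent codes and the generator together on a single reconstruction objective is what distinguishes this setup from adversarial training, where a separate discriminator drives the generator.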
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.