Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video
- URL: http://arxiv.org/abs/2111.09503v1
- Date: Thu, 18 Nov 2021 03:45:13 GMT
- Title: Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video
- Authors: Li Yang, Mai Xu, Shengxi Li, Yichen Guo, Zulin Wang
- Abstract summary: Blind visual quality assessment (BVQA) on 360° video plays a key role in optimizing immersive multimedia systems.
In this paper, we take into account the progressive paradigm of human perception towards spherical video quality.
We propose a novel BVQA approach (namely ProVQA) for 360° video via progressively learning from pixels, frames and video.
- Score: 66.57045901742922
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Blind visual quality assessment (BVQA) on 360° video plays a key
role in optimizing immersive multimedia systems. When assessing the quality of
360° video, humans tend to perceive its quality degradation from the
viewport-based spatial distortion of each spherical frame to motion artifacts
across adjacent frames, ending with the video-level quality score, i.e., a
progressive quality assessment paradigm. However, the existing BVQA approaches
for 360° video neglect this paradigm. In this paper, we take into account the
progressive paradigm of human perception towards spherical video quality, and
thus propose a novel BVQA approach (namely ProVQA) for 360° video via
progressively learning from pixels, frames and video. Corresponding to the
progressive learning of pixels, frames and video, three sub-nets are designed
in our ProVQA approach, i.e., the spherical perception aware quality
prediction (SPAQ), motion perception aware quality prediction (MPAQ) and
multi-frame temporal non-local (MFTN) sub-nets. The SPAQ sub-net first models
the spatial quality degradation based on the spherical perception mechanism of
humans. Then, by exploiting motion cues across adjacent frames, the MPAQ
sub-net incorporates motion contextual information for quality assessment on
360° video. Finally, the MFTN sub-net aggregates multi-frame quality
degradation to yield the final quality score by exploring long-term quality
correlations across multiple frames. Experiments validate that our approach
significantly advances the state-of-the-art BVQA performance on 360° video
over two datasets; the code has been made publicly available at
https://github.com/yanglixiaoshen/ProVQA.
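To make the progressive pixel-, frame- and video-level design concrete, below is a minimal PyTorch sketch of a pipeline with the same three-stage shape: per-frame spatial quality features (SPAQ-like), motion-aware fusion of adjacent frames (MPAQ-like), and non-local temporal aggregation into a video-level score (MFTN-like). The module internals, layer sizes, and helper names are illustrative assumptions, not the authors' architecture; the official implementation is at the GitHub link above.

```python
# Minimal sketch (not the authors' implementation) of a three-stage progressive
# BVQA pipeline: per-frame spatial features, motion-aware fusion across adjacent
# frames, and non-local temporal aggregation to a single score. All sizes are
# illustrative assumptions.
import torch
import torch.nn as nn


class SpatialQualityNet(nn.Module):
    """Stage 1: per-frame spatial quality features (stand-in for SPAQ)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) -> per-frame features (B, T, feat_dim)
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.view(b * t, c, h, w)).flatten(1)
        return feats.view(b, t, -1)


class MotionFusion(nn.Module):
    """Stage 2: fuse adjacent-frame features to inject motion context (stand-in for MPAQ)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Pair each frame with its predecessor (first frame paired with itself).
        prev = torch.cat([feats[:, :1], feats[:, :-1]], dim=1)
        return torch.relu(self.fuse(torch.cat([feats, prev], dim=-1)))


class TemporalAggregator(nn.Module):
    """Stage 3: long-range temporal aggregation to a scalar score (stand-in for MFTN)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        ctx, _ = self.attn(feats, feats, feats)        # non-local interaction over frames
        return self.head(ctx.mean(dim=1)).squeeze(-1)  # (B,) video-level score


if __name__ == "__main__":
    video = torch.rand(2, 8, 3, 128, 256)  # batch of 2 clips, 8 frames each
    feats = SpatialQualityNet()(video)
    feats = MotionFusion()(feats)
    score = TemporalAggregator()(feats)
    print(score.shape)  # torch.Size([2])
```

A multi-head self-attention layer stands in here for the paper's non-local temporal aggregation; swapping in the modules from the repository would preserve the same three-stage interface.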
Related papers
- Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model [54.69882562863726]
We try to systematically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives.
We evaluate the perceptual quality of AIGC videos from three dimensions: spatial quality, temporal quality, and text-to-video alignment.
We propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos.
arXiv Detail & Related papers (2024-07-31T07:54:26Z)
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
- Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment [9.883856205077022]
Video Quality Assessment (VQA) aims to predict the perceptual quality of a video.
VQA faces two underestimated challenges that remain unresolved in User Generated Content (UGC) videos.
We propose Visual Quality Transformer (VQT) to extract quality-related sparse features more efficiently.
arXiv Detail & Related papers (2023-07-31T16:29:29Z)
- Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach [52.07084862209754]
We collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors.
Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension.
These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings.
arXiv Detail & Related papers (2023-05-22T05:20:23Z)
- Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment [14.728530703277283]
Video quality assessment (VQA) aims to simulate human perception of video quality.
We decompose video into three levels: patch level, frame level, and clip level.
We propose Zoom-VQA architecture to perceive features at different levels.
arXiv Detail & Related papers (2023-04-13T12:18:15Z)
- Evaluating Point Cloud from Moving Camera Videos: A No-Reference Metric [58.309735075960745]
This paper explores the way of dealing with point cloud quality assessment (PCQA) tasks via video quality assessment (VQA) methods.
We generate the captured videos by rotating the camera around the point clouds through several circular pathways.
We extract both spatial and temporal quality-aware features from the selected key frames and the video clips using trainable 2D-CNN and pre-trained 3D-CNN models.
arXiv Detail & Related papers (2022-08-30T08:59:41Z)
- Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment [55.65173181828863]
We propose a temporal perceptual quality index (TPQI) to measure the temporal distortion by describing the graphic morphology of the representation.
Experiments show that TPQI is an effective way of predicting subjective temporal quality.
arXiv Detail & Related papers (2022-07-08T07:30:51Z)
- Patch-VQ: 'Patching Up' the Video Quality Problem [0.9786690381850356]
No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved, and important problem for social and streaming media applications.
Current NR models are limited in their prediction capabilities on real-world, "in-the-wild" video data.
We create the largest (by far) subjective video quality dataset, containing 39,000 real-world distorted videos and 117,000 space-time localized video patches.
arXiv Detail & Related papers (2020-11-27T03:46:44Z)