Learning Generalized Spatial-Temporal Deep Feature Representation for
No-Reference Video Quality Assessment
- URL: http://arxiv.org/abs/2012.13936v1
- Date: Sun, 27 Dec 2020 13:11:53 GMT
- Title: Learning Generalized Spatial-Temporal Deep Feature Representation for
No-Reference Video Quality Assessment
- Authors: Baoliang Chen, Lingyu Zhu, Guo Li, Hongfei Fan, and Shiqi Wang
- Abstract summary: We propose a no-reference video quality assessment method aiming to achieve high generalization capability in cross-content, -resolution and -frame rate quality prediction.
In particular, we evaluate the quality of a video by learning effective feature representations in the spatial-temporal domain.
Experiments show that our method outperforms state-of-the-art methods in cross-dataset settings.
- Score: 16.974008463660688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose a no-reference video quality assessment
method aiming to achieve high generalization capability in cross-content,
-resolution, and -frame rate quality prediction. In particular, we evaluate the
quality of a video by learning effective feature representations in the
spatial-temporal domain. In the spatial domain, to tackle resolution and
content variations, we impose Gaussian distribution constraints on the quality
features. The unified distribution can significantly reduce the domain gap
between different video samples, resulting in a more generalized quality
feature representation. Along the temporal dimension, inspired by the
mechanisms of visual perception, we propose a pyramid temporal aggregation
module that incorporates short-term and long-term memory to aggregate
frame-level quality. Experiments show that our method outperforms
state-of-the-art methods in cross-dataset settings and achieves comparable
performance in intra-dataset configurations, demonstrating the high
generalization capability of the proposed method.
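To make the two components more concrete, here is a minimal PyTorch sketch of
(i) a spatial quality encoder trained with a Gaussian-distribution constraint
and (ii) a pyramid temporal aggregation of frame-level scores. It is an
illustrative reconstruction rather than the authors' implementation: the
moment-matching form of the constraint, the window sizes, the worst-window
(min) pooling, and the class names are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianRegularizedSpatialEncoder(nn.Module):
    """Maps per-frame backbone features to compact quality features and
    exposes a loss term that pushes their distribution toward N(0, I)."""

    def __init__(self, in_dim=2048, feat_dim=128):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, feat_dim),
        )

    def forward(self, frame_feats):              # (T, in_dim) backbone features
        return self.proj(frame_feats)            # (T, feat_dim) quality features

    @staticmethod
    def gaussian_constraint_loss(feats):
        # Moment-matching surrogate (assumed): penalize deviation of the
        # per-dimension mean/variance from the standard Gaussian.
        mu = feats.mean(dim=0)
        var = feats.var(dim=0, unbiased=False)
        return (mu ** 2).mean() + ((var - 1.0) ** 2).mean()


class PyramidTemporalAggregation(nn.Module):
    """Aggregates frame-level quality scores with short-term windows of
    several sizes (the pyramid) plus a long-term global summary."""

    def __init__(self, windows=(4, 8, 16)):
        super().__init__()
        self.windows = windows
        self.fuse = nn.Linear(len(windows) + 1, 1)   # fuse pyramid + global

    def forward(self, frame_scores):             # (T,) per-frame scores
        x = frame_scores.view(1, 1, -1)           # (1, 1, T) for 1-D pooling
        levels = []
        for w in self.windows:
            w = min(w, x.shape[-1])
            # Short-term memory: mean within each window, then keep the
            # worst window (quality is dominated by poor segments).
            pooled = F.avg_pool1d(x, kernel_size=w, stride=w)
            levels.append(pooled.min())
        levels.append(x.mean())                   # long-term memory: global mean
        return self.fuse(torch.stack(levels))     # (1,) video-level score


# Hypothetical usage with a 300-frame video and 2048-d backbone features.
encoder = GaussianRegularizedSpatialEncoder()
aggregator = PyramidTemporalAggregation()
feats = encoder(torch.randn(300, 2048))
frame_scores = feats.mean(dim=1)                  # stand-in per-frame scores
video_score = aggregator(frame_scores)
reg_loss = encoder.gaussian_constraint_loss(feats)
```

In training, the constraint term would simply be added to the quality
regression objective, e.g. loss = regression_loss + lam * reg_loss with lam a
hyperparameter; the min-over-windows choice reflects the common observation
that perceived video quality is dominated by the worst short segments.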
Related papers
- Modular Blind Video Quality Assessment [33.657933680973194]
Blind video quality assessment (BVQA) plays a pivotal role in evaluating and improving the viewing experience of end-users across a wide range of video-based platforms and services.
In this paper, we propose a modular BVQA model and a method of training it to improve its modularity.
arXiv Detail & Related papers (2024-02-29T15:44:00Z) - Inflation with Diffusion: Efficient Temporal Adaptation for
Text-to-Video Super-Resolution [19.748048455806305]
We propose an efficient diffusion-based text-to-video super-resolution (SR) tuning approach.
We investigate different tuning approaches based on our inflated architecture and report trade-offs between computational costs and super-resolution quality.
arXiv Detail & Related papers (2024-01-18T22:25:16Z) - Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine
Strategy [16.436012370209845]
The objective of no-reference quality assessment is to evaluate the quality of distorted video without access to high-definition references.
In this study, we introduce an enhanced spatial perception module, pre-trained on multiple image quality assessment datasets, and a lightweight temporal fusion module.
arXiv Detail & Related papers (2024-01-16T17:33:54Z) - Video Dynamics Prior: An Internal Learning Approach for Robust Video
Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted test sequence, leveraging its spatio-temporal coherence and internal statistics.
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - Neighbourhood Representative Sampling for Efficient End-to-end Video
Quality Assessment [60.57703721744873]
The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA).
In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS) to get a novel type of sample, named fragments.
With fragments and the corresponding Fragment Attention Network (FANet), the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
arXiv Detail & Related papers (2022-10-11T11:38:07Z) - Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video interpolation framework that allows content-aware aggregation weights and considers long-range dependencies via self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video interpolation.
In addition, we develop a multi-scale frame synthesis scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z) - Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z) - Temporal Context Aggregation for Video Retrieval with Contrastive
Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z) - Capturing Video Frame Rate Variations via Entropic Differencing [63.749184706461826]
We propose a novel statistical entropic differencing method based on a Generalized Gaussian Distribution model.
Our proposed model correlates very well with subjective scores on the recently proposed LIVE-YT-HFR database; a brief sketch of the underlying GGD entropy computation appears after this list.
arXiv Detail & Related papers (2020-06-19T22:16:52Z)
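The entropic-differencing entry above names a Generalized Gaussian Distribution
(GGD) model; as referenced there, the following NumPy/SciPy sketch shows one
way to fit a zero-mean GGD to temporal frame differences and compare two videos
by differential entropy. It only illustrates the statistical model mentioned in
that summary, not that paper's exact feature pipeline; the moment-matching
estimator, the frame-difference features, and the function names are
assumptions.

```python
import numpy as np
from scipy.special import gamma


def fit_ggd(x):
    """Moment-matching fit of a zero-mean Generalized Gaussian Distribution,
    returning the scale alpha and shape beta."""
    x = np.ravel(x)
    sigma_sq = np.mean(x ** 2)
    rho = sigma_sq / (np.mean(np.abs(x)) ** 2 + 1e-12)
    betas = np.arange(0.2, 10.0, 0.001)
    # Ratio function Gamma(1/b)*Gamma(3/b)/Gamma(2/b)^2, matched against rho.
    r = gamma(1.0 / betas) * gamma(3.0 / betas) / gamma(2.0 / betas) ** 2
    beta = betas[np.argmin(np.abs(r - rho))]
    alpha = np.sqrt(sigma_sq * gamma(1.0 / beta) / gamma(3.0 / beta))
    return alpha, beta


def ggd_entropy(alpha, beta):
    """Differential entropy (in nats) of GGD(alpha, beta)."""
    return 1.0 / beta + np.log(2.0 * alpha * gamma(1.0 / beta) / beta)


def entropic_difference(video_a, video_b):
    """Compare two grayscale videos, shaped (T, H, W), by the entropy of
    their temporal frame differences."""
    entropies = []
    for frames in (video_a, video_b):
        diffs = np.diff(frames.astype(np.float64), axis=0)
        diffs -= diffs.mean()
        entropies.append(ggd_entropy(*fit_ggd(diffs)))
    return abs(entropies[0] - entropies[1])
```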