DCVQE: A Hierarchical Transformer for Video Quality Assessment
- URL: http://arxiv.org/abs/2210.04377v1
- Date: Mon, 10 Oct 2022 00:22:16 GMT
- Title: DCVQE: A Hierarchical Transformer for Video Quality Assessment
- Authors: Zutong Li, Lei Yang
- Abstract summary: We propose a Divide and Conquer Video Quality Estimator (DCVQE) for NR-VQA.
We call this hierarchical combination of Transformers a Divide and Conquer Transformer (DCTr) layer.
Taking the order relationship among the annotated data into account, we also propose a novel correlation loss term for model training.
- Score: 3.700565386929641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The explosion of user-generated videos stimulates a great demand for
no-reference video quality assessment (NR-VQA). Inspired by our observations of
how humans annotate video quality, we put forward a Divide and Conquer Video
Quality Estimator (DCVQE) for NR-VQA. Starting from extracting the frame-level
quality embeddings (QE), our proposal splits the whole sequence into a number
of clips and applies Transformers to learn the clip-level QE and update the
frame-level QE simultaneously; another Transformer is introduced to combine the
clip-level QE to generate the video-level QE. We call this hierarchical
combination of Transformers a Divide and Conquer Transformer (DCTr) layer.
Accurate video quality feature extraction can be achieved by repeating the
process of this DCTr layer several times. Taking the order relationship among
the annotated data into account, we also propose a novel correlation loss term
for model training. Experiments on various datasets confirm the effectiveness
and robustness of our DCVQE model.
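The abstract describes the architecture at a high level only. Below is a minimal PyTorch sketch of one plausible reading of a single DCTr layer (frame-level QE split into clips, a per-clip Transformer producing a clip-level QE while updating the frame-level QE, and a second Transformer aggregating clip-level QE into a video-level QE), together with a simple pairwise surrogate for the order-aware correlation loss. All dimensions, module choices, and the exact loss form are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class DCTrLayer(nn.Module):
    """Illustrative Divide-and-Conquer Transformer layer (not the paper's exact design).

    Splits frame-level quality embeddings (QE) into fixed-length clips, runs a
    Transformer per clip with a learnable clip token to obtain clip-level QE
    while updating frame-level QE, then runs a second Transformer over the clip
    tokens (with a video token) to produce a video-level QE.
    """

    def __init__(self, dim: int = 128, clip_len: int = 8, heads: int = 4):
        super().__init__()
        self.clip_len = clip_len
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.frame_encoder = nn.TransformerEncoder(make_layer(), num_layers=1)
        self.video_encoder = nn.TransformerEncoder(make_layer(), num_layers=1)
        self.clip_token = nn.Parameter(torch.randn(1, 1, dim))
        self.video_token = nn.Parameter(torch.randn(1, 1, dim))

    def forward(self, frame_qe: torch.Tensor):
        # frame_qe: (batch, num_frames, dim); num_frames divisible by clip_len here.
        b, t, d = frame_qe.shape
        n_clips = t // self.clip_len
        clips = frame_qe.reshape(b * n_clips, self.clip_len, d)

        # Divide: per-clip Transformer with a prepended clip token.
        tok = self.clip_token.expand(clips.size(0), -1, -1)
        out = self.frame_encoder(torch.cat([tok, clips], dim=1))
        clip_qe = out[:, 0].reshape(b, n_clips, d)   # clip-level QE
        frame_qe = out[:, 1:].reshape(b, t, d)       # updated frame-level QE

        # Conquer: aggregate clip tokens into a video-level QE.
        vtok = self.video_token.expand(b, -1, -1)
        video_qe = self.video_encoder(torch.cat([vtok, clip_qe], dim=1))[:, 0]
        return frame_qe, video_qe


def correlation_loss(pred: torch.Tensor, mos: torch.Tensor) -> torch.Tensor:
    """Illustrative order-aware loss: penalize pairs whose predicted order
    disagrees with the annotated MOS order (a common surrogate; the paper's
    actual correlation loss term may differ). pred, mos: (batch,)."""
    dp = pred.unsqueeze(0) - pred.unsqueeze(1)   # pairwise prediction gaps
    dm = mos.unsqueeze(0) - mos.unsqueeze(1)     # pairwise MOS gaps
    return torch.relu(-dp * torch.sign(dm)).mean()
```

A full DCVQE would stack several such layers and regress the final video-level QE to a quality score; this sketch only illustrates the divide-and-conquer structure.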
Related papers
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
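CLIPVQA's exact design is not given in this summary; the sketch below only illustrates the general pattern CLIP-based VQA methods tend to follow (frozen per-frame image features, temporal fusion, score regression). The `encoder` argument stands in for a real CLIP vision tower and everything here is an assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FrameFeatureVQA(nn.Module):
    """Toy CLIP-feature-based VQA head (illustrative; not the CLIPVQA design).

    `encoder` is any frozen image encoder mapping (B, 3, H, W) -> (B, feat_dim),
    e.g. a CLIP vision tower; it is passed in so the sketch stays self-contained.
    feat_dim must be divisible by nhead.
    """

    def __init__(self, encoder: nn.Module, feat_dim: int):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)            # keep the backbone frozen
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(feat_dim, nhead=4, batch_first=True),
            num_layers=1)
        self.head = nn.Linear(feat_dim, 1)     # regress a quality score

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, frames, 3, H, W)
        b, t = video.shape[:2]
        with torch.no_grad():
            feats = self.encoder(video.flatten(0, 1))   # (b*t, feat_dim)
        feats = feats.reshape(b, t, -1)
        fused = self.temporal(feats).mean(dim=1)        # temporal fusion
        return self.head(fused).squeeze(-1)             # (batch,)
```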
- Enhancing Blind Video Quality Assessment with Rich Quality-aware Features [79.18772373737724]
We present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos.
We explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features.
Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets.
arXiv Detail & Related papers (2024-05-14T16:32:11Z)
- Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy [16.436012370209845]
The objective of no-reference quality assessment is to evaluate the quality of distorted video without access to high-definition references.
In this study, we introduce an enhanced spatial perception module, pre-trained on multiple image quality assessment datasets, and a lightweight temporal fusion module.
arXiv Detail & Related papers (2024-01-16T17:33:54Z)
- CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
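The CONVIQT entry mentions learning quality representations in a self-supervised manner; below is a generic InfoNCE contrastive objective of the kind such methods commonly build on. It is purely illustrative; CONVIQT's actual objective, views, and encoder are described in the paper itself.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    """Generic InfoNCE loss over two augmented views of the same clips.

    z1, z2: (batch, dim) embeddings of the two views; matching rows are
    positives, all other rows in the batch serve as negatives.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)
```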
- DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment [56.42140467085586]
Some temporal variations cause temporal distortions and lead to extra quality degradation.
The human visual system often pays different levels of attention to frames with different content.
We propose a novel and effective transformer-based VQA method to tackle these two issues.
arXiv Detail & Related papers (2022-06-20T15:31:27Z)
- PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation- and Attention-based Network [27.375830262287163]
We propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos.
Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model.
Experimental results demonstrate the superior performance of the proposed PeQuENet compared with the state-of-the-art compressed video quality enhancement algorithms.
arXiv Detail & Related papers (2022-06-16T02:49:28Z)
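The PeQuENet entry notes adaptation to different QPs within a single model but not its form; one common way to obtain such conditioning is feature-wise modulation driven by a QP embedding, sketched below as an assumption rather than PeQuENet's actual mechanism.

```python
import torch
import torch.nn as nn

class QPModulation(nn.Module):
    """FiLM-style conditioning of feature maps on the quantization parameter (QP).
    Illustrative only; PeQuENet's adaptation module may be structured differently."""

    def __init__(self, channels: int, max_qp: int = 51):  # 51 = max QP in H.264/HEVC
        super().__init__()
        self.embed = nn.Embedding(max_qp + 1, channels * 2)

    def forward(self, feat: torch.Tensor, qp: torch.Tensor) -> torch.Tensor:
        # feat: (batch, channels, H, W); qp: (batch,) integer QPs
        scale, shift = self.embed(qp).chunk(2, dim=1)
        return feat * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
```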
- Video Joint Modelling Based on Hierarchical Transformer for Co-summarization [0.0]
Video summarization aims to automatically generate a summary (storyboard or video skim) of a video, which can facilitate large-scale video retrieving and browsing.
Most existing methods perform video summarization on individual videos, neglecting the correlations among similar videos.
We propose Video Joint Modelling based on Hierarchical Transformer (VJMHT) for co-summarization.
arXiv Detail & Related papers (2021-12-27T01:54:35Z)
- Hierarchical Multimodal Transformer to Summarize Videos [103.47766795086206]
Motivated by the great success of transformers and the natural structure of video (frame-shot-video), a hierarchical transformer is developed for video summarization.
To integrate the two kinds of information, they are encoded in a two-stream scheme, and a multimodal fusion mechanism is developed based on the hierarchical transformer.
Practically, extensive experiments show that HMT surpasses most of the traditional, RNN-based and attention-based video summarization methods.
arXiv Detail & Related papers (2021-09-22T07:38:59Z)
- DeepQAMVS: Query-Aware Hierarchical Pointer Networks for Multi-Video Summarization [127.16984421969529]
We introduce a novel Query-Aware Hierarchical Pointer Network for Multi-Video Summarization, termed DeepQAMVS.
DeepQAMVS is trained with reinforcement learning, incorporating rewards that capture representativeness, diversity, query-adaptability and temporal coherence.
We achieve state-of-the-art results on the MVS1K dataset, with inference time scaling linearly with the number of input video frames.
arXiv Detail & Related papers (2021-05-13T17:33:26Z)
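The DeepQAMVS entry lists four reward terms; the toy combination below shows how RL-based summarizers typically mix such terms, with stand-in definitions for two of them (representativeness, diversity) and the remaining terms omitted. None of this reproduces the paper's actual reward design.

```python
import torch
import torch.nn.functional as F

# Stand-in reward terms over frame embeddings (illustrative only;
# DeepQAMVS defines its own reward functions in the paper).
def representativeness(summ: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
    # How well summary frames (S, D) cover video frames (T, D): mean of each
    # video frame's best cosine similarity to any summary frame.
    sim = F.cosine_similarity(video.unsqueeze(1), summ.unsqueeze(0), dim=-1)
    return sim.max(dim=1).values.mean()

def diversity(summ: torch.Tensor) -> torch.Tensor:
    # Penalize near-duplicate summary frames.
    sim = F.cosine_similarity(summ.unsqueeze(1), summ.unsqueeze(0), dim=-1)
    off_diag = sim - torch.eye(summ.size(0), device=summ.device)
    return 1 - off_diag.max(dim=1).values.mean()

def combined_reward(summ, video, w=(0.5, 0.5)) -> torch.Tensor:
    # Weighted sum; query-adaptability and temporal-coherence terms omitted.
    return w[0] * representativeness(summ, video) + w[1] * diversity(summ)
```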
- Perceptual Image Quality Assessment with Transformers [4.005576542371173]
We propose an image quality transformer (IQT) that successfully applies a transformer architecture to a perceptual full-reference image quality assessment task.
We extract perceptual feature representations from each of the input images using a convolutional neural network backbone.
The proposed IQT was ranked first among 13 participants in the NTIRE 2021 perceptual image quality assessment challenge.
arXiv Detail & Related papers (2021-04-30T02:45:29Z)
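The IQT entry gives the general recipe (CNN backbone features from each input image, a transformer on top); below is a minimal full-reference sketch in that spirit, where the ResNet-18 backbone, the difference-token fusion, and all dimensions are assumptions for illustration, not the IQT architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FRIQATransformer(nn.Module):
    """Minimal full-reference IQA sketch: CNN features from reference and
    distorted images, their difference tokens fed to a transformer with a
    quality token. Illustrative only."""

    def __init__(self, dim: int = 128):
        super().__init__()
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # (B,512,h,w)
        self.proj = nn.Conv2d(512, dim, kernel_size=1)
        self.quality_token = nn.Parameter(torch.randn(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(dim, 1)

    def tokens(self, img: torch.Tensor) -> torch.Tensor:
        # Backbone feature map -> sequence of spatial tokens: (B, h*w, dim)
        return self.proj(self.cnn(img)).flatten(2).transpose(1, 2)

    def forward(self, ref: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
        diff = self.tokens(ref) - self.tokens(dist)   # difference tokens
        tok = self.quality_token.expand(ref.size(0), -1, -1)
        out = self.encoder(torch.cat([tok, diff], dim=1))
        return self.head(out[:, 0]).squeeze(-1)       # predicted quality score
```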