Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy
- URL: http://arxiv.org/abs/2407.20766v1
- Date: Tue, 30 Jul 2024 12:10:33 GMT
- Title: Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy
- Authors: Xiaoheng Tan, Jiabin Zhang, Yuhui Quan, Jing Li, Yajing Wu, Zilin Bian
- Abstract summary: No-reference (NR) VQA methods play a vital role in situations where obtaining reference videos is restricted or not feasible.
As more streaming videos are being created in ultra-high definition (e.g., 4K) to enrich viewers' experiences, the current deep VQA methods face unacceptable computational costs.
In this paper, we propose a highly efficient and novel NR 4K VQA technology.
- Score: 23.61467796740852
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep Video Quality Assessment (VQA) methods have shown impressive high-performance capabilities. Notably, no-reference (NR) VQA methods play a vital role in situations where obtaining reference videos is restricted or not feasible. Nevertheless, as more streaming videos are being created in ultra-high definition (e.g., 4K) to enrich viewers' experiences, the current deep VQA methods face unacceptable computational costs. Furthermore, the resizing, cropping, and local sampling techniques employed in these methods can compromise the details and content of original 4K videos, thereby negatively impacting quality assessment. In this paper, we propose a highly efficient and novel NR 4K VQA technology. Specifically, first, a novel data sampling and training strategy is proposed to tackle the problem of excessive resolution. This strategy allows the VQA Swin Transformer-based model to effectively train and make inferences using the full data of 4K videos on standard consumer-grade GPUs without compromising content or details. Second, a weighting and scoring scheme is developed to mimic the human subjective perception mode, which is achieved by considering the distinct impact of each sub-region within a 4K frame on the overall perception. Third, we incorporate the frequency domain information of video frames to better capture the details that affect video quality, consequently further improving the model's generalizability. To our knowledge, this is the first technology for the NR 4K VQA task. Thorough empirical studies demonstrate it not only significantly outperforms existing methods on a specialized 4K VQA dataset but also achieves state-of-the-art performance across multiple open-source NR video quality datasets.
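No reference code accompanies this listing; the sketch below only illustrates the three ingredients the abstract names: full-pixel covering sampling, per-region weighting that mimics subjective perception, and a frequency-domain cue. The 8x15 tile grid, the softmax weighting, and the FFT energy measure are illustrative assumptions, not the authors' design.

```python
# Minimal sketch (not the authors' code): full-pixel covering sampling of a
# 4K frame plus a weighted per-region scoring scheme. Grid size, the softmax
# weighting, and the FFT feature are illustrative assumptions.
import numpy as np

def cover_sampling(frame: np.ndarray, grid=(8, 15)) -> np.ndarray:
    """Split a 2160x3840x3 frame into non-overlapping sub-regions that
    jointly cover every pixel (here 8x15 tiles of 270x256 pixels)."""
    h, w, c = frame.shape
    gh, gw = grid
    th, tw = h // gh, w // gw
    assert gh * th == h and gw * tw == w, "grid must tile the frame exactly"
    tiles = frame.reshape(gh, th, gw, tw, c).swapaxes(1, 2)
    return tiles.reshape(gh * gw, th, tw, c)  # (120, 270, 256, 3)

def freq_energy(tile: np.ndarray) -> float:
    """Toy frequency-domain cue: share of spectral energy away from DC."""
    spec = np.abs(np.fft.fft2(tile.mean(axis=-1)))
    return float(1.0 - spec[0, 0] / (spec.sum() + 1e-8))

def weighted_score(tile_scores: np.ndarray, tile_logits: np.ndarray) -> float:
    """Aggregate per-region scores with importance weights, mimicking the
    uneven impact of sub-regions on overall perception."""
    w = np.exp(tile_logits) / np.exp(tile_logits).sum()
    return float((w * tile_scores).sum())

frame = np.random.randint(0, 256, (2160, 3840, 3), dtype=np.uint8)
tiles = cover_sampling(frame)
scores = np.array([0.5 + 0.5 * freq_energy(t) for t in tiles])  # stand-in model
print(weighted_score(scores, np.zeros(len(tiles))))  # uniform weights here
```

In the method itself, per-region scores would come from the Swin Transformer backbone and the weights would be learned; the point of the tiling is that every pixel of the 4K frame is seen, avoiding the detail loss of resizing, cropping, or local sampling.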
Related papers
- EVQAScore: Efficient Video Question Answering Data Evaluation [23.812020049901452]
We introduce EVQAScore, a reference-free method that leverages keyword extraction to assess both video caption and video QA data quality.
Our approach achieves state-of-the-art (SOTA) performance for video caption evaluation (32.8 Kendall correlation and 42.3 Spearman correlation, 4.7 and 5.9 points higher than the previous method PAC-S++).
By using EVQAScore for data selection, we achieved SOTA results with only 12.5% of the original data volume, outperforming the previous SOTA method PAC-S trained on 100% of the data.
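For context, Kendall and Spearman correlations like those quoted above measure rank agreement between a metric's scores and human labels; a minimal sketch with scipy follows, where the arrays are made-up examples and the x100 scaling is assumed from the numbers above.

```python
# Computing Kendall/Spearman rank correlations between a metric's predictions
# and human ratings; the arrays below are hypothetical examples.
from scipy.stats import kendalltau, spearmanr

pred  = [0.71, 0.42, 0.88, 0.35, 0.60]   # metric scores (hypothetical)
human = [0.75, 0.40, 0.90, 0.30, 0.55]   # ground-truth ratings (hypothetical)

tau, _ = kendalltau(pred, human)
rho, _ = spearmanr(pred, human)
print(f"Kendall {100 * tau:.1f}, Spearman {100 * rho:.1f}")  # scaled by 100, as quoted above
```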
arXiv Detail & Related papers (2024-11-11T12:11:36Z)
- VQA$^2$: Visual Question Answering for Video Quality Assessment [76.81110038738699]
Video Quality Assessment (VQA) is a classic field in low-level visual perception.
Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can markedly enhance low-level visual quality evaluation.
We introduce the VQA$^2$ Instruction dataset - the first visual question answering instruction dataset that focuses on video quality assessment.
The VQA$^2$ series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
arXiv Detail & Related papers (2024-11-06T09:39:52Z)
- AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results [120.95863275142727]
This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024.
The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos encoded with 14 codecs of various compression standards.
arXiv Detail & Related papers (2024-08-21T20:32:45Z)
- CLIPVQA: Video Quality Assessment via CLIP [56.94085651315878]
We propose an efficient CLIP-based Transformer method for the VQA problem (CLIPVQA).
The proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods.
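CLIPVQA's architecture is not reproduced here; the sketch below shows only the generic recipe of pooling frame-level CLIP embeddings into a quality score via Hugging Face's CLIP. The checkpoint name, mean pooling, and untrained linear head are illustrative assumptions, not the paper's design.

```python
# Generic CLIP-features-to-quality-score recipe (an assumption-laden sketch,
# not CLIPVQA itself): embed sampled frames with CLIP, pool, regress a score.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
head = torch.nn.Linear(model.config.projection_dim, 1)  # would be trained in practice

def video_quality(frames):  # frames: list of PIL.Image sampled from a video
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)   # (T, 512) frame embeddings
    return head(feats.mean(dim=0)).item()            # temporal mean pooling
```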
arXiv Detail & Related papers (2024-07-06T02:32:28Z)
- MRET: Multi-resolution Transformer for Video Quality Assessment [37.355412115794195]
No-reference video quality assessment (NR-VQA) for user generated content (UGC) is crucial for understanding and improving visual experience.
Since most videos nowadays are 720p or above, the fixed and relatively small inputs used in conventional NR-VQA methods miss high-frequency details in many videos.
We propose a novel Transformer-based NR-VQA framework that preserves the high-resolution quality information.
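MRET's exact architecture is not given in this summary; the sketch below only illustrates the generic multi-resolution input idea: keep a native-resolution view alongside downscaled ones so high-frequency detail is not lost to a single small input. The scales and 224-pixel center crop are assumptions.

```python
# Sketch of multi-resolution input construction: center crops taken at several
# scales of a high-resolution frame. Scales/crop size are assumptions, not
# MRET's exact configuration.
import numpy as np
from PIL import Image

def multi_resolution_views(frame: Image.Image, scales=(1.0, 0.5, 0.25), crop=224):
    views = []
    for s in scales:
        w, h = int(frame.width * s), int(frame.height * s)
        img = frame.resize((w, h), Image.BILINEAR)
        x, y = (w - crop) // 2, (h - crop) // 2   # center crop at each scale
        views.append(np.asarray(img.crop((x, y, x + crop, y + crop))))
    return np.stack(views)  # one crop per scale

print(multi_resolution_views(Image.new("RGB", (3840, 2160))).shape)  # (3, 224, 224, 3)
```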
arXiv Detail & Related papers (2023-03-13T21:48:49Z)
- Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment [60.57703721744873]
The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA).
In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS) to get a novel type of sample, named fragments.
With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks.
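Neither paper's code appears in this listing; the sketch below illustrates the shared "fragment" idea behind St-GMS here (and GMS in the FAST-VQA entry that follows): partition the video into a spatial-temporal grid, take one small raw-resolution patch per cell, and splice the patches into a compact input that preserves native detail. The 7x7 grid, 32-pixel patches, and 8 sampled frames are illustrative assumptions.

```python
# Sketch of grid mini-cube / mini-patch ("fragment") sampling: one raw-
# resolution patch per grid cell, spliced into a small input. Sizes are
# illustrative, not the papers' exact settings.
import numpy as np

def sample_fragments(video, grid=7, patch=32, t_frames=8):
    """video: (T, H, W, C) uint8. Returns (t_frames, grid*patch, grid*patch, C)."""
    T, H, W, C = video.shape
    ts = np.linspace(0, T - 1, t_frames).astype(int)   # temporal sampling grid
    ch, cw = H // grid, W // grid                      # spatial grid cells
    out = np.empty((t_frames, grid * patch, grid * patch, C), video.dtype)
    for gy in range(grid):
        for gx in range(grid):
            # one random raw-resolution patch per cell, aligned across time
            y = gy * ch + np.random.randint(0, ch - patch + 1)
            x = gx * cw + np.random.randint(0, cw - patch + 1)
            out[:, gy*patch:(gy+1)*patch, gx*patch:(gx+1)*patch] = \
                video[ts, y:y+patch, x:x+patch]
    return out  # compact 224x224 fragments carrying native-resolution detail

video = np.random.randint(0, 256, (48, 1080, 1920, 3), dtype=np.uint8)
print(sample_fragments(video).shape)  # (8, 224, 224, 3)
```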
arXiv Detail & Related papers (2022-10-11T11:38:07Z)
- FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling [54.31355080688127]
Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos.
We propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution (the fragment-sampling sketch above illustrates the same idea).
We build the Fragment Attention Network (FANet), specially designed to accommodate fragments as inputs.
FAST-VQA improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080p high-resolution videos.
arXiv Detail & Related papers (2022-07-06T11:11:43Z)
- Deep Neural Network for Blind Visual Quality Assessment of 4K Content [37.70643043547502]
Existing blind image quality assessment (BIQA) methods are not suitable for original and upscaled 4K content.
We propose a deep learning-based BIQA model for 4K content that can both distinguish true from pseudo 4K content and evaluate its perceptual visual quality.
The proposed model is trained in a multi-task learning manner, and we introduce an uncertainty principle to balance the losses of the classification and regression tasks.
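The abstract does not spell out its "uncertainty principle"; a common realization, and only an assumption here, is homoscedastic-uncertainty loss weighting in the style of Kendall et al., sketched below for the joint classification-plus-regression setup.

```python
# Hedged sketch of uncertainty-based loss balancing for joint true/pseudo-4K
# classification + quality regression. Whether the paper uses exactly this
# homoscedastic-uncertainty form is an assumption.
import torch

class UncertaintyWeightedLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # log-variances learned alongside the network weights
        self.log_var_cls = torch.nn.Parameter(torch.zeros(()))
        self.log_var_reg = torch.nn.Parameter(torch.zeros(()))

    def forward(self, loss_cls, loss_reg):
        # each task loss is discounted by its learned uncertainty, plus a
        # regularizer that keeps the variances from growing without bound
        return (torch.exp(-self.log_var_cls) * loss_cls + self.log_var_cls
              + torch.exp(-self.log_var_reg) * loss_reg + self.log_var_reg)
```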
arXiv Detail & Related papers (2022-06-09T09:10:54Z)
- Patch-VQ: 'Patching Up' the Video Quality Problem [0.9786690381850356]
No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved problem that is important for social and streaming media applications.
Current NR models are limited in their prediction capabilities on real-world, "in-the-wild" video data.
We create the largest (by far) subjective video quality dataset, containing 39,000 real-world distorted videos and 117,000 space-time localized video patches.
arXiv Detail & Related papers (2020-11-27T03:46:44Z)
- Video Compression with CNN-based Post Processing [18.145942926665164]
We propose a new CNN-based post-processing approach, which has been integrated with two state-of-the-art coding standards, VVC and AV1.
Results show consistent coding gains on all tested sequences at various spatial resolutions, with average bit-rate savings of 4.0% and 5.8% against the original VVC and AV1 anchors, respectively.
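The paper's network itself is not reproduced here; a bare-bones residual post-filter of the kind typically applied to decoded frames might look like the sketch below, where the three-layer design and channel width are assumptions.

```python
# Bare-bones CNN post-filter applied to decoded frames; the 3-layer residual
# design and channel width are assumptions, not the paper's network.
import torch

class PostFilter(torch.nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.Conv2d(3, channels, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(channels, channels, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, decoded):               # (N, 3, H, W) decoded frame
        return decoded + self.body(decoded)   # predict and add the residual

print(PostFilter()(torch.rand(1, 3, 128, 128)).shape)  # torch.Size([1, 3, 128, 128])
```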
arXiv Detail & Related papers (2020-09-16T10:07:32Z)
- UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content [59.13821614689478]
Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of content are unpredictable, complicated, and often commingled.
Here we contribute to advancing the problem by conducting a comprehensive evaluation of leading VQA models.
By employing a feature selection strategy on top of leading VQA model features, we select 60 of the 763 statistical features used by those models.
Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models.
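As a rough illustration of such a feature-selection step, the generic scikit-learn sketch below keeps 60 of 763 features by model-based importance; the random-forest ranker and the synthetic data are stand-ins, not VIDEVAL's actual procedure.

```python
# Generic stand-in for the feature-selection step: keep 60 of 763 statistical
# features ranked by model-based importance. Not VIDEVAL's exact procedure.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

X = np.random.rand(200, 763)   # per-video feature vectors (hypothetical)
y = np.random.rand(200)        # subjective quality scores (hypothetical)

selector = SelectFromModel(
    RandomForestRegressor(n_estimators=100, random_state=0),
    max_features=60, threshold=-np.inf,   # rank by importance, keep top 60
).fit(X, y)
print(selector.transform(X).shape)        # (200, 60)
```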
arXiv Detail & Related papers (2020-05-29T00:39:20Z)