CompressedVQA-HDR: Generalized Full-reference and No-reference Quality Assessment Models for Compressed High Dynamic Range Videos
- URL: http://arxiv.org/abs/2507.11900v1
- Date: Wed, 16 Jul 2025 04:33:06 GMT
- Title: CompressedVQA-HDR: Generalized Full-reference and No-reference Quality Assessment Models for Compressed High Dynamic Range Videos
- Authors: Wei Sun, Linhan Cao, Kang Fu, Dandan Zhu, Jun Jia, Menghan Hu, Xiongkuo Min, Guangtao Zhai
- Abstract summary: We introduce CompressedVQA-HDR, an effective VQA framework designed to address the challenges of HDR video quality assessment. We adopt the Swin Transformer and SigLip 2 as the backbone networks for the proposed full-reference (FR) and no-reference (NR) VQA models, respectively. Our models achieve state-of-the-art performance compared to existing FR and NR VQA models.
- Score: 46.255654141741815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video compression is a standard procedure applied to all videos to minimize storage and transmission demands while preserving visual quality as much as possible. Therefore, evaluating the visual quality of compressed videos is crucial for guiding the practical usage and further development of video compression algorithms. Although numerous compressed video quality assessment (VQA) methods have been proposed, they often lack the generalization capability needed to handle the increasing diversity of video types, particularly high dynamic range (HDR) content. In this paper, we introduce CompressedVQA-HDR, an effective VQA framework designed to address the challenges of HDR video quality assessment. Specifically, we adopt the Swin Transformer and SigLip 2 as the backbone networks for the proposed full-reference (FR) and no-reference (NR) VQA models, respectively. For the FR model, we compute deep structural and textural similarities between reference and distorted frames using intermediate-layer features extracted from the Swin Transformer as its quality-aware feature representation. For the NR model, we extract the global mean of the final-layer feature maps from SigLip 2 as its quality-aware representation. To mitigate the issue of limited HDR training data, we pre-train the FR model on a large-scale standard dynamic range (SDR) VQA dataset and fine-tune it on the HDRSDR-VQA dataset. For the NR model, we employ an iterative mixed-dataset training strategy across multiple compressed VQA datasets, followed by fine-tuning on the HDRSDR-VQA dataset. Experimental results show that our models achieve state-of-the-art performance compared to existing FR and NR VQA models. Moreover, CompressedVQA-HDR-FR won first place in the FR track of the Generalizable HDR & SDR Video Quality Measurement Grand Challenge at IEEE ICME 2025. The code is available at https://github.com/sunwei925/CompressedVQA-HDR.
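The abstract's two feature pipelines can be made concrete with a short sketch. The PyTorch snippet below is a minimal illustration, not the authors' released implementation: the FR branch follows a DISTS-style formulation (a mean-based texture term plus a covariance-based structure term), which is one plausible reading of the "deep structural and textural similarities," and the backbone activations are stubbed with random tensors, since the exact Swin Transformer stages and SigLip 2 layer are not specified here.

```python
# Minimal sketch of the FR and NR feature pipelines described in the abstract.
# Backbones are stubbed with random tensors; the similarity follows a
# DISTS-style formulation, which the paper's exact definition may differ from.
import torch

def fr_similarity(feats_ref, feats_dist, c1=1e-6, c2=1e-6):
    """DISTS-style global similarity between two lists of feature maps.

    feats_ref / feats_dist: lists of (B, C, H, W) tensors, one per
    intermediate backbone stage. Returns one quality score per batch item.
    """
    scores = []
    for x, y in zip(feats_ref, feats_dist):
        mu_x, mu_y = x.mean(dim=(2, 3)), y.mean(dim=(2, 3))
        var_x, var_y = x.var(dim=(2, 3)), y.var(dim=(2, 3))
        cov_xy = ((x - mu_x[..., None, None]) *
                  (y - mu_y[..., None, None])).mean(dim=(2, 3))
        # Mean-based "texture" term and covariance-based "structure" term.
        texture_sim = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
        structure_sim = (2 * cov_xy + c2) / (var_x + var_y + c2)
        scores.append(0.5 * (texture_sim + structure_sim).mean(dim=1))
    return torch.stack(scores).mean(dim=0)

def nr_feature(final_feature_map):
    """NR branch: global mean of the final-layer map, (B, C, H, W) -> (B, C)."""
    return final_feature_map.mean(dim=(2, 3))

# Hypothetical stand-ins for backbone activations on one video frame.
ref_feats = [torch.randn(1, c, s, s) for c, s in [(192, 28), (384, 14), (768, 7)]]
dist_feats = [f + 0.1 * torch.randn_like(f) for f in ref_feats]
print("FR quality score:", fr_similarity(ref_feats, dist_feats))
print("NR feature shape:", nr_feature(torch.randn(1, 768, 16, 16)).shape)
```

In the actual models, `ref_feats`/`dist_feats` would be intermediate-layer activations of the Swin Transformer for the reference and distorted frames, `final_feature_map` would come from SigLip 2's last layer, and a small regression head would map either representation to a predicted quality score.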
Related papers
- ICME 2025 Generalizable HDR and SDR Video Quality Measurement Grand Challenge [66.86693390673298]
The challenge was established to benchmark and promote VQA approaches capable of jointly handling HDR and SDR content. The top-performing model achieved state-of-the-art performance, setting a new benchmark for generalizable video quality assessment.
arXiv Detail & Related papers (2025-06-28T07:14:23Z) - CANeRV: Content Adaptive Neural Representation for Video Compression [89.35616046528624]
We propose Content Adaptive Neural Representation for Video Compression (CANeRV). CANeRV is an innovative INR-based video compression network that adaptively conducts structure optimisation based on the specific content of each video sequence. We show that CANeRV can outperform both H.266/VVC and state-of-the-art INR-based video compression techniques across diverse video datasets.
arXiv Detail & Related papers (2025-02-10T06:21:16Z) - A FUNQUE Approach to the Quality Assessment of Compressed HDR Videos [36.26141980831573]
The state-of-the-art (SOTA) approach, HDRMAX, augments off-the-shelf video quality models, such as VMAF, with features computed on non-linearly transformed video frames.
Here, we show that an efficient class of video quality prediction models named FUNQUE+ achieves higher HDR video quality prediction accuracy at lower computational cost.
arXiv Detail & Related papers (2023-12-13T21:24:00Z) - HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment [36.1179702443845]
We introduce HIDRO-VQA, a no-reference (NR) video quality assessment model designed to provide precise quality evaluations of High Dynamic Range (HDR) videos.
Our findings demonstrate that self-supervised pre-trained neural networks can be further fine-tuned in a self-supervised setting to achieve state-of-the-art performance.
Our algorithm can be extended to the Full Reference VQA setting, also achieving state-of-the-art performance.
arXiv Detail & Related papers (2023-11-18T12:33:19Z) - HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos [38.504568225201915]
We present a no-reference video quality model and algorithm that delivers standout performance for High Dynamic Range (HDR) videos.
HDR videos represent wider ranges of luminances, details, and colors than Standard Dynamic Range (SDR) videos.
arXiv Detail & Related papers (2023-04-25T21:25:02Z) - Making Video Quality Assessment Models Robust to Bit Depth [38.504568225201915]
We introduce a novel feature set, which we call HDRMAX features, that, when included in Video Quality Assessment (VQA) algorithms, sensitizes them to distortions of High Dynamic Range (HDR) videos.
While these features are not specific to HDR, and also augment the quality prediction performance of VQA models on SDR content, they are especially effective on HDR.
As a demonstration of the efficacy of our approach, we show that, while current state-of-the-art VQA models perform poorly on 10-bit HDR databases, their performance is greatly improved by the inclusion of HDRMAX features when tested on HDR content (a minimal sketch of this feature-augmentation idea follows this list).
arXiv Detail & Related papers (2023-04-25T18:54:28Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2x faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 dB to 34.57 dB.
arXiv Detail & Related papers (2022-10-13T08:15:08Z) - FAVER: Blind Quality Prediction of Variable Frame Rate Videos [47.951054608064126]
Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales.
We propose a first-of-a-kind blind VQA model for evaluating HFR videos, which we dub the Framerate-Aware Video Evaluator w/o Reference (FAVER).
Our experiments on several HFR video quality datasets show that FAVER outperforms other blind VQA algorithms at a reasonable computational cost.
arXiv Detail & Related papers (2022-01-05T07:54:12Z) - High Frame Rate Video Quality Assessment using VMAF and Entropic Differences [50.265638572116984]
The popularity of streaming videos with live, high-action content has led to an increased interest in High Frame Rate (HFR) videos.
In this work we address the problem of frame rate dependent Video Quality Assessment (VQA) when the videos to be compared have different frame rates and compression factors.
We show through various experiments that the proposed fusion framework results in more efficient features for predicting frame rate dependent video quality.
arXiv Detail & Related papers (2021-09-27T04:08:12Z)
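To illustrate the HDRMAX-style idea referenced in the bit-depth entry above (features computed on non-linearly transformed frames), here is a minimal, hypothetical Python sketch. The expansive nonlinearity and block statistics below are illustrative assumptions, not the published HDRMAX definition, and the function names are invented for the example.

```python
# Hypothetical illustration of computing quality-aware features on
# non-linearly transformed frames (HDRMAX-style idea). The nonlinearity
# and block statistics are illustrative assumptions, not the published
# HDRMAX definition.
import numpy as np

def nonlinear_transform(luma, delta=4.0):
    """Expansive point nonlinearity: values far from mid-gray are stretched
    in both directions, emphasizing the dark and bright extremes where HDR
    distortions tend to be masked under a linear scale."""
    x = 2.0 * (luma - luma.min()) / (luma.max() - luma.min() + 1e-8) - 1.0
    return np.sign(x) * np.expm1(delta * np.abs(x))

def block_stats(frame, block=64):
    """Mean and std of each block of the (transformed) frame, flattened
    into a single feature vector."""
    h = frame.shape[0] // block * block
    w = frame.shape[1] // block * block
    blocks = frame[:h, :w].reshape(h // block, block, w // block, block)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(-1, block * block)
    return np.concatenate([blocks.mean(axis=1), blocks.std(axis=1)])

# A 10-bit HDR luma frame, stubbed with random values for the example.
frame = np.random.randint(0, 1024, size=(720, 1280)).astype(np.float64)
hdr_features = block_stats(nonlinear_transform(frame))
print(hdr_features.shape)
```

The resulting feature vector would be appended to a base model's features (e.g., VMAF's) before regressing onto subjective quality scores, which matches the augmentation strategy those entries describe.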