CGVQM+D: Computer Graphics Video Quality Metric and Dataset
- URL: http://arxiv.org/abs/2506.11546v1
- Date: Fri, 13 Jun 2025 07:59:55 GMT
- Title: CGVQM+D: Computer Graphics Video Quality Metric and Dataset
- Authors: Akshay Jindal, Nabil Sadaka, Manu Mathew Thomas, Anton Sochenov, Anton Kaplanyan,
- Abstract summary: We present a novel video quality dataset focused on distortions introduced by advanced rendering techniques. Our evaluations show that existing full-reference quality metrics perform sub-optimally on these distortions. We propose CGVQM, a full-reference video quality metric that significantly outperforms existing metrics.
- Score: 4.5660125761127235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While existing video and image quality datasets have extensively studied natural videos and traditional distortions, the perception of synthetic content and modern rendering artifacts remains underexplored. We present a novel video quality dataset focused on distortions introduced by advanced rendering techniques, including neural supersampling, novel-view synthesis, path tracing, neural denoising, frame interpolation, and variable rate shading. Our evaluations show that existing full-reference quality metrics perform sub-optimally on these distortions, with a maximum Pearson correlation of 0.78. Additionally, we find that the feature space of pre-trained 3D CNNs aligns strongly with human perception of visual quality. We propose CGVQM, a full-reference video quality metric that significantly outperforms existing metrics while generating both per-pixel error maps and global quality scores. Our dataset and metric implementation are available at https://github.com/IntelLabs/CGVQM.
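The abstract reports metric performance as a Pearson correlation between metric predictions and human quality judgments. As a rough illustration of how such a figure is computed, here is a minimal sketch in plain Python. The score lists are made-up placeholder values, not data from the paper, and the function name `pearson` is chosen for this example only; it is not taken from the CGVQM code.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: one metric prediction and one subjective score per video.
metric_scores = [0.91, 0.42, 0.77, 0.15, 0.60]
human_scores = [88.0, 45.0, 70.0, 20.0, 66.0]

plcc = pearson(metric_scores, human_scores)
print(f"PLCC = {plcc:.3f}")
```

A value close to 1 means the metric's predictions track the subjective scores almost linearly. Quality-metric evaluations often fit a monotonic mapping between predictions and subjective scores before computing the correlation; that step is omitted here.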
Related papers
- BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos [63.03271511550633]
BrokenVideos is a benchmark dataset of 3,254 AI-generated videos with meticulously annotated, pixel-level masks highlighting regions of visual corruption. Our experiments show that training state-of-the-art artifact detection models and multimodal large language models (MLLMs) on BrokenVideos significantly improves their ability to localize corrupted regions.
arXiv Detail & Related papers (2025-06-25T03:30:04Z)
- Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content [35.02160595617654]
We introduce Koala-36M, a large-scale, high-quality video dataset featuring accurate temporal splitting, detailed captions, and superior video quality. We employ a linear classifier on probability distributions to enhance the accuracy of transition detection, ensuring better temporal consistency. We develop a Video Training Suitability Score (VTSS) that integrates multiple sub-metrics, allowing us to filter high-quality videos from the original corpus.
arXiv Detail & Related papers (2024-10-10T17:57:49Z)
- BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement [56.97766265018334]
This paper introduces a low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels.
Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE) and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets.
arXiv Detail & Related papers (2024-07-03T22:41:49Z)
- D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new method for dynamic novel-view synthesis from monocular video, such as smartphone captures. Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
arXiv Detail & Related papers (2024-06-14T14:35:44Z)
- Expanding Synthetic Real-World Degradations for Blind Video Super Resolution [3.474523163017713]
Video super-resolution (VSR) techniques have drastically improved over the last few years and shown impressive performance on synthetic data.
However, their performance on real-world video data suffers because of the complexity of real-world degradations and misaligned video frames.
In this paper, we propose real-world degradations on synthetic training datasets.
arXiv Detail & Related papers (2023-05-04T08:58:31Z)
- Learning from Mixed Datasets: A Monotonic Image Quality Assessment Model [17.19991754976893]
We propose a monotonic neural network for IQA model learning with different datasets combined.
In particular, our model consists of a dataset-shared quality regressor and several dataset-specific quality transformers.
arXiv Detail & Related papers (2022-09-21T15:53:59Z)
- Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment [55.65173181828863]
We propose a temporal perceptual quality index (TPQI) to measure the temporal distortion by describing the graphic morphology of the representation.
Experiments show that TPQI is an effective way of predicting subjective temporal quality.
arXiv Detail & Related papers (2022-07-08T07:30:51Z)
- Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z)
- Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets Training [20.288424566444224]
We focus on automatically assessing the quality of in-the-wild videos in computer vision applications.
To improve the performance of quality assessment models, we borrow intuitions from human perception.
We propose a mixed datasets training strategy for training a single VQA model with multiple datasets.
arXiv Detail & Related papers (2020-11-09T09:22:57Z)
- Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z)