ColorVideoVDP: A visual difference predictor for image, video and display distortions
- URL: http://arxiv.org/abs/2401.11485v2
- Date: Tue, 2 Jul 2024 21:16:38 GMT
- Title: ColorVideoVDP: A visual difference predictor for image, video and display distortions
- Authors: Rafal K. Mantiuk, Param Hanji, Maliha Ashraf, Yuta Asano, Alexandre Chapiro
- Abstract summary: The metric is built on novel psychophysical models of chromatic spatiotemporal contrast sensitivity and cross-channel contrast masking.
It accounts for the viewing conditions and the geometric and photometric characteristics of the display.
It was trained to predict common video streaming distortions and 8 new distortion types related to AR/VR displays.
- Score: 51.29162719944865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ColorVideoVDP is a video and image quality metric that models spatial and temporal aspects of vision, for both luminance and color. The metric is built on novel psychophysical models of chromatic spatiotemporal contrast sensitivity and cross-channel contrast masking. It accounts for the viewing conditions and the geometric and photometric characteristics of the display. It was trained to predict common video streaming distortions (e.g. video compression, rescaling, and transmission errors), as well as 8 new distortion types related to AR/VR displays (e.g. light source and waveguide non-uniformities). To address the latter application, we collected a novel XR-Display-Artifact-Video quality dataset (XR-DAVID), comprising 336 distorted videos. Extensive testing on XR-DAVID, as well as on several datasets from the literature, indicates a significant gain in prediction performance compared to existing metrics. ColorVideoVDP opens the door to many novel applications that require the joint automated spatiotemporal assessment of luminance and color distortions, including video streaming, display specification and design, visual comparison of results, and perceptually-guided quality optimization.
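Because the metric's predictions depend on the viewing conditions and on the geometric and photometric characteristics of the display, any implementation has to translate those parameters into the angular units used by contrast-sensitivity models. The snippet below is a minimal, hypothetical sketch of that step (the class and function names are ours, not the ColorVideoVDP API): it derives the pixels-per-degree value implied by a display's resolution, physical size, and viewing distance.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Hypothetical helper, not the authors' implementation: collects the display and
# viewing parameters a display-aware metric needs, and derives the
# pixels-per-degree value that maps image pixels to visual angle.
@dataclass
class DisplayGeometry:
    resolution_px: Tuple[int, int]   # geometric: (width, height) in pixels
    diagonal_inches: float           # geometric: physical diagonal size
    viewing_distance_m: float        # viewing conditions: observer-to-screen distance
    peak_luminance_cd_m2: float      # photometric: peak luminance
    black_level_cd_m2: float         # photometric: black level

def pixels_per_degree(d: DisplayGeometry) -> float:
    """Angular resolution of the display as seen by the observer."""
    w_px, h_px = d.resolution_px
    aspect = w_px / h_px
    diag_m = d.diagonal_inches * 0.0254
    width_m = diag_m * aspect / math.sqrt(1.0 + aspect ** 2)
    pixel_m = width_m / w_px
    deg_per_pixel = math.degrees(2.0 * math.atan(pixel_m / (2.0 * d.viewing_distance_m)))
    return 1.0 / deg_per_pixel

# Example: a 32-inch 4K monitor viewed from 0.8 m gives roughly 75-80 pixels per degree.
office_4k = DisplayGeometry((3840, 2160), 32.0, 0.8, 200.0, 0.2)
print(f"{pixels_per_degree(office_4k):.1f} pixels per degree")
```

The resulting pixels-per-degree value determines which spatial frequencies actually reach the observer, which is one way a display-aware metric can treat, say, a phone held at arm's length differently from a large TV showing the same content.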
Related papers
- Deep chroma compression of tone-mapped images [46.07829363710451]
We propose a generative adversarial network for fast and reliable chroma compression of HDR tone-mapped images.
We show that the proposed model outperforms state-of-the-art image generation and enhancement networks in color accuracy.
The model achieves real-time performance, showing promising results for deployment on devices with limited computational resources.
arXiv Detail & Related papers (2024-09-24T12:31:55Z)
- BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement [56.97766265018334]
This paper introduces a low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels.
Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE), and a comprehensive evaluation shows that models trained with our dataset outperform those trained with existing datasets.
arXiv Detail & Related papers (2024-07-03T22:41:49Z)
- Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution [151.1255837803585]
We propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo) for video super-resolution.
SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction.
Experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-25T17:59:26Z)
- SLIC: A Learned Image Codec Using Structure and Color [0.41232474244672235]
We propose a structure- and color-based encoder (SLIC), in which compression is split into separate luminance and chrominance tasks.
The deep-learning model is built with a novel multi-scale architecture for the Y and UV channels (a minimal channel-split sketch follows this entry).
Various experiments are carried out to analyze the performance of the proposed model.
arXiv Detail & Related papers (2024-01-30T18:39:54Z)
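As a rough illustration of the luminance/chrominance split described in the SLIC entry above (this is not the SLIC codec, just the standard BT.709 colour transform), the snippet below separates an RGB image into a structural luma channel Y and two chroma channels U and V; a learned codec can then dedicate separate multi-scale models to each.

```python
import numpy as np

def rgb_to_yuv_bt709(rgb: np.ndarray):
    """Split a float RGB image (values in [0, 1], shape HxWx3) into Y, U, V.

    Toy illustration of a structure/color split: Y carries most of the
    structural detail, while U and V carry the chromatic information.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b   # BT.709 luma
    u = (b - y) / 1.8556                        # blue-difference chroma (Cb-like)
    v = (r - y) / 1.5748                        # red-difference chroma (Cr-like)
    return y, u, v

# Example: chroma can be stored at lower resolution than luma with little visible loss.
img = np.random.rand(64, 64, 3).astype(np.float32)
y, u, v = rgb_to_yuv_bt709(img)
u_half, v_half = u[::2, ::2], v[::2, ::2]       # 4:2:0-style chroma subsampling
```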
- Video Colorization with Pre-trained Text-to-Image Diffusion Models [19.807766482434563]
We present ColorDiffuser, an adaptation of a pre-trained text-to-image latent diffusion model for video colorization.
We propose two novel techniques to enhance the temporal coherence and maintain the vividness of colorization across frames.
arXiv Detail & Related papers (2023-06-02T17:58:00Z)
- HDR-VDP-3: A multi-metric for predicting image differences, quality and contrast distortions in high dynamic range and regular content [14.75838951347139]
High-Dynamic-Range Visual-Difference-Predictor version 3, or HDR-VDP-3, is a visual metric that can fulfill several tasks.
Here we present a high-level overview of the metric, position it with respect to related work, explain the main differences compared to version 2.2, and describe how the metric was adapted for the HDR Video Quality Measurement Grand Challenge 2023.
arXiv Detail & Related papers (2023-04-26T15:32:04Z)
- Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment [16.49357671290058]
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs).
In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality.
Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed.
arXiv Detail & Related papers (2023-01-03T12:48:27Z)
- VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition.
We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
arXiv Detail & Related papers (2022-12-01T02:58:46Z)
- Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment [55.65173181828863]
We propose a temporal perceptual quality index (TPQI) to measure the temporal distortion by describing the graphic morphology of the representation.
Experiments show that TPQI is an effective way of predicting subjective temporal quality.
arXiv Detail & Related papers (2022-07-08T07:30:51Z)
- BBAND Index: A No-Reference Banding Artifact Predictor [55.42929350861115]
Banding artifact, or false contouring, is a common video compression impairment.
We propose a new distortion-specific no-reference video quality model for predicting banding artifacts, called the Blind BANding Detector (BBAND index). A toy band-edge sketch follows this entry.
arXiv Detail & Related papers (2020-02-27T03:05:26Z)
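Banding shows up as small, step-like jumps between otherwise flat regions of what should be a smooth gradient. The snippet below is a toy proxy for that idea, not the BBAND index itself: it flags pixels whose local intensity step is small but nonzero, which is where band edges tend to sit.

```python
import numpy as np

def band_edge_map(luma: np.ndarray, max_step: float = 2.0) -> np.ndarray:
    """Crude false-contouring proxy: mark pixels with a small but nonzero
    intensity step to a neighbour (luma in 8-bit code values, shape HxW)."""
    luma = luma.astype(np.float32)
    dx = np.abs(np.diff(luma, axis=1, prepend=luma[:, :1]))
    dy = np.abs(np.diff(luma, axis=0, prepend=luma[:1, :]))
    step = np.maximum(dx, dy)
    return (step > 0) & (step <= max_step)

# Example: rounding a slow gradient to integer code values produces long,
# 1-code-value band edges that the map above picks up.
gradient = np.tile(np.linspace(0.0, 32.0, 512), (64, 1))
banded = np.round(gradient)
print("band-edge pixels:", int(band_edge_map(banded).sum()))
```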