ChipQA: No-Reference Video Quality Prediction via Space-Time Chips
- URL: http://arxiv.org/abs/2109.08726v1
- Date: Fri, 17 Sep 2021 19:16:31 GMT
- Title: ChipQA: No-Reference Video Quality Prediction via Space-Time Chips
- Authors: Joshua P. Ebenezer, Zaixi Shang, Yongjun Wu, Hai Wei, Sriram
Sethuraman, Alan C. Bovik
- Abstract summary: We propose a new model for no-reference video quality assessment (VQA).
Our approach uses a new idea of highly-localized space-time slices called Space-Time Chips (ST Chips).
We show that our model achieves state-of-the-art performance at reduced cost, without requiring motion computation.
- Score: 33.12375264668551
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new model for no-reference video quality assessment (VQA). Our
approach uses a new idea of highly-localized space-time (ST) slices called
Space-Time Chips (ST Chips). ST Chips are localized cuts of video data along
directions that implicitly capture motion. We use
perceptually-motivated bandpass and normalization models to first process the
video data, and then select oriented ST Chips based on how closely they fit
parametric models of natural video statistics. We show that the parameters that
describe these statistics can be used to reliably predict the quality of
videos, without the need for a reference video. The proposed method implicitly
models ST video naturalness, and deviations from naturalness. We train and test
our model on several large VQA databases, and show that our model achieves
state-of-the-art performance at reduced cost, without requiring motion
computation.
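As a rough, illustrative sketch (not the authors' code) of the generic natural-video-statistics recipe the abstract alludes to, the Python snippet below applies a standard divisive-normalization step (MSCN coefficients) and then fits a generalized Gaussian by moment matching, whose shape and scale act as quality-aware features. The specific ST Chip selection along implicit motion directions is not reproduced, and the filter and distribution choices are assumptions drawn from the wider NSS literature.

```python
"""Illustrative NSS sketch: divisive normalization + generalized Gaussian fit.
Frame-level only; ChipQA's space-time chip selection is omitted."""
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma

def mscn(frame, sigma=7 / 6, c=1.0):
    """Mean-subtracted contrast-normalized (MSCN) coefficients of a grayscale frame."""
    mu = gaussian_filter(frame, sigma)
    var = gaussian_filter(frame * frame, sigma) - mu * mu
    return (frame - mu) / (np.sqrt(np.maximum(var, 0)) + c)

def fit_ggd(x):
    """Moment-matching fit of a zero-mean generalized Gaussian; returns (shape, scale)."""
    x = x.ravel()
    sigma_sq = np.mean(x ** 2)
    e_abs = np.mean(np.abs(x))
    rho = sigma_sq / (e_abs ** 2 + 1e-12)
    shapes = np.arange(0.2, 10.0, 0.001)
    r = gamma(1.0 / shapes) * gamma(3.0 / shapes) / gamma(2.0 / shapes) ** 2
    alpha = shapes[np.argmin((r - rho) ** 2)]
    return alpha, np.sqrt(sigma_sq)

# Example on a dummy frame: the fitted parameters are the kind of features
# that NSS-based NR models pool and feed to a regressor.
frame = np.random.rand(480, 640).astype(np.float64)
alpha, sigma = fit_ggd(mscn(frame))
print(f"GGD shape={alpha:.3f}, scale={sigma:.3f}")
```

In a full no-reference model, such parameters would be pooled over space and time and mapped to a quality score by a learned regressor (for example, an SVR).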
Related papers
- Video Occupancy Models [59.17330408925321]
Video Occupancy models (VOCs) operate in a compact latent space.
Unlike prior latent-space world models, VOCs directly predict the discounted distribution of future states in a single step.
arXiv Detail & Related papers (2024-06-25T17:57:38Z)
- ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free method for generative video models in a plug-and-play manner.
We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z)
- PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild [27.195339506769457]
Video quality assessment (VQA) is a challenging problem due to the numerous factors that can affect the perceptual quality of a video.
Annotating the mean opinion score (MOS) for videos is expensive and time-consuming, which limits the scale of VQA datasets.
We propose a VQA method named PTM-VQA, which leverages PreTrained Models to transfer knowledge from models pretrained on various pre-tasks.
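As a rough illustration of this transfer idea, the sketch below freezes a generic pretrained image backbone and regresses a quality score from temporally pooled features; the ResNet-18 backbone, linear head, and mean pooling are assumptions for illustration, not PTM-VQA's actual design.

```python
"""Hedged sketch: frozen pretrained backbone + lightweight quality regressor."""
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FrozenBackboneVQA(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)   # pretrained weights would be loaded in practice
        backbone.fc = nn.Identity()         # keep the 512-d pooled features
        for p in backbone.parameters():
            p.requires_grad = False         # transfer: the backbone stays frozen
        self.backbone = backbone
        self.head = nn.Linear(512, 1)       # small trainable quality head

    def forward(self, frames):              # frames: (T, 3, H, W) for one video
        feats = self.backbone(frames)       # (T, 512) per-frame features
        return self.head(feats.mean(dim=0)) # temporal average pooling -> quality estimate

model = FrozenBackboneVQA()
video = torch.randn(8, 3, 224, 224)         # 8 dummy frames
print(model(video).shape)                   # torch.Size([1])
```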
arXiv Detail & Related papers (2024-05-28T02:37:29Z)
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
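The matching stage described above can be illustrated with a toy correlation search: score a query point's feature against every location of every frame and take the per-frame argmax. The tensor shapes and the omission of the refinement stage are simplifications, not TAPIR's implementation.

```python
"""Toy per-frame matching by feature correlation (refinement stage omitted)."""
import torch

def per_frame_matches(query_feat, frame_feats):
    """query_feat: (C,), frame_feats: (T, C, H, W) -> (T, 2) integer (y, x) candidates."""
    T, C, H, W = frame_feats.shape
    # Correlate the query feature with every spatial location of every frame.
    scores = torch.einsum("c,tchw->thw", query_feat, frame_feats)  # (T, H, W)
    flat_idx = scores.flatten(1).argmax(dim=1)                     # best location per frame
    return torch.stack((flat_idx // W, flat_idx % W), dim=1)       # (T, 2)

feats = torch.randn(10, 64, 32, 32)      # dummy feature maps for 10 frames
query = feats[0, :, 12, 20]              # feature at the queried point in frame 0
print(per_frame_matches(query, feats))   # one (y, x) candidate per frame
```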
arXiv Detail & Related papers (2023-06-14T17:07:51Z)
- LSTM-based Video Quality Prediction Accounting for Temporal Distortions in Videoconferencing Calls [22.579711841384764]
We present a data-driven approach for modeling such distortions automatically by training an LSTM with subjective quality ratings labeled via crowdsourcing.
We applied QR codes as markers on the source videos to create aligned references and compute temporal features based on the alignment vectors.
Our proposed model achieves a PCC of 0.99 on the validation set and gives detailed insight into the cause of video quality impairments.
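A minimal sketch of such an LSTM-based quality predictor is given below: per-frame temporal features are passed through an LSTM and the final hidden state is regressed to a rating. The feature dimensionality, single layer, and MSE objective are assumptions for illustration, not the paper's exact setup.

```python
"""Hedged sketch: LSTM over per-frame temporal features, regressed to a rating."""
import torch
import torch.nn as nn

class TemporalQualityLSTM(nn.Module):
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, frames, feat_dim)
        _, (h_n, _) = self.lstm(x)     # h_n: (1, batch, hidden)
        return self.head(h_n[-1])      # (batch, 1) predicted quality

model = TemporalQualityLSTM()
clip_feats = torch.randn(4, 300, 32)   # 4 clips, 300 frames of temporal features each
mos = torch.rand(4, 1) * 4 + 1         # dummy crowdsourced ratings on a 1-5 scale
loss = nn.functional.mse_loss(model(clip_feats), mos)
loss.backward()                        # gradients for one training step
print(loss.item())
```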
arXiv Detail & Related papers (2023-03-22T17:14:38Z)
- Semi-Parametric Video-Grounded Text Generation [21.506377836451577]
In this paper, we propose a semi-parametric video-grounded text generation model, SeViT.
Treating a video as an external data store, SeViT includes a non-parametric frame retriever to select a few query-relevant frames.
Experimental results demonstrate our method has a significant advantage in longer videos and causal video understanding.
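The non-parametric retrieval step can be sketched as a simple similarity search over frame embeddings; the use of cosine similarity and the embedding source are assumptions for illustration rather than SeViT's actual retriever.

```python
"""Toy non-parametric frame retrieval: top-k frames by cosine similarity to a query."""
import torch
import torch.nn.functional as F

def retrieve_frames(query_emb, frame_embs, k=4):
    """query_emb: (D,), frame_embs: (T, D) -> indices of the k most query-relevant frames."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), frame_embs, dim=1)  # (T,)
    return sims.topk(k).indices

frames = torch.randn(120, 256)          # dummy embeddings for 120 frames
query = torch.randn(256)                # dummy embedding of the text query
print(retrieve_frames(query, frames))   # indices of the selected frames
```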
arXiv Detail & Related papers (2023-01-27T03:00:43Z)
- Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moiré patterns, appearing as color distortions, severely degrade image and video quality when filming a screen with digital cameras.
We study how to remove such undesirable moiré patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)
- FOVQA: Blind Foveated Video Quality Assessment [1.4127304025810108]
We develop a no-reference (NR) foveated video quality assessment model, called FOVQA.
It is based on new models of space-variant natural scene statistics (NSS) and natural video statistics (NVS).
FOVQA achieves state-of-the-art (SOTA) performance on the new 2D LIVE-FBT-FCVR database.
arXiv Detail & Related papers (2021-06-24T21:38:22Z)
- VideoGPT: Video Generation using VQ-VAE and Transformers [75.20543171520565]
VideoGPT is a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos.
VideoGPT uses VQ-VAE, which learns downsampled discrete latent representations by employing 3D convolutions and axial self-attention.
Our architecture is able to generate samples competitive with state-of-the-art GAN models for video generation on the BAIR Robot dataset.
arXiv Detail & Related papers (2021-04-20T17:58:03Z)
- ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
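A toy version of this tokenise-then-encode pipeline is sketched below: non-overlapping spatio-temporal patches are embedded with a 3D convolution and passed through standard transformer encoder layers. The dimensions and the omission of positional embeddings, factorised attention, and a classification head are simplifications, not ViViT's architecture.

```python
"""Toy sketch: tubelet tokenisation with a 3D convolution + transformer encoder."""
import torch
import torch.nn as nn

class TinyVideoTransformer(nn.Module):
    def __init__(self, dim=96, tubelet=(2, 16, 16)):
        super().__init__()
        # Non-overlapping spatio-temporal patches, linearly embedded to `dim`.
        self.to_tokens = nn.Conv3d(3, dim, kernel_size=tubelet, stride=tubelet)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, video):                 # video: (B, 3, T, H, W)
        tok = self.to_tokens(video)           # (B, dim, T', H', W')
        tok = tok.flatten(2).transpose(1, 2)  # (B, num_tokens, dim)
        return self.encoder(tok)              # contextualised spatio-temporal tokens

model = TinyVideoTransformer()
clip = torch.randn(1, 3, 8, 64, 64)           # one short dummy clip
print(model(clip).shape)                      # torch.Size([1, 64, 96])
```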
arXiv Detail & Related papers (2021-03-29T15:27:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.