Comment on "No-Reference Video Quality Assessment Based on the Temporal
Pooling of Deep Features"
- URL: http://arxiv.org/abs/2005.04400v1
- Date: Sat, 9 May 2020 09:28:01 GMT
- Title: Comment on "No-Reference Video Quality Assessment Based on the Temporal Pooling of Deep Features"
- Authors: Franz G\"otz-Hahn, Vlad Hosu, Dietmar Saupe
- Abstract summary: In Neural Processing Letters 50(3), a machine learning approach to blind video quality assessment was proposed.
It is based on temporal pooling of features of video frames, taken from the last pooling layer of deep convolutional neural networks.
The method was validated on two established benchmark datasets and gave results far better than the previous state of the art.
We show that the incorrect performance results originally reported are a consequence of two cases of data leakage.
- Score: 6.746400031322727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Neural Processing Letters 50(3) (2019), a machine learning approach
to blind video quality assessment was proposed. It is based on temporal pooling
of features of video frames, taken from the last pooling layer of deep
convolutional neural networks. The method was validated on two established
benchmark datasets and gave results far better than the previous state of the
art. In this letter we report the results of our careful reimplementations.
The performance results claimed in the paper cannot be reached; in fact, they
fall below the state of the art by a large margin. We show that the incorrect
results originally reported are a consequence of two cases of data leakage:
information from outside the training dataset was used both in the fine-tuning
stage and in the model evaluation.
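To make the discussed method concrete, the following is a minimal sketch of frame-level deep feature extraction with temporal pooling. The ResNet-50 backbone and the mean/std pooling statistics are illustrative assumptions; the commented paper's exact backbone, pooling scheme, and fine-tuning procedure differ in detail.

```python
# Minimal sketch: per-frame features from the last pooling layer of a
# CNN, aggregated into one video-level descriptor by temporal pooling.
# Backbone and pooling statistics are assumptions for illustration.
import torch
import torchvision.models as models

# ResNet-50 truncated after the final (global average) pooling layer.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

@torch.no_grad()
def video_descriptor(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, 3, H, W) tensor of preprocessed video frames."""
    feats = feature_extractor(frames).flatten(1)  # (T, 2048) frame features
    # Temporal pooling: mean and standard deviation over the T frames.
    return torch.cat([feats.mean(dim=0), feats.std(dim=0)])  # (4096,)
```

A regressor trained on such descriptors then predicts the quality score; the comment's criticism concerns how the data were split when fine-tuning and evaluating that pipeline, not the pooling itself.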
Related papers
- Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors [54.8852848659663]
Buffer Anytime is a framework for estimation of depth and normal maps (which we call geometric buffers) from video.
We demonstrate high-quality video buffer estimation by leveraging single-image priors with temporal consistency constraints.
arXiv Detail & Related papers (2024-11-26T09:28:32Z) - Video Dynamics Prior: An Internal Learning Approach for Robust Video
Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted video sequence, leveraging its spatio-temporal coherence and internal statistics.
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition [89.84188594758588]
A novel Non-saliency Suppression Network (NSNet) is proposed to suppress the responses of non-salient frames.
NSNet achieves the state-of-the-art accuracy-efficiency trade-off and runs significantly faster (2.4x-4.3x) in practical inference than state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T09:41:22Z) - CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z) - A Closer Look at Debiased Temporal Sentence Grounding in Videos:
Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z) - Unsupervised Video Summarization via Multi-source Features [4.387757291346397]
Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video.
We propose the incorporation of multiple feature sources with chunk and stride fusion to provide more information about the visual content.
For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-26T13:12:46Z) - Weakly Supervised Video Salient Object Detection [79.51227350937721]
We present the first weakly supervised video salient object detection model based on relabeled "fixation guided scribble annotations".
An "appearance-motion fusion module" and a bidirectional ConvLSTM-based framework are proposed to achieve effective multi-modal learning and long-term temporal context modeling.
arXiv Detail & Related papers (2021-04-06T09:48:38Z) - W2WNet: a two-module probabilistic Convolutional Neural Network with
embedded data cleansing functionality [2.695466667982714]
Wise2WipedNet (W2WNet) is a new two-module Convolutional Neural Network.
A Wise module exploits Bayesian inference to identify and discard spurious images during the training.
A Wiped module takes care of the final classification while broadcasting information on the prediction confidence at inference time.
arXiv Detail & Related papers (2021-03-24T11:28:59Z) - Improving Action Quality Assessment using ResNets and Weighted
Aggregation [0.0]
Action quality assessment (AQA) aims at automatically judging human action based on a video of the said action and assigning a performance score to it.
The majority of works in the existing literature on AQA transform RGB videos to higher-level representations using C3D networks.
Due to the relatively shallow nature of C3D, the quality of extracted features is lower than what could be extracted using a deeper convolutional neural network.
arXiv Detail & Related papers (2021-02-21T08:36:22Z) - Critical analysis on the reproducibility of visual quality assessment
using deep features [6.746400031322727]
Data used to train supervised machine learning models are commonly split into independent training, validation, and test sets.
This paper illustrates that complex data leakage cases have occurred in the no-reference image and video quality assessment literature; a sketch of a leakage-free split follows this list.
arXiv Detail & Related papers (2020-09-10T09:51:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.