Comment on "No-Reference Video Quality Assessment Based on the Temporal
Pooling of Deep Features"
- URL: http://arxiv.org/abs/2005.04400v1
- Date: Sat, 9 May 2020 09:28:01 GMT
- Title: Comment on "No-Reference Video Quality Assessment Based on the Temporal
Pooling of Deep Features"
- Authors: Franz Götz-Hahn, Vlad Hosu, Dietmar Saupe
- Abstract summary: In Neural Processing Letters 50(3), a machine learning approach to blind video quality assessment was proposed.
It is based on temporal pooling of features of video frames, taken from the last pooling layer of deep convolutional neural networks.
The method was validated on two established benchmark datasets and gave results far better than the previous state-of-the-art.
We show that the originally reported, erroneous performance results are a consequence of two cases of data leakage.
- Score: 6.746400031322727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Neural Processing Letters 50(3) (2019), a machine learning approach to blind
video quality assessment was proposed. It is based on temporal pooling of
features of video frames, taken from the last pooling layer of deep
convolutional neural networks. The method was validated on two established
benchmark datasets and gave results far better than the previous state of the
art. In this letter we report the results of our careful reimplementations.
The performance results claimed in the paper cannot be reached and are even
below the state of the art by a large margin. We show that the originally
reported, erroneous performance results are a consequence of two cases of data
leakage: information from outside the training dataset was used in the
fine-tuning stage and in the model evaluation.
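A minimal sketch of the pipeline under discussion: per-frame features from the last pooling layer of a pretrained CNN, temporally pooled into one video-level descriptor. The backbone (ResNet-50) and the mean/std pooling statistics are illustrative assumptions, not the paper's exact configuration; crucially, any fine-tuning or regression on top of such features must use the training split only, or the data leakage described above occurs.

```python
# Hedged sketch (not the authors' code): frame features from the last pooling
# layer of a pretrained CNN, temporally pooled into a video-level descriptor.
import torch
import torchvision.models as models

# Pretrained backbone; keep everything up to and including the last pooling layer.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

def video_descriptor(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, 3, 224, 224) preprocessed video frames.
    Returns one pooled feature vector for the whole video."""
    with torch.no_grad():
        feats = feature_extractor(frames).flatten(1)  # (T, 2048) frame features
    # Temporal pooling: mean and std over time (one common choice, assumed here).
    return torch.cat([feats.mean(dim=0), feats.std(dim=0)])
```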
Related papers
- Video Dynamics Prior: An Internal Learning Approach for Robust Video
Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted video sequence, leveraging its spatio-temporal coherence and internal statistics.
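A hedged sketch of the internal-learning idea, in the spirit of deep priors rather than the paper's actual VDP architecture: a small network is fitted to the corrupted sequence itself, with no external training corpus. The toy network, clip size, and step count below are assumptions.

```python
# Internal learning on a single corrupted clip; early stopping acts as the prior.
import torch

net = torch.nn.Sequential(
    torch.nn.Conv3d(3, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv3d(32, 3, 3, padding=1),
)
corrupted = torch.rand(1, 3, 16, 64, 64)   # (N, C, T, H, W) toy corrupted clip
noise_in = torch.randn_like(corrupted)     # fixed random input code
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):                    # optimize over the test sequence only
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(net(noise_in), corrupted)
    loss.backward()
    opt.step()

restored = net(noise_in).detach()          # the network's output is the restoration
```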
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - Mitigating Representation Bias in Action Recognition: Algorithms and
Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z) - NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition [89.84188594758588]
A novel Non-saliency Suppression Network (NSNet) is proposed to suppress the responses of non-salient frames.
NSNet achieves a state-of-the-art accuracy-efficiency trade-off, with 2.4-4.3x faster practical inference than state-of-the-art methods.
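A hypothetical sketch of suppressing non-salient frames via a learned per-frame saliency gate; NSNet's actual modules and training objectives differ, and the linear scorer below is a placeholder.

```python
# Saliency-gated frame aggregation: low-saliency frames contribute near-zero weight.
import torch

def suppress_nonsalient(frame_feats: torch.Tensor, scorer: torch.nn.Module,
                        temperature: float = 0.1) -> torch.Tensor:
    """frame_feats: (T, D). Returns a video feature dominated by salient frames."""
    saliency = scorer(frame_feats).squeeze(-1)          # (T,) per-frame scores
    weights = torch.softmax(saliency / temperature, 0)  # non-salient weights -> ~0
    return (weights.unsqueeze(-1) * frame_feats).sum(0)

scorer = torch.nn.Linear(2048, 1)                       # toy saliency head
video_feat = suppress_nonsalient(torch.randn(32, 2048), scorer)
```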
arXiv Detail & Related papers (2022-07-21T09:41:22Z) - CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
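A rough sketch of contrastive pretraining for quality-aware representations, using a SimCLR-style NT-Xent loss as a stand-in; CONVIQT's exact view construction, encoder, and loss are not reproduced here.

```python
# NT-Xent: two views of the same clip are positives, all other pairs negatives.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two views of the same N clips."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)          # (2N, D)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    n = z1.size(0)
    # Positive of row i is row i+N, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))
```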
arXiv Detail & Related papers (2022-06-29T15:22:01Z) - A Closer Look at Debiased Temporal Sentence Grounding in Videos:
Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
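A hedged sketch of how such a discounted recall could look: a prediction counts only if its IoU with the ground truth reaches m, and the hit is then discounted by normalized start/end offsets. The exact definition of "dR@n,IoU@m" is in the paper; the discount form below is an assumption.

```python
# Discounted hit for one (prediction, ground truth) pair; recall averages these.
def discounted_hit(pred, gt, video_len, m=0.5):
    """pred, gt: (start, end) in seconds; returns the discounted hit value."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    if union <= 0 or inter / union < m:
        return 0.0                                     # no hit below the IoU bar
    alpha_s = 1.0 - abs(pred[0] - gt[0]) / video_len   # start-offset discount
    alpha_e = 1.0 - abs(pred[1] - gt[1]) / video_len   # end-offset discount
    return alpha_s * alpha_e                           # in [0, 1], replaces a raw hit

print(discounted_hit((4.0, 10.0), (5.0, 10.0), video_len=30.0))
```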
arXiv Detail & Related papers (2022-03-10T08:58:18Z) - Unsupervised Video Summarization via Multi-source Features [4.387757291346397]
Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video.
We propose the incorporation of multiple feature sources with chunk and stride fusion to provide more information about the visual content.
For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches.
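A loose sketch of combining per-frame features from multiple extractors; "chunk and stride fusion" is approximated below by chunk-pooled and strided views of the concatenated features, which is an assumption about the paper's actual scheme.

```python
# Fuse several per-frame feature sources, then form chunked and strided views.
import numpy as np

def fuse_sources(sources, chunk=4):
    """sources: list of (T, D_i) frame-feature arrays from different extractors."""
    fused = np.concatenate(sources, axis=1)                          # (T, sum D_i)
    T = fused.shape[0] - fused.shape[0] % chunk
    chunked = fused[:T].reshape(T // chunk, chunk, -1).mean(axis=1)  # chunk pooling
    strided = fused[::chunk][: T // chunk]                           # strided samples
    return np.concatenate([chunked, strided], axis=1)

video_repr = fuse_sources([np.random.rand(37, 512), np.random.rand(37, 256)])
```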
arXiv Detail & Related papers (2021-05-26T13:12:46Z) - Weakly Supervised Video Salient Object Detection [79.51227350937721]
We present the first weakly supervised video salient object detection model based on relabeled "fixation guided scribble annotations".
An "Appearance-motion fusion module" and bidirectional ConvLSTM based framework are proposed to achieve effective multi-modal learning and long-term temporal context modeling.
arXiv Detail & Related papers (2021-04-06T09:48:38Z) - W2WNet: a two-module probabilistic Convolutional Neural Network with
embedded data cleansing functionality [2.695466667982714]
Wise2WipedNet (W2WNet) is a new two-module convolutional neural network.
A Wise module exploits Bayesian inference to identify and discard spurious images during the training.
A Wiped module takes care of the final classification while broadcasting information on the prediction confidence at inference time.
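A hypothetical sketch of Bayesian-style data cleansing via Monte Carlo dropout: training images with persistently high predictive entropy are flagged and discarded. W2WNet's actual Wise/Wiped modules are specified in the paper; the toy model and threshold below are assumptions.

```python
# Flag dubious training samples by predictive entropy under dropout.
import torch

def mc_dropout_uncertainty(model: torch.nn.Module, x: torch.Tensor, passes: int = 20):
    """x: (N, ...) batch. Returns per-sample predictive entropy under dropout."""
    model.train()                               # keep dropout active at inference
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(passes)])
    mean_p = probs.mean(dim=0)                  # (N, classes) predictive mean
    return -(mean_p * mean_p.clamp_min(1e-9).log()).sum(dim=1)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Dropout(0.5),
                            torch.nn.Linear(28 * 28, 10))
entropy = mc_dropout_uncertainty(model, torch.randn(16, 1, 28, 28))
keep_mask = entropy < entropy.quantile(0.9)     # discard the most uncertain samples
```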
arXiv Detail & Related papers (2021-03-24T11:28:59Z) - Improving Action Quality Assessment using ResNets and Weighted
Aggregation [0.0]
Action quality assessment (AQA) aims to automatically judge a human action from a video of that action and assign it a performance score.
The majority of works in the existing literature on AQA transform RGB videos to higher-level representations using C3D networks.
Due to the relatively shallow nature of C3D, the quality of extracted features is lower than what could be extracted using a deeper convolutional neural network.
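A compact sketch of weighted aggregation of clip-level scores for AQA; the feature dimension and the two linear heads below are placeholders, not the paper's exact model.

```python
# Learn a per-clip importance weight and aggregate clip scores into one rating.
import torch

class WeightedScoreAggregator(torch.nn.Module):
    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        self.score_head = torch.nn.Linear(feat_dim, 1)   # per-clip quality score
        self.weight_head = torch.nn.Linear(feat_dim, 1)  # per-clip importance

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        """clip_feats: (num_clips, feat_dim) from a deep CNN backbone."""
        scores = self.score_head(clip_feats).squeeze(-1)
        weights = torch.softmax(self.weight_head(clip_feats).squeeze(-1), dim=0)
        return (weights * scores).sum()                  # final action score

final_score = WeightedScoreAggregator()(torch.randn(10, 2048))
```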
arXiv Detail & Related papers (2021-02-21T08:36:22Z) - Critical analysis on the reproducibility of visual quality assessment
using deep features [6.746400031322727]
Data used to train supervised machine learning models are commonly split into independent training, validation, and test sets.
This paper illustrates that complex data leakage cases have occurred in the no-reference image and video quality assessment literature.
arXiv Detail & Related papers (2020-09-10T09:51:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the summaries (including all information) and is not responsible for any consequences of their use.