Critical analysis on the reproducibility of visual quality assessment
using deep features
- URL: http://arxiv.org/abs/2009.05369v3
- Date: Mon, 1 Mar 2021 10:59:22 GMT
- Title: Critical analysis on the reproducibility of visual quality assessment
using deep features
- Authors: Franz Götz-Hahn, Vlad Hosu, Dietmar Saupe
- Abstract summary: Data used to train supervised machine learning models are commonly split into independent training, validation, and test sets.
This paper illustrates that complex data leakage cases have occurred in the no-reference image and video quality assessment literature.
- Score: 6.746400031322727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data used to train supervised machine learning models are commonly split into
independent training, validation, and test sets. This paper illustrates that
complex data leakage cases have occurred in the no-reference image and video
quality assessment literature. Recently, papers in several journals reported
performance results well above the best in the field. However, our analysis
shows that information from the test set was inappropriately used in the
training process in different ways and that the claimed performance results
cannot be achieved. When correcting for the data leakage, the performances of
the approaches drop even below the state-of-the-art by a large margin.
Additionally, we investigate end-to-end variations to the discussed approaches,
which do not improve upon the original.
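As the abstract stresses, the three subsets must be truly independent; for image and video quality datasets this usually means that all distorted versions or frames derived from the same source content end up in exactly one of the training, validation, or test sets. Below is a minimal sketch of such a content-grouped split using scikit-learn's GroupShuffleSplit; the `content_ids` grouping array, the toy items, and the roughly 60/20/20 ratio are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: split a quality dataset by source content so that all
# distorted versions / frames of one source land in exactly one subset.
# Group keys and split ratios below are illustrative assumptions.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def grouped_three_way_split(items, content_ids, seed=0):
    """Return index arrays (train, val, test) with no content overlap."""
    items = np.asarray(items)
    content_ids = np.asarray(content_ids)

    # First carve off ~20% of the source contents as the test set.
    outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
    trainval_idx, test_idx = next(outer.split(items, groups=content_ids))

    # Then split the remaining contents ~75/25 into training and validation.
    inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=seed)
    tr, va = next(inner.split(items[trainval_idx],
                              groups=content_ids[trainval_idx]))
    return trainval_idx[tr], trainval_idx[va], test_idx

# Toy usage: 6 distorted items derived from 3 source contents.
train, val, test = grouped_three_way_split(
    items=["a1", "a2", "b1", "b2", "c1", "c2"],
    content_ids=["a", "a", "b", "b", "c", "c"])
assert not set(train) & set(test) and not set(val) & set(test)
```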
Related papers
- A Survey on Deep Learning-based Gaze Direction Regression: Searching for the State-of-the-art [0.0]
We present a survey of deep learning-based methods for the regression of the gaze direction vector from head and eye images.
We describe in detail numerous published methods with a focus on the input data, architecture of the model, and loss function used to supervise the model.
We present a list of datasets that can be used to train and evaluate gaze direction regression methods.
arXiv Detail & Related papers (2024-10-22T15:07:07Z) - Early-Stage Anomaly Detection: A Study of Model Performance on Complete vs. Partial Flows [0.0]
This study investigates the efficacy of machine learning models, specifically Random Forest, in anomaly detection systems.
We explore the performance disparity that arises when models are applied to incomplete data typical in real-world, real-time network environments.
arXiv Detail & Related papers (2024-07-03T07:14:25Z) - Too Good To Be True: performance overestimation in (re)current practices
for Human Activity Recognition [49.1574468325115]
Sliding windows for data segmentation followed by standard random k-fold cross-validation produce biased results; a group-wise alternative is sketched after this list.
It is important to raise awareness in the scientific community about this problem, whose negative effects are being overlooked.
Experiments with different types of datasets and classification models demonstrate the problem and show that it persists regardless of the method or dataset.
arXiv Detail & Related papers (2023-10-18T13:24:05Z) - Analyzing Dataset Annotation Quality Management in the Wild [63.07224587146207]
Even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts.
While practices and guidelines regarding dataset creation projects exist, large-scale analysis has yet to be performed on how quality management is conducted.
arXiv Detail & Related papers (2023-07-16T21:22:40Z) - A Pretrainer's Guide to Training Data: Measuring the Effects of Data
Age, Domain Coverage, Quality, & Toxicity [84.6421260559093]
This study is the largest set of experiments to validate, quantify, and expose undocumented intuitions about text pretraining.
Our findings indicate there does not exist a one-size-fits-all solution to filtering training data.
arXiv Detail & Related papers (2023-05-22T15:57:53Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Artifact-Based Domain Generalization of Skin Lesion Models [20.792979998188848]
We propose a pipeline that relies on artifact annotations to enable generalization evaluation and debiasing.
We create environments based on skin lesion artifacts to enable domain generalization methods.
Our results raise a concern that debiasing models towards a single aspect may not be enough for fair skin lesion analysis.
arXiv Detail & Related papers (2022-08-20T22:25:09Z) - Bias-Aware Loss for Training Image and Speech Quality Prediction Models
from Multiple Datasets [13.132388683797503]
We propose a bias-aware loss function that estimates each dataset's biases during training with a linear function.
We demonstrate the effectiveness of the proposed method by training and validating quality prediction models on synthetic and subjective image and speech quality datasets.
arXiv Detail & Related papers (2021-04-20T19:20:11Z) - Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap.
We suggest that future dataset creation include a simple model as a difficulty/bias probe and that future model development use a clean, non-overlapping site-and-date split.
arXiv Detail & Related papers (2021-04-20T17:16:41Z) - How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z) - Comment on "No-Reference Video Quality Assessment Based on the Temporal
Pooling of Deep Features" [6.746400031322727]
In Neural Processing Letters 50(3), a machine learning approach to blind video quality assessment was proposed.
It is based on the temporal pooling of per-frame features taken from the last pooling layer of deep convolutional neural networks; a schematic sketch of such pooling follows this list.
The method was validated on two established benchmark datasets and gave results far better than the previous state-of-the-art.
We show that the originally reported, erroneous performance results are a consequence of two cases of data leakage.
arXiv Detail & Related papers (2020-05-09T09:28:01Z)
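Regarding the "Too Good To Be True" entry above: cutting time series into overlapping sliding windows and then applying plain random k-fold cross-validation lets windows from the same recording fall on both sides of a split. A minimal sketch of the group-wise alternative is given below, assuming each window carries a subject identifier; the toy data and the `subject_ids` array are hypothetical illustrations, not material from that paper.

```python
# Minimal sketch: evaluate with GroupKFold so that all sliding windows from
# one subject stay in the same fold, unlike plain shuffled KFold, which can
# place overlapping windows of the same recording in both train and test.
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

rng = np.random.default_rng(0)
windows = rng.normal(size=(120, 32))        # 120 windows x 32 features (toy data)
labels = rng.integers(0, 3, size=120)       # 3 activity classes (toy labels)
subject_ids = np.repeat(np.arange(6), 20)   # 6 subjects, 20 windows each (assumed)

# Biased protocol: windows of one subject can appear in train and test folds.
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(windows):
    pass  # ...fit/evaluate here; scores tend to be optimistically biased

# Group-wise protocol: folds never share a subject.
for train_idx, test_idx in GroupKFold(n_splits=5).split(windows, labels, groups=subject_ids):
    assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])
    # ...fit/evaluate here; scores reflect generalization to unseen subjects
```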
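Regarding the commented approach in the last entry: it regresses quality from per-frame deep features that are pooled over time. The sketch below shows one simple form of temporal pooling (mean and standard deviation over frames); the array shapes and the choice of pooling statistics are illustrative assumptions and need not match the original method exactly.

```python
# Minimal sketch: temporally pool per-frame CNN features into one clip-level
# descriptor. Frame features are assumed to be precomputed (e.g. taken from
# the last pooling layer of a deep CNN); shapes and pooling statistics here
# are illustrative, not the exact recipe of the commented paper.
import numpy as np

def temporal_pool(frame_features: np.ndarray) -> np.ndarray:
    """Map (num_frames, feature_dim) frame features to a (2 * feature_dim,) descriptor."""
    mean = frame_features.mean(axis=0)  # average of each feature over time
    std = frame_features.std(axis=0)    # temporal variation of each feature
    return np.concatenate([mean, std])

# Toy usage: 90 frames of 2048-dimensional features for one video.
clip_descriptor = temporal_pool(np.random.default_rng(0).normal(size=(90, 2048)))
print(clip_descriptor.shape)  # (4096,)
```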