Robustness and accuracy of mean opinion scores with hard and soft outlier detection
- URL: http://arxiv.org/abs/2509.06554v1
- Date: Mon, 08 Sep 2025 11:09:14 GMT
- Title: Robustness and accuracy of mean opinion scores with hard and soft outlier detection
- Authors: Dietmar Saupe, Tim Bleile
- Abstract summary: In subjective assessment of image and video quality, observers rate or compare selected stimuli. It is recommended to identify and deal with outliers that may have given unreliable ratings.
- Score: 2.9214619971854723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In subjective assessment of image and video quality, observers rate or compare selected stimuli. Before calculating the mean opinion scores (MOS) for these stimuli from the ratings, it is recommended to identify and deal with outliers that may have given unreliable ratings. Several methods are available for this purpose, some of which have been standardized. These methods are typically based on statistics and sometimes tested by introducing synthetic ratings from artificial outliers, such as random clickers. However, a reliable and comprehensive approach is lacking for comparative performance analysis of outlier detection methods. To fill this gap, this work proposes and applies an empirical worst-case analysis as a general solution. Our method involves evolutionary optimization of an adversarial black-box attack on outlier detection algorithms, where the adversary maximizes the distortion of scale values with respect to ground truth. We apply our analysis to several hard and soft outlier detection methods for absolute category ratings and show their differing performance in this stress test. In addition, we propose two new outlier detection methods with low complexity and excellent worst-case performance. Software for adversarial attacks and data analysis is available.
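The abstract describes an adversarial stress test: an evolutionary black-box attack that perturbs an adversarial observer's ratings so as to maximize the distortion of the resulting MOS after outlier screening. The following is a minimal, hypothetical sketch of that kind of workflow; the correlation-based hard screen, the (1+1)-style mutation scheme, and all thresholds and parameters are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def screened_mos(ratings, corr_threshold=0.5):
    """MOS after a simple hard outlier screen: drop observers whose
    Pearson correlation with the provisional MOS falls below a threshold.
    (Illustrative rule; standardized procedures such as ITU-R BT.500
    and the detectors studied in the paper differ in detail.)"""
    provisional = ratings.mean(axis=0)
    r = np.array([np.corrcoef(obs, provisional)[0, 1] for obs in ratings])
    kept = ratings[r >= corr_threshold]
    return kept.mean(axis=0)

def evolve_adversary(honest, generations=200, step=1.0):
    """(1+1)-style black-box attack: repeatedly mutate one adversarial
    observer's ACR ratings (scale 1..5) and keep the mutation if it
    increases the distortion of the screened MOS relative to the
    honest-only MOS, which stands in for the ground truth here."""
    truth = honest.mean(axis=0)
    adv = np.clip(np.rint(truth), 1, 5)  # start close to honest ratings
    best = 0.0
    for _ in range(generations):
        cand = adv.copy()
        i = rng.integers(cand.size)
        cand[i] = np.clip(cand[i] + rng.choice([-step, step]), 1, 5)
        mos = screened_mos(np.vstack([honest, cand[None]]))
        dist = np.abs(mos - truth).sum()  # distortion objective
        if dist >= best:
            adv, best = cand, dist
    return adv, best

# Hypothetical panel: 12 reliable observers rating 16 stimuli.
truth = rng.uniform(1, 5, size=16)
honest = np.clip(truth + rng.normal(0, 0.3, size=(12, 16)), 1, 5)
adv, distortion = evolve_adversary(honest)
```

The attack only queries the screening-plus-MOS pipeline as a black box, which mirrors the evolutionary worst-case setup the abstract outlines; a well-designed detector should keep the achievable `distortion` small.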
Related papers
- A method for outlier detection based on cluster analysis and visual expert criteria [0.0]
Outliers are the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations. We propose an outlier detection method based on a clustering process.
arXiv Detail & Related papers (2025-10-27T09:16:16Z)
- Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls [65.44462297594308]
Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data. Most unsupervised outlier detection methods are carefully designed to detect specified outliers. We propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers.
arXiv Detail & Related papers (2025-01-06T12:35:51Z)
- Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD).
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z)
- Preference Poisoning Attacks on Reward Model Learning [47.00395978031771]
We investigate the nature and extent of a vulnerability in learning reward models from pairwise comparisons.
We propose two classes of algorithmic approaches for these attacks: a gradient-based framework, and several variants of rank-by-distance methods.
We find that the best attacks are often highly successful, achieving in the most extreme case 100% success rate with only 0.3% of the data poisoned.
arXiv Detail & Related papers (2024-02-02T21:45:24Z)
- On Pixel-level Performance Assessment in Anomaly Detection [87.7131059062292]
Anomaly detection methods have demonstrated remarkable success across various applications.
However, assessing their performance, particularly at the pixel-level, presents a complex challenge.
In this paper, we dissect the intricacies of this challenge, underscored by visual evidence and statistical analysis.
arXiv Detail & Related papers (2023-10-25T08:02:27Z)
- Quantile-based Maximum Likelihood Training for Outlier Detection [5.902139925693801]
We introduce a quantile-based maximum likelihood objective for learning the inlier distribution to improve the outlier separation during inference.
Our approach fits a normalizing flow to pre-trained discriminative features and detects the outliers according to the evaluated log-likelihood.
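The quantile-based idea above can be sketched as follows. The paper fits a normalizing flow to pre-trained features; as a simple stand-in, this hypothetical example fits a diagonal Gaussian to inlier features and flags any sample whose log-likelihood falls below a chosen quantile of the training log-likelihoods. All names and the 5% quantile are illustrative assumptions.

```python
import numpy as np

def quantile_ll_outlier_detector(train_feats, q=0.05):
    """Return a detector flagging samples whose log-likelihood under a
    diagonal-Gaussian fit to the inlier features falls below the
    q-quantile of the training log-likelihoods.
    (A normalizing flow would replace the Gaussian in practice.)"""
    mu = train_feats.mean(axis=0)
    var = train_feats.var(axis=0) + 1e-8  # avoid division by zero
    def loglik(x):
        return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var), axis=-1)
    threshold = np.quantile(loglik(train_feats), q)
    return lambda x: loglik(x) < threshold  # True = outlier

rng = np.random.default_rng(1)
inliers = rng.normal(0, 1, size=(500, 4))
detect = quantile_ll_outlier_detector(inliers)
far_point = np.array([[8.0, 8.0, 8.0, 8.0]])
```

By construction, roughly a fraction `q` of the inlier training data sits below the threshold, so `q` directly controls the false-positive rate on in-distribution data.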
arXiv Detail & Related papers (2023-08-20T22:27:54Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
- Holistic Approach to Measure Sample-level Adversarial Vulnerability and its Utility in Building Trustworthy Systems [17.707594255626216]
Adversarial attack perturbs an image with an imperceptible noise, leading to incorrect model prediction.
We propose a holistic approach for quantifying adversarial vulnerability of a sample by combining different perspectives.
We demonstrate that by reliably estimating adversarial vulnerability at the sample level, it is possible to develop a trustworthy system.
arXiv Detail & Related papers (2022-05-05T12:36:17Z)
- Performance Evaluation of Adversarial Attacks: Discrepancies and Solutions [51.8695223602729]
Adversarial attack methods have been developed to challenge the robustness of machine learning models.
We propose a Piece-wise Sampling Curving (PSC) toolkit to effectively address the discrepancy.
The PSC toolkit offers options for balancing computational cost and evaluation effectiveness.
arXiv Detail & Related papers (2021-04-22T14:36:51Z)
- A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z)
- ATRO: Adversarial Training with a Rejection Option [10.36668157679368]
This paper proposes a classification framework with a rejection option to mitigate the performance deterioration caused by adversarial examples.
By applying the adversarial training objective to both a classifier and a rejection function simultaneously, the model can abstain from classification when it has insufficient confidence to classify a test data point.
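The rejection-option idea can be illustrated with a much simpler baseline than ATRO's jointly trained rejection function: abstain whenever the softmax confidence falls below a fixed threshold. This hypothetical sketch is only that baseline, not ATRO's training objective; the threshold `tau` and the `-1` reject code are illustrative choices.

```python
import numpy as np

def predict_with_rejection(logits, tau=0.7):
    """Predict the argmax class, or abstain (return -1) when the top
    softmax probability is below tau. ATRO instead learns the rejection
    function jointly with the classifier under adversarial training."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    conf = p.max(axis=-1)
    pred = p.argmax(axis=-1)
    return np.where(conf >= tau, pred, -1)

logits = np.array([[4.0, 0.0, 0.0],   # confident: predicts class 0
                   [0.2, 0.1, 0.0]])  # near-uniform: abstains
out = predict_with_rejection(logits)
```

Even this fixed-threshold rule trades coverage for accuracy; the point of learning the rejection function, as in ATRO, is to make that trade-off robust to adversarial examples as well.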
arXiv Detail & Related papers (2020-10-24T14:05:03Z)
- Rethinking Empirical Evaluation of Adversarial Robustness Using First-Order Attack Methods [6.531546527140473]
We identify three common cases that lead to overestimation of adversarial accuracy against bounded first-order attack methods.
We propose compensation methods that address sources of inaccurate gradient computation.
Overall, our work shows that overestimated adversarial accuracy that is not indicative of robustness is prevalent even for conventionally trained deep neural networks.
arXiv Detail & Related papers (2020-06-01T22:55:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.