Learning with Noisy Low-Cost MOS for Image Quality Assessment via
Dual-Bias Calibration
- URL: http://arxiv.org/abs/2311.15846v1
- Date: Mon, 27 Nov 2023 14:11:54 GMT
- Title: Learning with Noisy Low-Cost MOS for Image Quality Assessment via
Dual-Bias Calibration
- Authors: Lei Wang, Qingbo Wu, Desen Yuan, King Ngi Ngan, Hongliang Li, Fanman
Meng, and Linfeng Xu
- Abstract summary: In view of the subjective bias of individual annotators, the labor-abundant mean opinion score (LA-MOS) typically requires a large collection of opinion scores from multiple annotators for each image.
In this paper, we aim to learn robust IQA models from low-cost MOS, which only requires very few opinion scores or even a single opinion score for each image.
To the best of our knowledge, this is the first exploration of robust IQA model learning from noisy low-cost labels.
- Score: 20.671990508960906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning based image quality assessment (IQA) models have obtained impressive
performance with the help of reliable subjective quality labels, where mean
opinion score (MOS) is the most popular choice. However, in view of the
subjective bias of individual annotators, the labor-abundant MOS (LA-MOS)
typically requires a large collection of opinion scores from multiple
annotators for each image, which significantly increases the learning cost. In
this paper, we aim to learn robust IQA models from low-cost MOS (LC-MOS), which
only requires very few opinion scores or even a single opinion score for each
image. More specifically, we consider the LC-MOS as the noisy observation of
LA-MOS and enforce the IQA model learned from LC-MOS to approach the unbiased
estimation of LA-MOS. In this way, we represent the subjective bias between
LC-MOS and LA-MOS, and the model bias between IQA predictions learned from
LC-MOS and LA-MOS (i.e., dual-bias) as two latent variables with unknown
parameters. By means of expectation-maximization based alternating
optimization, we can jointly estimate the parameters of the dual-bias, which
suppresses the misleading effect of LC-MOS via a gated dual-bias calibration
(GDBC) module. To the best of our knowledge, this is the first exploration of
robust IQA model learning from noisy low-cost labels. Theoretical analysis and
extensive experiments on four popular IQA datasets show that the proposed
method is robust to different bias rates and annotation numbers and
significantly outperforms other learning based IQA models when only LC-MOS
is available. Furthermore, it achieves performance comparable to models
learned with LA-MOS.
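The calibration idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gdbc_calibrate`, the scalar-bias model, and the gating threshold `gate_tau` are all illustrative assumptions. It treats LC-MOS as LA-MOS plus a latent bias, alternately estimates that bias from the residuals against the current model predictions (E-step) and shifts the labels toward the unbiased estimate (M-step), and applies the shift only when the estimated bias is reliably nonzero (the "gate").

```python
import numpy as np

def gdbc_calibrate(lc_mos, preds, n_iters=10, gate_tau=0.05):
    """EM-style alternating estimate of a scalar subjective bias.

    lc_mos : noisy low-cost MOS labels (one per image)
    preds  : current IQA model predictions (proxy for unbiased MOS)
    Returns calibrated labels the model could be retrained on.
    """
    labels = lc_mos.astype(float).copy()
    for _ in range(n_iters):
        # E-step: residuals between noisy labels and model estimates
        resid = labels - preds
        mu, sigma = resid.mean(), resid.std()
        # Gate: only calibrate when the estimated mean bias clearly
        # exceeds the residual noise level
        if abs(mu) > gate_tau * max(sigma, 1e-8):
            # M-step: shift labels toward the unbiased estimate
            labels = labels - mu
        else:
            break
    return labels
```

In this toy form the dual-bias collapses to a single shared offset; the paper's latent-variable formulation is per-image and jointly updates the IQA model, but the alternate-estimate-and-gate loop is the same shape.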
Related papers
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
By evaluating different benchmarks and choosing an appropriate strategy, even a 2.7B small-scale model can perform on par with larger 7B or 13B models.
arXiv Detail & Related papers (2024-07-28T06:10:47Z) - Perceptual Constancy Constrained Single Opinion Score Calibration for Image Quality Assessment [2.290956583394892]
We propose a highly efficient method to estimate an image's mean opinion score (MOS) from a single opinion score (SOS).
Experiments show that the proposed method is efficient in calibrating the biased SOS and significantly improves IQA model learning when only SOSs are available.
arXiv Detail & Related papers (2024-04-30T14:42:55Z) - Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z) - MOSPC: MOS Prediction Based on Pairwise Comparison [32.55704173124071]
Mean opinion score (MOS) is a subjective metric to evaluate the quality of synthesized speech.
We propose a general framework for MOS prediction based on pairwise comparison (MOSPC).
Our framework surpasses the strong baseline in ranking accuracy on each fine-grained segment.
arXiv Detail & Related papers (2023-06-18T07:38:17Z) - Benchmarking Large Language Models for News Summarization [79.37850439866938]
Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood.
We find that instruction tuning, not model size, is the key to the LLM's zero-shot summarization capability.
arXiv Detail & Related papers (2023-01-31T18:46:19Z) - Speech MOS multi-task learning and rater bias correction [10.123346550775471]
Mean opinion score (MOS) is standardized for the perceptual evaluation of speech quality and is obtained by asking listeners to rate the quality of a speech sample.
Here we propose a multi-task framework to include additional labels and data in training to improve the performance of a blind MOS estimation model.
arXiv Detail & Related papers (2022-12-04T20:06:27Z) - Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using
Prosodic and Linguistic Features [54.48824266041105]
Current state-of-the-art methods for automatic synthetic speech evaluation are based on MOS prediction neural models.
We propose to include prosodic and linguistic features as additional inputs in MOS prediction systems.
All MOS prediction systems are trained on SOMOS, a neural TTS-only dataset with crowdsourced naturalness MOS evaluations.
arXiv Detail & Related papers (2022-11-01T09:18:50Z) - Conformer and Blind Noisy Students for Improved Image Quality Assessment [80.57006406834466]
Learning-based approaches for perceptual image quality assessment (IQA) usually require both the distorted and reference image for measuring the perceptual quality accurately.
In this work, we explore the performance of transformer-based full-reference IQA models.
We also propose a method for IQA based on semi-supervised knowledge distillation from full-reference teacher models into blind student models.
arXiv Detail & Related papers (2022-04-27T10:21:08Z) - Improving Self-Supervised Learning-based MOS Prediction Networks [0.0]
The present work introduces data-, training- and post-training specific improvements to a previous self-supervised learning-based MOS prediction model.
We used a wav2vec 2.0 model pre-trained on LibriSpeech, extended with LSTM and non-linear dense layers.
The methods are evaluated using the shared synthetic speech dataset of the first Voice MOS challenge.
arXiv Detail & Related papers (2022-04-23T09:19:16Z) - Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning
With Spoofing Detection and Spoofing Type Classification [16.43844160498413]
We propose a multi-task learning (MTL) method to improve the performance of a MOS prediction model.
Experiments using the Voice Conversion Challenge 2018 show that proposed MTL with two auxiliary tasks improves MOS prediction.
arXiv Detail & Related papers (2020-07-16T11:38:08Z) - AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.