A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality
Ratings of Real-World Signals
- URL: http://arxiv.org/abs/2007.15797v1
- Date: Fri, 31 Jul 2020 01:46:06 GMT
- Authors: Xuan Dong and Donald S. Williamson
- Abstract summary: We collect and predict the perceptual quality of real-world speech signals evaluated by human listeners.
We develop a novel approach that predicts human quality ratings using a pyramid bidirectional long short term memory (pBLSTM) network with an attention mechanism.
- Score: 22.49276680317304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The real-world capabilities of objective speech quality measures are
limited, since current measures either (1) are developed from simulated data
that does not adequately model real environments, or (2) predict objective
scores that are not always strongly correlated with subjective ratings. Additionally, a
large dataset of real-world signals with listener quality ratings does not
currently exist, which would help facilitate real-world assessment. In this
paper, we collect and predict the perceptual quality of real-world speech
signals that are evaluated by human listeners. We first collect a large quality
rating dataset by conducting crowdsourced listening studies on two real-world
corpora. We further develop a novel approach that predicts human quality
ratings using a pyramid bidirectional long short term memory (pBLSTM) network
with an attention mechanism. The results show that the proposed model achieves
statistically lower estimation errors than prior assessment approaches, where
the predicted scores strongly correlate with human judgments.
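The pyramid structure named in the abstract refers to a stack of BLSTM layers in which each layer halves the temporal resolution before the next, typically by concatenating adjacent frames (as popularized by Listen-Attend-Spell-style pBLSTMs). A minimal sketch of that time-reduction step, assuming pairwise frame concatenation (the paper's exact reduction scheme may differ; `pyramid_reduce` is an illustrative name, not from the paper):

```python
# Sketch of the pyramid time-reduction used between pBLSTM layers:
# concatenate each pair of adjacent feature frames, halving the
# sequence length (and doubling the feature dimension) per layer.

def pyramid_reduce(frames):
    """frames: list of T feature vectors (lists of floats).
    Returns floor(T/2) frames, each the concatenation of two inputs."""
    if len(frames) % 2 == 1:       # drop an odd trailing frame (or pad instead)
        frames = frames[:-1]
    return [frames[i] + frames[i + 1] for i in range(0, len(frames), 2)]

# Example: 4 frames of 2-dim features -> 2 frames of 4-dim features
feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
reduced = pyramid_reduce(feats)
print(len(reduced), len(reduced[0]))  # -> 2 4
```

Each reduction shortens the sequence seen by the next BLSTM layer, which lowers computation and lets higher layers operate over longer effective time spans.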
Related papers
- Sample Complexity of Preference-Based Nonparametric Off-Policy
Evaluation with Deep Networks [58.469818546042696]
We study the sample efficiency of OPE with human preference and establish a statistical guarantee for it.
By appropriately selecting the size of a ReLU network, we show that one can leverage any low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2023-10-16T16:27:06Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [74.43215520371506]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z)
- Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation [136.16507050034755]
Existing human evaluation studies for summarization either exhibit a low inter-annotator agreement or have insufficient scale.
We propose a modified summarization salience protocol, Atomic Content Units (ACUs), which is based on fine-grained semantic units.
We curate the Robust Summarization Evaluation (RoSE) benchmark, a large human evaluation dataset consisting of 22,000 summary-level annotations over 28 top-performing systems.
arXiv Detail & Related papers (2022-12-15T17:26:05Z)
- Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
Mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
arXiv Detail & Related papers (2022-11-22T11:35:14Z)
- CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment [12.497279501767606]
We propose a novel end-to-end model structure called Convolutional Context-Aware Transformer (CCAT) network to predict the mean opinion score (MOS) of human raters.
We evaluate our model on three MOS-annotated datasets spanning multiple languages and distortion types and submit our results to the ConferencingSpeech 2022 Challenge.
arXiv Detail & Related papers (2022-11-04T16:46:11Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- On the Evaluation of Generative Adversarial Networks By Discriminative Models [0.0]
Generative Adversarial Networks (GANs) can accurately model complex multi-dimensional data and generate realistic samples.
The majority of research efforts tackling GAN evaluation have relied on qualitative visual assessment for validation.
In this work, we leverage Siamese neural networks to propose a domain-agnostic evaluation metric.
arXiv Detail & Related papers (2020-10-07T17:50:39Z)
- Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models [37.60722440434528]
In this paper, a novel model for audio quality assessment is proposed by jointly using bidirectional long short-term memory and an attention mechanism.
The former mimics human auditory perception to learn information from a recording, while the latter discriminates interference from desired signals by highlighting target-related features.
To evaluate our proposed approach, the TIMIT dataset is used and augmented by mixing with various natural sounds.
arXiv Detail & Related papers (2020-05-16T17:54:07Z)
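Several of the attention-based quality models above pool a sequence of frame-level features into a single utterance-level representation by softmax-weighting the frames. A minimal sketch of that pooling step, under the assumption of scalar per-frame relevance scores (`attention_pool` is an illustrative name; real models learn the scores and features jointly):

```python
import math

def attention_pool(features, scores):
    """Softmax the scalar per-frame scores and return the
    attention-weighted sum of the frame feature vectors."""
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]              # softmax weights, sum to 1
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]

# With equal scores, pooling reduces to a plain average of the frames.
pooled = attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
print(pooled)  # -> [2.0, 3.0]
```

Frames with higher scores dominate the pooled vector, which is how attention highlights target-related frames over interference before the final quality prediction.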
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.