Evaluating generative audio systems and their metrics
- URL: http://arxiv.org/abs/2209.00130v1
- Date: Wed, 31 Aug 2022 21:48:34 GMT
- Title: Evaluating generative audio systems and their metrics
- Authors: Ashvala Vinay, Alexander Lerch
- Abstract summary: This paper investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and (ii) a listening study.
Results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
- Score: 80.97828572629093
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have seen considerable advances in audio synthesis with deep
generative models. However, the state-of-the-art is very difficult to quantify;
different studies often use different evaluation methodologies and different
metrics when reporting results, making a direct comparison to other systems
difficult if not impossible. Furthermore, the perceptual relevance and meaning
of the reported metrics are in most cases unknown, prohibiting any conclusive
insights with respect to practical usability and audio quality. This paper
presents a study that investigates state-of-the-art approaches side-by-side
with (i) a set of previously proposed objective metrics for audio
reconstruction, and with (ii) a listening study. The results indicate that
currently used objective metrics are insufficient to describe the perceptual
quality of current systems.
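Objective metrics of the kind examined in the paper typically compare a reconstructed signal against a reference. As an illustrative sketch only (not the paper's specific metric set), the widely used log-spectral distance can be computed as follows; the STFT parameters are arbitrary choices:

```python
import numpy as np

def log_spectral_distance(reference, estimate, n_fft=512, hop=128, eps=1e-10):
    """Log-spectral distance between two mono signals (illustrative sketch)."""
    def stft_mag(x):
        window = np.hanning(n_fft)
        frames = [x[i:i + n_fft] * window
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.abs(np.fft.rfft(np.stack(frames), axis=1))

    ref_mag, est_mag = stft_mag(reference), stft_mag(estimate)
    log_diff = 20 * (np.log10(ref_mag + eps) - np.log10(est_mag + eps))
    # Root-mean-square over frequency bins, averaged over frames
    return float(np.mean(np.sqrt(np.mean(log_diff ** 2, axis=1))))

# Identical signals give a distance of zero
t = np.linspace(0, 1, 16000, endpoint=False)
sine = np.sin(2 * np.pi * 440 * t)
print(log_spectral_distance(sine, sine))  # prints 0.0
```

The paper's point is precisely that distances like this one, computed on signal representations, need not track perceptual quality as judged in a listening study.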
Related papers
- What You Hear Is What You See: Audio Quality Metrics From Image Quality
Metrics [44.659718609385315]
We investigate the feasibility of utilizing state-of-the-art image perceptual metrics for evaluating audio signals by representing them as spectrograms.
We customise one of the metrics which has a psychoacoustically plausible architecture to account for the peculiarities of sound signals.
We evaluate the effectiveness of our proposed metric and several baseline metrics using a music dataset.
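The general idea of treating spectrograms as images can be sketched with a simple baseline: PSNR, a standard image quality metric, applied to log-magnitude spectrograms. This is a minimal illustration, not the customised psychoacoustic metric the paper proposes; all parameter values below are assumptions:

```python
import numpy as np

def spectrogram_psnr(reference, estimate, n_fft=512, hop=128, eps=1e-10):
    """Treat log-magnitude spectrograms as images and compare them with
    PSNR, a standard image quality metric (illustrative sketch only)."""
    def log_spec(x):
        window = np.hanning(n_fft)
        frames = [x[i:i + n_fft] * window
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.log10(np.abs(np.fft.rfft(np.stack(frames), axis=1)) + eps)

    ref_img, est_img = log_spec(reference), log_spec(estimate)
    mse = np.mean((ref_img - est_img) ** 2)
    peak = np.max(np.abs(ref_img))  # dynamic range of the "image"
    return float(10 * np.log10(peak ** 2 / (mse + eps)))
```

Higher PSNR means the two spectrogram "images" are closer; the paper goes further by adapting a perceptual image metric to the peculiarities of sound.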
arXiv Detail & Related papers (2023-05-19T10:43:57Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise [62.997667081978825]
In high-risk environments, deep learning models need to be able to judge their uncertainty and reject inputs when there is a significant chance of misclassification.
We conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images.
We observe that ensembles of methods generally lead to better uncertainty estimates as well as an increased robustness towards domain shifts and label noise.
arXiv Detail & Related papers (2023-01-03T11:34:36Z)
- Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
Mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
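For context, plain class-wise mAP can be sketched as below: every class is scored independently, which is exactly the weakness OmAP addresses. The ontology-distance weighting that defines OmAP is specified in the paper and not reproduced here:

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: mean precision at each true positive (sketch)."""
    order = np.argsort(-scores)              # rank items by descending score
    hits = labels[order].astype(float)
    cum_hits = np.cumsum(hits)
    # Precision at the rank of each true positive
    precision_at_hit = cum_hits[hits == 1] / (np.flatnonzero(hits) + 1)
    return float(precision_at_hit.mean()) if hits.any() else 0.0

def mean_average_precision(score_matrix, label_matrix):
    """mAP averages per-class AP, treating sound classes as unrelated."""
    aps = [average_precision(score_matrix[:, c], label_matrix[:, c])
           for c in range(label_matrix.shape[1])]
    return float(np.mean(aps))
```

Under this scheme, confusing "dog bark" with "animal" costs as much as confusing it with "engine"; the ontology-aware variant penalises the two differently.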
arXiv Detail & Related papers (2022-11-22T11:35:14Z)
- A Study on the Evaluation of Generative Models [19.18642459565609]
Implicit generative models, which do not return likelihood values, have become prevalent in recent years.
In this work, we study the evaluation metrics of generative models by generating a high-quality synthetic dataset.
Our study shows that while FID and IS do correlate to several f-divergences, their ranking of close models can vary considerably.
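FID itself is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated samples. Its standard closed form can be sketched as follows (a generic implementation, not the study's exact evaluation code):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians, the quantity FID reports:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 @ S2)^(1/2))."""
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):   # numerical noise can leave tiny
        covmean = covmean.real     # imaginary parts; discard them
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean))
```

Because the score depends only on the fitted means and covariances, two models that are close in FID can still differ in ways the metric cannot rank reliably, consistent with the study's finding.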
arXiv Detail & Related papers (2022-06-22T09:27:31Z)
- A Comparative Study of Faithfulness Metrics for Model Interpretability Methods [3.7200349581269996]
We introduce two assessment dimensions, namely diagnosticity and time complexity.
According to the experimental results, we find that the sufficiency and comprehensiveness metrics have higher diagnosticity and lower time complexity than the other faithfulness metrics.
arXiv Detail & Related papers (2022-04-12T04:02:17Z)
- Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems [66.61691401921296]
This paper presents an investigation over several methods of score calibration for deep speaker embedding extractors.
An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system.
arXiv Detail & Related papers (2022-03-28T21:22:22Z)
- Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models [37.60722440434528]
In this paper, a novel model for audio quality assessment is proposed by jointly using bidirectional long short-term memory and an attention mechanism.
The former mimics human auditory perception to learn information from a recording, and the latter further discriminates interferences from desired signals by highlighting target-related features.
To evaluate our proposed approach, the TIMIT dataset is used and augmented by mixing with various natural sounds.
arXiv Detail & Related papers (2020-05-16T17:54:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.