What You Hear Is What You See: Audio Quality Metrics From Image Quality
Metrics
- URL: http://arxiv.org/abs/2305.11582v2
- Date: Wed, 30 Aug 2023 16:06:27 GMT
- Title: What You Hear Is What You See: Audio Quality Metrics From Image Quality
Metrics
- Authors: Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero
Laparra, Jesus Malo
- Abstract summary: We investigate the feasibility of utilizing state-of-the-art image perceptual metrics for evaluating audio signals by representing them as spectrograms.
We customise one of the metrics which has a psychoacoustically plausible architecture to account for the peculiarities of sound signals.
We evaluate the effectiveness of our proposed metric and several baseline metrics using a music dataset.
- Score: 44.659718609385315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we investigate the feasibility of utilizing state-of-the-art
image perceptual metrics for evaluating audio signals by representing them as
spectrograms. The encouraging outcome of the proposed approach is based on the
similarity between the neural mechanisms in the auditory and visual pathways.
Furthermore, we customise one of the metrics which has a psychoacoustically
plausible architecture to account for the peculiarities of sound signals. We
evaluate the effectiveness of our proposed metric and several baseline metrics
using a music dataset, with promising results in terms of the correlation
between the metrics and the perceived quality of audio as rated by human
evaluators.
Related papers
- Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation [8.170174172545831]
This paper addresses issues through the Sound Scene Synthesis challenge held as part of the Detection and Classification of Acoustic Scenes and Events 2024.
We present an evaluation protocol combining objective metric, namely Fr'echet Audio Distance, with perceptual assessments, utilizing a structured prompt format to enable diverse captions and effective evaluation.
arXiv Detail & Related papers (2024-10-23T06:35:41Z) - Lightly Weighted Automatic Audio Parameter Extraction for the Quality
Assessment of Consensus Auditory-Perceptual Evaluation of Voice [18.8222742272435]
The proposed method utilizes age, sex, and five audio parameters: jitter, absolute jitter, shimmer, harmonic-to-noise ratio (HNR), and zero crossing.
The result reveals that our approach performs similar to state-of-the-art (SOTA) methods, and outperforms the latent representation obtained by using popular audio pre-trained models.
arXiv Detail & Related papers (2023-11-27T07:19:22Z) - Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z) - Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z) - Evaluating generative audio systems and their metrics [80.97828572629093]
This paper investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and (ii) a listening study.
Results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
arXiv Detail & Related papers (2022-08-31T21:48:34Z) - End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio system with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z) - Data Fusion for Audiovisual Speaker Localization: Extending Dynamic
Stream Weights to the Spatial Domain [103.3388198420822]
Esting the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization.
This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions.
A performance evaluation using audiovisual recordings yields promising results, with the proposed fusion approach outperforming all baseline models.
arXiv Detail & Related papers (2021-02-23T09:59:31Z) - Exploration of Audio Quality Assessment and Anomaly Localisation Using
Attention Models [37.60722440434528]
In this paper, a novel model for audio quality assessment is proposed by jointly using bidirectional long short-term memory and an attention mechanism.
The former is to mimic a human auditory perception ability to learn information from a recording, and the latter is to further discriminate interferences from desired signals by highlighting target related features.
To evaluate our proposed approach, the TIMIT dataset is used and augmented by mixing with various natural sounds.
arXiv Detail & Related papers (2020-05-16T17:54:07Z) - How deep is your encoder: an analysis of features descriptors for an
autoencoder-based audio-visual quality metric [2.191505742658975]
The No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd) deals with this problem from a machine learning perspective.
A basic implementation of NAViDAd was able to produce accurate predictions tested with a range of different audio-visual databases.
arXiv Detail & Related papers (2020-03-24T20:15:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.