Lightly Weighted Automatic Audio Parameter Extraction for the Quality
Assessment of Consensus Auditory-Perceptual Evaluation of Voice
- URL: http://arxiv.org/abs/2311.15582v1
- Date: Mon, 27 Nov 2023 07:19:22 GMT
- Title: Lightly Weighted Automatic Audio Parameter Extraction for the Quality
Assessment of Consensus Auditory-Perceptual Evaluation of Voice
- Authors: Yi-Heng Lin, Wen-Hsuan Tseng, Li-Chin Chen, Ching-Ting Tan, Yu Tsao
- Abstract summary: The proposed method utilizes age, sex, and five audio parameters: jitter, absolute jitter, shimmer, harmonic-to-noise ratio (HNR), and zero crossing.
The result reveals that our approach performs similarly to state-of-the-art (SOTA) methods and outperforms the latent representation obtained by using popular audio pre-trained models.
- Score: 18.8222742272435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Consensus Auditory-Perceptual Evaluation of Voice is a widely employed
tool in clinical voice quality assessment that is significant for streamlining
communication among clinical professionals and benchmarking for the
determination of further treatment. Currently, because the assessment relies on
experienced clinicians, it tends to be inconsistent, and thus, difficult to
standardize. To address this problem, we propose to leverage lightly weighted
automatic audio parameter extraction, to increase the clinical relevance,
reduce the complexity, and enhance the interpretability of voice quality
assessment. The proposed method utilizes age, sex, and five audio parameters:
jitter, absolute jitter, shimmer, harmonic-to-noise ratio (HNR), and zero
crossing. A classical machine learning approach is employed. The results reveal
that our approach performs similarly to state-of-the-art (SOTA) methods and
outperforms the latent representation obtained by using popular audio
pre-trained models. This approach provides insights into the feasibility of
different feature extraction approaches for voice evaluation. Audio parameters
such as jitter and the HNR are proven to be suitable for characterizing voice
quality attributes, such as roughness and strain. Conversely, pre-trained
models exhibit limitations in effectively addressing noise-related scorings.
This study contributes toward more comprehensive and precise voice quality
evaluations, achieved by comprehensively exploring diverse assessment
methodologies.
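The audio parameters named in the abstract have common textbook definitions. The sketch below is a minimal, standard-library Python illustration of four of them (jitter, absolute jitter, shimmer, and zero-crossing rate), assuming the glottal cycle periods and per-cycle peak amplitudes have already been extracted from a sustained vowel. The function names, the toy values, and the choice of the local (cycle-to-cycle) variants are assumptions made for illustration and are not the authors' implementation; HNR is omitted because it requires an autocorrelation or pitch-synchronous analysis that does not fit in a short example.

```python
import math


def mean_abs_consecutive_diff(values):
    """Mean absolute difference between neighbouring values in a sequence."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def absolute_jitter(periods):
    """Local absolute jitter: mean cycle-to-cycle period change, in seconds."""
    return mean_abs_consecutive_diff(periods)


def relative_jitter(periods):
    """Local jitter: absolute jitter divided by the mean period (unitless)."""
    return absolute_jitter(periods) / (sum(periods) / len(periods))


def shimmer(amplitudes):
    """Local shimmer: mean cycle-to-cycle amplitude change over mean amplitude."""
    mean_amplitude = sum(amplitudes) / len(amplitudes)
    return mean_abs_consecutive_diff(amplitudes) / mean_amplitude


def zero_crossing_rate(samples):
    """Fraction of neighbouring sample pairs whose signs differ."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


if __name__ == "__main__":
    # Hypothetical toy inputs: four cycle periods (s) and peak amplitudes.
    periods = [0.0100, 0.0102, 0.0099, 0.0101]
    amplitudes = [0.80, 0.78, 0.82, 0.79]
    # One second of a synthetic 100 Hz tone sampled at 8 kHz.
    sample_rate = 8000
    tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]

    print("absolute jitter (s):", absolute_jitter(periods))
    print("jitter (ratio):", relative_jitter(periods))
    print("shimmer (ratio):", shimmer(amplitudes))
    print("zero-crossing rate:", zero_crossing_rate(tone))
```

In a pipeline like the one the abstract describes, values of this kind, together with age and sex, would form a small feature vector for a classical machine learning model; the exact feature extraction and model used in the paper are not specified here.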
Related papers
- CLAIR-A: Leveraging Large Language Models to Judge Audio Captions [73.51087998971418]
Evaluating machine-generated audio captions is a complex task that requires considering diverse factors.
We propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models.
In our evaluations, CLAIR-A better predicts human judgements of quality compared to traditional metrics.
arXiv Detail & Related papers (2024-09-19T17:59:52Z)
- Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features [0.4681310436826459]
This article showcases the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech.
Experiments are conducted on the PVQD dataset, which covers various causes of vocal system damage in English, and on a Japanese dataset focusing on patients with Parkinson's disease.
The results on PVQD reveal a notable correlation (>0.8 on PCC) and an extraordinary accuracy (0.5 on MSE) in predicting Grade, Breathy, and Asthenic indicators.
arXiv Detail & Related papers (2024-08-22T10:22:53Z)
- A Comprehensive Rubric for Annotating Pathological Speech [0.0]
We introduce a comprehensive rubric based on various dimensions of speech quality, including phonetics, fluency, and prosody.
The objective is to establish standardized criteria for identifying errors within the speech of individuals with Down syndrome.
arXiv Detail & Related papers (2024-04-29T16:44:27Z)
- Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z)
- Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
Mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
arXiv Detail & Related papers (2022-11-22T11:35:14Z)
- Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection [22.413475757518682]
We propose a deep learning framework for generating acoustic feature embeddings sensitive to vocal quality.
A contrastive loss is combined with a classification loss to train our deep learning model jointly.
Empirical results demonstrate that our method achieves high in-corpus and cross-corpus classification accuracy.
arXiv Detail & Related papers (2022-11-17T19:34:59Z)
- Evaluating generative audio systems and their metrics [80.97828572629093]
This paper investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and (ii) a listening study.
Results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
arXiv Detail & Related papers (2022-08-31T21:48:34Z)
- DHASP: Differentiable Hearing Aid Speech Processing [23.101074347473904]
An appropriate amplification fitting for the listener's hearing disability is critical for good performance.
In this paper, we introduce an alternative approach to finding the optimal fitting by introducing a hearing aid speech processing framework.
The framework is fully differentiable and can therefore employ the back-propagation algorithm for efficient, data-driven optimisation.
Our initial objective experiments show promising results for noise-free speech amplification, where the automatically optimised processors outperform one of the well recognised hearing aid prescriptions.
arXiv Detail & Related papers (2021-03-15T17:34:22Z)
- Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models [37.60722440434528]
In this paper, a novel model for audio quality assessment is proposed by jointly using bidirectional long short-term memory and an attention mechanism.
The former is to mimic a human auditory perception ability to learn information from a recording, and the latter is to further discriminate interferences from desired signals by highlighting target related features.
To evaluate our proposed approach, the TIMIT dataset is used and augmented by mixing with various natural sounds.
arXiv Detail & Related papers (2020-05-16T17:54:07Z)
- Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato [68.97335984455059]
The purpose of this work was to verify the suitability of the sustained vowel phonation test for automatic detection of patients with ALS.
We proposed an enhanced procedure for separating the voice signal into fundamental periods, which is required for calculating the measurements.
arXiv Detail & Related papers (2020-03-24T12:49:25Z)
- Opportunities of a Machine Learning-based Decision Support System for Stroke Rehabilitation Assessment [64.52563354823711]
Rehabilitation assessment is critical to determine an adequate intervention for a patient.
Current assessment practices rely mainly on the therapist's experience, and assessment is performed infrequently because of the limited availability of therapists.
We developed an intelligent decision support system that can identify salient features of assessment using reinforcement learning.
arXiv Detail & Related papers (2020-02-27T17:04:07Z)