Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection
- URL: http://arxiv.org/abs/2211.09858v1
- Date: Thu, 17 Nov 2022 19:34:59 GMT
- Title: Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection
- Authors: Jianwei Zhang, Julie Liss, Suren Jayasuriya, and Visar Berisha
- Abstract summary: We propose a deep learning framework for generating acoustic feature embeddings sensitive to vocal quality.
A contrastive loss is combined with a classification loss to train our deep learning model jointly.
Empirical results demonstrate that our method achieves high in-corpus and cross-corpus classification accuracy.
- Score: 22.413475757518682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Approximately 1.2% of the world's population has impaired voice production.
As a result, automatic dysphonic voice detection has attracted considerable
academic and clinical interest. However, existing methods for automated voice
assessment often fail to generalize outside the training conditions or to other
related applications. In this paper, we propose a deep learning framework for
generating acoustic feature embeddings sensitive to vocal quality and robust
across different corpora. A contrastive loss is combined with a classification
loss to train our deep learning model jointly. Data warping methods are used on
input voice samples to improve the robustness of our method. Empirical results
demonstrate that our method not only achieves high in-corpus and cross-corpus
classification accuracy but also generates good embeddings sensitive to voice
quality and robust across different corpora. We also compare our results
against three baseline methods on clean and three variations of deteriorated
in-corpus and cross-corpus datasets and demonstrate that the proposed model
consistently outperforms the baseline methods.
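As a rough illustration of the joint objective described in the abstract, the sketch below combines a standard pairwise contrastive loss with a cross-entropy classification loss. The weighting `alpha` and the margin are illustrative assumptions, not the paper's reported hyperparameters:

```python
import math

def cross_entropy(probs, label):
    # classification term: negative log-likelihood of the true class
    return -math.log(probs[label])

def contrastive(dist, same_class, margin=1.0):
    # pairwise contrastive term: pull embeddings of same-quality pairs
    # together, push different-quality pairs at least `margin` apart
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

def joint_loss(probs, label, dist, same_class, alpha=0.5):
    # weighted combination of both terms for joint training
    # (the equal weighting is an assumption for this sketch)
    return alpha * contrastive(dist, same_class) \
        + (1.0 - alpha) * cross_entropy(probs, label)
```

In the paper's setting the distances would be computed between learned embeddings of (possibly warped) voice samples; here they are plain scalars for clarity.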
Related papers
- Voice Disorder Analysis: a Transformer-based Approach [10.003909936239742]
This paper proposes a novel solution that adopts transformers directly working on raw voice signals.
We consider many recording types at the same time, such as sentence reading and sustained vowel emission.
The experimental results, obtained on both public and private datasets, show the effectiveness of our solution in the disorder detection and classification tasks.
arXiv Detail & Related papers (2024-06-20T19:29:04Z)
- Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice [18.8222742272435]
The proposed method utilizes age, sex, and five audio parameters: jitter, absolute jitter, shimmer, harmonic-to-noise ratio (HNR), and zero crossing.
The results reveal that our approach performs similarly to state-of-the-art (SOTA) methods, and outperforms the latent representations obtained from popular pre-trained audio models.
arXiv Detail & Related papers (2023-11-27T07:19:22Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Robust Medical Image Classification from Noisy Labeled Data with Global and Local Representation Guided Co-training [73.60883490436956]
We propose a novel collaborative training paradigm with global and local representation learning for robust medical image classification.
We employ a self-ensemble model with a noisy-label filter to efficiently separate clean samples from noisy ones.
We also design a novel global and local representation learning scheme to implicitly regularize the networks to utilize noisy samples.
arXiv Detail & Related papers (2022-05-10T07:50:08Z)
- On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training [33.79711018198589]
We extend the existing mixture invariant training criterion to exploit both unpaired clean speech and real noisy data.
It is found that the unpaired clean speech is crucial to improving the quality of speech separated from real noisy speech.
The proposed method also performs remixing of processed and unprocessed signals to alleviate the processing artifacts.
arXiv Detail & Related papers (2022-05-03T19:37:58Z)
- Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps & CHiME-4 Corpora [70.46867541361982]
We consider a general non-semantic speech representation called TRILL, which is trained with a self-supervised criterion based on triplet loss.
We observe +5.42% and +3.18% relative WER improvement for the development and evaluation sets of Fearless Steps.
arXiv Detail & Related papers (2021-09-23T00:43:32Z)
- Adversarial attacks on audio source separation [26.717340178640498]
We reformulate various adversarial attack methods for the audio source separation problem.
We propose a simple yet effective regularization method to obtain imperceptible adversarial noise.
We also show the robustness of source separation models against a black-box attack.
arXiv Detail & Related papers (2020-10-07T05:02:21Z)
- Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models [37.60722440434528]
In this paper, a novel model for audio quality assessment is proposed by jointly using bidirectional long short-term memory and an attention mechanism.
The former mimics human auditory perception to learn information from a recording, while the latter further discriminates interferences from desired signals by highlighting target-related features.
To evaluate our proposed approach, the TIMIT dataset is used and augmented by mixing with various natural sounds.
arXiv Detail & Related papers (2020-05-16T17:54:07Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
- Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to cast this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs).
We propose a novel framework that incorporates rectified meta-learning module into common CNN paradigm to train a noise-robust deep network without using extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)
- Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
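The "abstain" mechanism in the gambler's loss entry above can be sketched as follows. The model predicts one extra "abstain" probability alongside the class probabilities, and abstained mass is credited at a discounted rate; the payoff value here is an arbitrary illustrative choice, and this is a sketch of the published formulation as commonly stated, not code from that paper:

```python
import math

def gamblers_loss(probs, label, payoff=2.5):
    # probs: softmax output over m class probabilities plus one
    # trailing "abstain" probability.
    # Abstained mass is credited at 1/payoff, so on a point with a
    # possibly-noisy label the model can hedge instead of fitting it.
    p_class = probs[label]
    p_abstain = probs[-1]
    return -math.log(p_class + p_abstain / payoff)
```

Fully abstaining costs log(payoff) regardless of the label, so lowering the payoff makes abstention cheaper and encourages the model not to learn from suspect examples.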
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.