On the Mutual Information between Source and Filter Contributions for Voice Pathology Detection
- URL: http://arxiv.org/abs/2001.00583v1
- Date: Thu, 2 Jan 2020 10:04:37 GMT
- Title: On the Mutual Information between Source and Filter Contributions for Voice Pathology Detection
- Authors: Thomas Drugman, Thomas Dubuisson, Thierry Dutoit
- Abstract summary: This paper addresses the problem of automatic detection of voice pathologies directly from the speech signal.
Three sets of features are proposed, depending on whether they are related to the speech or the glottal signal, or to prosody.
- Score: 11.481208551940998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of automatic detection of voice pathologies
directly from the speech signal. For this, we investigate the use of the
glottal source estimation as a means to detect voice disorders. Three sets of
features are proposed, depending on whether they are related to the speech or
the glottal signal, or to prosody. The relevancy of these features is assessed
through mutual information-based measures. This allows an intuitive
interpretation in terms of discrimination power and redundancy between the
features, independently of any subsequent classifier. We discuss which
characteristics are particularly informative or complementary for detecting
voice pathologies.
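The relevance measure described above can be sketched as follows. This is a minimal histogram-based estimator of the mutual information between one acoustic feature and the pathology label; the paper's exact estimator and binning choices are assumptions here, not taken from the abstract.

```python
import numpy as np

def mutual_information(feature, labels, bins=16):
    """Estimate I(X; Y) in bits between a continuous feature X and
    discrete class labels Y (e.g. normophonic vs. pathological) by
    discretizing X into histogram bins. Simplified sketch: the bin
    count and estimator are illustrative assumptions."""
    # Discretize the feature; digitize yields indices in 0..bins+1.
    edges = np.histogram_bin_edges(feature, bins=bins)
    x = np.digitize(feature, edges)
    classes = {c: i for i, c in enumerate(np.unique(labels))}
    # Joint histogram of (binned feature, class label).
    joint = np.zeros((bins + 2, len(classes)))
    for xi, yi in zip(x, labels):
        joint[xi, classes[yi]] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal over X bins
    py = joint.sum(axis=0, keepdims=True)  # marginal over classes
    nz = joint > 0
    # I(X; Y) = sum p(x, y) * log2( p(x, y) / (p(x) p(y)) )
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())
```

A feature that separates the two classes well scores close to H(Y) (1 bit for balanced binary labels), while an irrelevant feature scores near zero, which is what makes the measure interpretable independently of any classifier.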
Related papers
- Disentangling segmental and prosodic factors to non-native speech comprehensibility [11.098498920630782]
Current accent conversion systems do not disentangle the two main sources of non-native accent: segmental and prosodic characteristics.
We present an AC system that decouples voice quality from accent, but also disentangles the latter into its segmental and prosodic characteristics.
We conduct perceptual listening tests to quantify the individual contributions of segmental features and prosody on the perceived comprehensibility of non-native speech.
arXiv Detail & Related papers (2024-08-20T16:43:55Z)
- A Novel Labeled Human Voice Signal Dataset for Misbehavior Detection [0.7223352886780369]
This research highlights the significance of voice tone and delivery in automated machine-learning systems for voice analysis and recognition.
It contributes to the broader field of voice signal analysis by elucidating the impact of human behaviour on the perception and categorization of voice signals.
arXiv Detail & Related papers (2024-06-28T18:55:07Z)
- Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z)
- Deep Spectro-temporal Artifacts for Detecting Synthesized Speech [57.42110898920759]
This paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection).
In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding features.
We ranked 4th and 5th in track 1 and track 2, respectively.
arXiv Detail & Related papers (2022-10-11T08:31:30Z)
- Towards Disentangled Speech Representations [65.7834494783044]
We construct a representation learning task based on joint modeling of ASR and TTS.
We seek to learn a representation of audio that disentangles that part of the speech signal that is relevant to transcription from that part which is not.
We show that enforcing these properties during training improves WER by 24.5% relative on average for our joint modeling task.
arXiv Detail & Related papers (2022-08-28T10:03:55Z)
- Integration of Text and Graph-based Features for Detecting Mental Health Disorders from Voice [1.5469452301122175]
Two methods are used to enrich voice analysis for depression detection.
Results suggest that integration of text-based voice classification and learning from low level and graph-based voice signal features can improve the detection of mental disorders like depression.
arXiv Detail & Related papers (2022-05-14T08:37:19Z)
- Streaming Multi-talker Speech Recognition with Joint Speaker Identification [77.46617674133556]
SURIT employs the recurrent neural network transducer (RNN-T) as the backbone for both speech recognition and speaker identification.
We validate our idea on a multi-talker dataset derived from Librispeech, and present encouraging results.
arXiv Detail & Related papers (2021-04-05T18:37:33Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Data-driven Detection and Analysis of the Patterns of Creaky Voice [13.829936505895692]
Creaky voice is a quality frequently used as a phrase-boundary marker.
The automatic detection and modelling of creaky voice may have implications for speech technology applications.
arXiv Detail & Related papers (2020-05-31T13:34:30Z)
- Phase-based Information for Voice Pathology Detection [11.481208551940998]
This paper investigates the potential of using phase-based features for automatically detecting voice disorders.
It is shown that group delay functions are appropriate for characterizing irregularities in the phonation.
arXiv Detail & Related papers (2020-01-02T09:51:51Z)
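The group delay functions mentioned in the entry above can be sketched as follows. This computes the standard group delay of a speech frame without phase unwrapping, using the identity tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2 with Y = DFT(n * x[n]); the paper's modified group delay variants and pathology-specific features are not shown.

```python
import numpy as np

def group_delay(frame, n_fft=512):
    """Group delay tau(w) = -d(phase)/d(w) of a speech frame, computed
    via Y = DFT(n * x[n]) to avoid explicit phase unwrapping.
    Minimal sketch of the generic technique, not the paper's exact features."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    eps = 1e-12  # guard against division by zero at spectral nulls
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)
```

For a pure impulse delayed by d samples, the group delay is flat and equal to d at every frequency, which is the sanity check usually used for this formulation.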
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.