Impact of Speech Mode in Automatic Pathological Speech Detection
- URL: http://arxiv.org/abs/2406.09968v1
- Date: Fri, 14 Jun 2024 12:19:18 GMT
- Title: Impact of Speech Mode in Automatic Pathological Speech Detection
- Authors: Shakeel A. Sheikh, Ina Kodrasi
- Abstract summary: This paper analyzes the influence of speech mode on pathological speech detection approaches.
It examines two categories of approaches, i.e., classical machine learning and deep learning.
Results indicate that classical approaches may struggle to capture pathology-discriminant cues in spontaneous speech.
In contrast, deep learning approaches demonstrate superior performance, managing to extract additional cues that were previously inaccessible in non-spontaneous speech.
- Score: 14.011517808456892
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Automatic pathological speech detection approaches yield promising results in identifying various pathologies. These approaches are typically designed and evaluated for phonetically-controlled speech scenarios, where speakers are prompted to articulate identical phonetic content. While gathering controlled speech recordings can be laborious, spontaneous speech can be conveniently acquired as potential patients navigate their daily routines. Further, spontaneous speech can be valuable in detecting subtle and abstract cues of pathological speech. Nonetheless, the efficacy of automatic pathological speech detection for spontaneous speech remains unexplored. This paper analyzes the influence of speech mode on pathological speech detection approaches, examining two distinct categories of approaches, i.e., classical machine learning and deep learning. Results indicate that classical approaches may struggle to capture pathology-discriminant cues in spontaneous speech. In contrast, deep learning approaches demonstrate superior performance, managing to extract additional cues that were previously inaccessible in non-spontaneous speech.
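As a rough illustration of the two categories of approaches examined, the sketch below pairs a classical pipeline (hand-crafted MFCC statistics fed to an SVM) with a deep-learning pipeline (mean-pooled wav2vec 2.0 embeddings). The abstract does not name the paper's actual features, models, or checkpoints, so every choice here (librosa MFCCs, facebook/wav2vec2-base, the RBF-SVM detector) is an illustrative assumption rather than the authors' method.

```python
# Illustrative sketch only: the paper's exact features and models are not
# specified in the abstract; these helpers contrast the two approach families.
import numpy as np
import librosa
import torch
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

def extract_classical(signal: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Hand-crafted features (mean/std of 13 MFCCs) for a classical classifier."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # (13, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
_model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def extract_deep(signal: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Utterance embedding from a self-supervised model (mean-pooled frames)."""
    inputs = _extractor(signal, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        frames = _model(inputs.input_values).last_hidden_state  # (1, T, 768)
    return frames.mean(dim=1).squeeze(0).numpy()

def train_detector(features: np.ndarray, labels: np.ndarray):
    """Same downstream detector for either feature set (labels: 0 healthy, 1 pathological)."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(features, labels)
```

Holding the downstream detector fixed while swapping feature extractors mirrors the comparison of interest: whether each feature family still separates pathological from healthy speech once the phonetic content is no longer controlled, i.e., in spontaneous speech.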
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
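A minimal sketch of one plausible word-level setup, assuming self-supervised frame features (e.g., wav2vec 2.0 at roughly 50 frames per second) and word boundaries from a forced aligner; the model's actual architecture is not described in this summary, and word_classifier is a hypothetical placeholder head.

```python
# Hedged sketch, not the paper's architecture: frames within each aligned word
# are mean-pooled and scored by a per-word fluent/stuttered classifier head.
import torch
import torch.nn as nn

FRAME_RATE = 50  # wav2vec 2.0 emits one feature frame roughly every 20 ms

word_classifier = nn.Linear(768, 2)  # hypothetical head: fluent vs. stuttered

def classify_words(frames: torch.Tensor, word_spans: list[tuple[float, float]]) -> torch.Tensor:
    """frames: (T, 768) self-supervised features; word_spans: (start_s, end_s) per word."""
    logits = []
    for start, end in word_spans:
        lo = int(start * FRAME_RATE)
        hi = max(int(end * FRAME_RATE), lo + 1)  # keep at least one frame per word
        word_vec = frames[lo:hi].mean(dim=0)     # pool the frames inside the word
        logits.append(word_classifier(word_vec))
    return torch.stack(logits)  # one fluent/stuttered score pair per word
```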
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models [55.898594710420326]
We propose a novel spontaneous speech synthesis system based on language models.
Fine-grained prosody modeling is introduced to enhance the model's ability to capture subtle prosody variations in spontaneous speech.
arXiv Detail & Related papers (2024-07-18T13:42:38Z)
- Self-supervised learning for pathological speech detection [0.0]
Speech production is susceptible to disruption by various neurodegenerative disorders.
These disorders lead to pathological speech characterized by abnormal speech patterns and imprecise articulation.
Unlike neurotypical speakers, patients with speech pathologies or impairments are unable to use virtual assistants such as Alexa and Siri.
arXiv Detail & Related papers (2024-05-16T07:12:47Z)
- Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis [53.511443791260206]
We propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech and spontaneous behavioral labels.
In the process of semi-supervised learning, both text and speech information are considered for detecting spontaneous behavior labels in speech.
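The summary does not spell out the semi-supervised recipe, so the sketch below shows a generic pseudo-labeling loop of the kind often used to grow a labeled pool, with a simple logistic-regression model standing in for the behavior detector; all of it is an assumption for illustration.

```python
# Generic pseudo-labeling sketch (assumed, not the paper's exact recipe):
# a detector trained on labeled clips tags unlabeled speech, and only its
# confident predictions are added back into the training pool.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.9):
    """X_*: feature matrices; y_lab: integer behavior labels for the labeled pool."""
    detector = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    probs = detector.predict_proba(X_unlab)
    confident = probs.max(axis=1) >= threshold           # keep confident rows only
    pseudo_y = detector.classes_[probs[confident].argmax(axis=1)]
    X_new = np.vstack([X_lab, X_unlab[confident]])
    y_new = np.concatenate([y_lab, pseudo_y])
    return LogisticRegression(max_iter=1000).fit(X_new, y_new)
```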
arXiv Detail & Related papers (2023-08-31T09:50:33Z)
- Careful Whisper -- leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification [0.0]
This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments.
By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate both rich acoustic transcripts and clean transcripts.
We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech.
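A hedged sketch of this two-transcript idea: the summary names CTC and encoder-decoder ASR but no specific checkpoints, so wav2vec 2.0 (CTC) and Whisper (encoder-decoder) serve as stand-ins, and the divergence feature below is illustrative rather than one of the paper's features.

```python
# Stand-in checkpoints (assumptions): any CTC model and any encoder-decoder
# model could be substituted for the two below.
from transformers import pipeline

ctc_asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
seq2seq_asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")

def transcript_features(audio_path: str) -> dict:
    raw = ctc_asr(audio_path)["text"]        # CTC output stays close to the acoustics
    clean = seq2seq_asr(audio_path)["text"]  # decoder's language model normalizes output
    raw_words, clean_words = raw.lower().split(), clean.lower().split()
    return {
        # divergence between the transcripts as one crude anomaly signal
        "length_ratio": len(raw_words) / max(len(clean_words), 1),
        "raw_transcript": raw,
        "clean_transcript": clean,
    }
```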
arXiv Detail & Related papers (2023-08-02T15:53:59Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare deep neural network-based visual lip-reading models and identify their cognitive aspects.
We observe a strong correlation between theories in cognitive psychology and the behavior of our models.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Machine Learning for Stuttering Identification: Review, Challenges & Future Directions [9.726119468893721]
Stuttering is a speech disorder during which the flow of speech is interrupted by involuntary pauses and repetition of sounds.
Recent developments in machine and deep learning have revolutionized the speech domain.
This work fills the gap by bringing together researchers from interdisciplinary fields.
arXiv Detail & Related papers (2021-07-08T18:15:20Z)
- Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases [11.3463024120429]
We develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases.
We collected samples from a diverse set of 661 speakers with a variety of self-reported disorders speaking 29 words or phrases.
arXiv Detail & Related papers (2021-07-08T17:24:25Z)
- Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis [59.623780036359655]
Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators.
This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury.
We propose a solution to this problem based on the theory of multi-view learning.
arXiv Detail & Related papers (2020-12-30T15:09:02Z)