Toward Knowledge-Driven Speech-Based Models of Depression: Leveraging
Spectrotemporal Variations in Speech Vowels
- URL: http://arxiv.org/abs/2210.02527v1
- Date: Wed, 5 Oct 2022 19:57:53 GMT
- Authors: Kexin Feng and Theodora Chaspari
- Abstract summary: Psychomotor retardation associated with depression has been linked with tangible differences in vowel production.
This paper investigates a knowledge-driven machine learning (ML) method that integrates spectrotemporal information of speech at the vowel level to identify depression.
- Score: 10.961439164833891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Psychomotor retardation associated with depression has been linked with
tangible differences in vowel production. This paper investigates a
knowledge-driven machine learning (ML) method that integrates spectrotemporal
information of speech at the vowel level to identify depression. Low-level
speech descriptors are learned by a convolutional neural network (CNN) that is
trained for vowel classification. The temporal evolution of those low-level
descriptors is modeled at the high-level within and across utterances via a
long short-term memory (LSTM) model that takes the final depression decision. A
modified version of the Local Interpretable Model-agnostic Explanations (LIME)
is further used to identify the impact of the low-level spectrotemporal vowel
variation on the decisions and observe the high-level temporal change of the
depression likelihood. The proposed method outperforms baselines that model the
spectrotemporal information in speech without integrating the vowel-based
information, as well as ML models trained with conventional prosodic and
spectrotemporal features. The conducted explainability analysis indicates that
spectrotemporal information corresponding to non-vowel segments is less important
than the vowel-based information. Explainability of the high-level information
capturing the segment-by-segment decisions is further inspected for
participants with and without depression. The findings from this work can
provide the foundation toward knowledge-driven interpretable decision-support
systems that can help clinicians better understand fine-grained temporal
changes in speech data, ultimately augmenting mental health diagnosis and care.
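The pipeline above (a CNN producing low-level vowel descriptors per frame, an LSTM aggregating them over time, and a final depression decision) can be sketched in miniature. This is a toy stand-in, not the paper's model: all shapes, initializations, and the random-projection "CNN" are assumptions, chosen only to show how the pieces connect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: mel bins, frames per utterance,
# descriptor size D, LSTM hidden size H.
N_MEL, N_FRAMES, D, H = 40, 60, 8, 16

# "CNN" stand-in: a fixed random linear projection + ReLU mapping one
# spectrogram frame to a low-level vowel descriptor.
W_cnn = rng.standard_normal((D, N_MEL)) * 0.1

def vowel_descriptor(frame):
    return np.maximum(W_cnn @ frame, 0.0)

# One LSTM cell: stacked parameters for the input, forget, output, cell gates.
Wx = rng.standard_normal((4 * H, D)) * 0.1
Wh = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    z = Wx @ x + Wh @ h + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

# Logistic readout: depression likelihood from the final hidden state.
w_out = rng.standard_normal(H) * 0.1

def depression_likelihood(spectrogram):
    h, c = np.zeros(H), np.zeros(H)
    for t in range(spectrogram.shape[1]):
        h, c = lstm_step(vowel_descriptor(spectrogram[:, t]), h, c)
    return sigmoid(w_out @ h)

utterance = rng.random((N_MEL, N_FRAMES))
p = depression_likelihood(utterance)
print(f"depression likelihood: {p:.3f}")
```

In the paper's setting the CNN is trained for vowel classification and the LSTM spans utterances; here both are untrained, so the output is only a structurally valid probability.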
Related papers
- A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification [2.556395214262035]
Neurological disorders that affect speech production, such as Alzheimer's Disease (AD), significantly impact the lives of both patients and caregivers.
Recent advancements in Large Language Model (LLM) architectures have developed many tools to identify representative features of neurological disorders through spontaneous speech.
This paper presents an explainable LLM method, named SLIME, capable of identifying lexical components representative of AD.
arXiv Detail & Related papers (2024-09-30T21:45:02Z) - Self-supervised learning for pathological speech detection [0.0]
Speech production is susceptible to influence and disruption by various neurodegenerative pathological speech disorders.
These disorders lead to pathological speech characterized by abnormal speech patterns and imprecise articulation.
Unlike neurotypical speakers, patients with speech pathologies or impairments are unable to access various virtual assistants such as Alexa, Siri, etc.
arXiv Detail & Related papers (2024-05-16T07:12:47Z) - Exploring Speech Pattern Disorders in Autism using Machine Learning [12.469348589699766]
This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs) and balance.
The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%.
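Two of the feature families named above, zero-crossing rate and short-time energy, are simple enough to sketch directly. The framing parameters below (400-sample windows, 160-sample hop at 16 kHz) are assumptions for illustration, not the study's configuration.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames (assumed parameters)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign changes, per frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def short_time_energy(frames):
    """Mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)  # a pure 220 Hz tone, 1 second

frames = frame_signal(tone)
zcr = zero_crossing_rate(frames)    # ~440 crossings/s -> ~0.0275 per sample
ste = short_time_energy(frames)     # mean of sin^2 -> ~0.5

print(f"mean ZCR: {zcr.mean():.4f}, mean energy: {ste.mean():.4f}")
```

For a 220 Hz tone the expected values are easy to check by hand, which makes these two descriptors convenient sanity checks before moving to spectral features and MFCCs.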
arXiv Detail & Related papers (2024-05-03T02:59:15Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Leveraging Pretrained Representations with Task-related Keywords for
Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z) - NeuroExplainer: Fine-Grained Attention Decoding to Uncover Cortical
Development Patterns of Preterm Infants [73.85768093666582]
We propose an explainable geometric deep network dubbed NeuroExplainer.
NeuroExplainer is used to uncover altered infant cortical development patterns associated with preterm birth.
arXiv Detail & Related papers (2023-01-01T12:48:12Z) - A knowledge-driven vowel-based approach of depression classification
from speech using data augmentation [10.961439164833891]
We propose a novel explainable machine learning (ML) model that identifies depression from speech.
Our method first models the variable-length utterances at the local-level into a fixed-size vowel-based embedding.
Depression is classified at the global level from a group of vowel CNN embeddings that serve as the input to another 1D CNN.
arXiv Detail & Related papers (2022-10-27T08:34:08Z) - Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric
and Elderly Speech Recognition [48.33873602050463]
Speaker adaptation techniques play a key role in personalization of ASR systems for such users.
Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech, novel spectro-temporal subspace basis deep embedding features are derived using SVD of the speech spectrum.
arXiv Detail & Related papers (2022-02-21T15:11:36Z) - Investigation of Data Augmentation Techniques for Disordered Speech
Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to a 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z) - Spectro-Temporal Deep Features for Disordered Speech Assessment and
Recognition [65.25325641528701]
Motivated by the spectro-temporal level differences between disordered and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis embedding deep features derived by SVD decomposition of speech spectrum are proposed.
Experiments conducted on the UASpeech corpus suggest the proposed spectro-temporal deep feature adapted systems consistently outperformed baseline i-vector adaptation by up to 2.63% absolute (8.6% relative) reduction in word error rate (WER) with or without data augmentation.
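The core operation named above, deriving a subspace basis embedding from the SVD of a speech spectrum, can be sketched as follows. The spectrogram shape and the retained rank are assumptions; the actual papers build deep features on top of such a decomposition rather than using it raw.

```python
import numpy as np

rng = np.random.default_rng(1)
N_MEL, N_FRAMES, RANK = 40, 120, 4  # assumed spectrogram size and rank

# Stand-in for an utterance's (mel) spectrogram.
spectrogram = rng.random((N_MEL, N_FRAMES))

# SVD of the spectrum: U holds spectral bases, Vt temporal bases.
U, s, Vt = np.linalg.svd(spectrogram, full_matrices=False)

# Keep the leading spectral basis vectors, scaled by their singular
# values, and flatten into a fixed-size "subspace basis" embedding.
basis = U[:, :RANK]                     # (N_MEL, RANK)
embedding = (basis * s[:RANK]).ravel()  # (N_MEL * RANK,)

print(embedding.shape)  # (160,)
```

Because the embedding size depends only on the number of mel bins and the chosen rank, variable-length utterances map to fixed-size vectors, which is what makes such features convenient inputs for downstream adaptation.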
arXiv Detail & Related papers (2022-01-14T16:56:43Z) - Identification of Dementia Using Audio Biomarkers [15.740689461116762]
The objective of this work is to use speech processing and machine learning techniques to automatically identify the stage of dementia.
Non-linguistic acoustic parameters are used for this purpose, making this a language independent approach.
We analyze the contribution of various types of acoustic features, such as spectral, temporal, and cepstral features, as well as their feature-level fusion and selection, towards the identification of the dementia stage.
arXiv Detail & Related papers (2020-02-27T13:54:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.