Language-Agnostic Analysis of Speech Depression Detection
- URL: http://arxiv.org/abs/2409.14769v1
- Date: Mon, 23 Sep 2024 07:35:56 GMT
- Title: Language-Agnostic Analysis of Speech Depression Detection
- Authors: Sona Binu, Jismi Jose, Fathima Shimna K V, Alino Luke Hans, Reni K. Cherian, Starlet Ben Alex, Priyanka Srivastava, Chiranjeevi Yarra
- Abstract summary: This work analyzes automatic speech-based depression detection across two languages, English and Malayalam.
A CNN model is trained to identify acoustic features associated with depression in speech, focusing on both languages.
Our findings and collected data could contribute to the development of language-agnostic speech-based depression detection systems.
- Score: 2.5764071253486636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: People with Major Depressive Disorder (MDD) exhibit tonal variations in their speech compared to their healthy counterparts. However, these tonal variations depend not only on the state of MDD but also on the language, since each language has its own tonal patterns. This work analyzes automatic speech-based depression detection across two languages, English and Malayalam, which exhibit distinctive prosodic and phonemic characteristics. We propose an approach that uses speech data, collected along with self-reported labels, from participants reading sentences from the IViE corpus in both English and Malayalam. The IViE corpus consists of five sets of sentences (simple sentences, WH-questions, questions without morphosyntactic markers, inversion questions, and coordinations) that naturally prompt speakers to use different tonal patterns. Convolutional Neural Networks (CNNs) are employed for detecting depression from speech. The CNN model is trained to identify acoustic features associated with depression in speech, focusing on both languages. The model's performance is evaluated on the collected dataset, which contains recordings from both depressed and non-depressed speakers, to analyze its effectiveness in detecting depression across the two languages. Our findings and collected data could contribute to the development of language-agnostic speech-based depression detection systems, thereby enhancing accessibility for diverse populations.
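The cross-language evaluation described in the abstract can be illustrated with a minimal sketch. It assumes that each test recording carries a language tag, a self-reported binary label, and a model prediction. The abstract does not state which metric the authors report, so plain accuracy is used here as an assumption, and the function name `per_language_accuracy` and the toy records are hypothetical.

```python
from collections import defaultdict


def per_language_accuracy(records):
    """Return accuracy per language.

    records: iterable of (language, true_label, predicted_label) tuples.
    Accuracy is an assumed metric; the abstract does not name one.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, truth, predicted in records:
        total[language] += 1
        if truth == predicted:
            correct[language] += 1
    return {language: correct[language] / total[language] for language in total}


# Made-up toy records for illustration only (1 = depressed, 0 = non-depressed).
records = [
    ("English", 1, 1),
    ("English", 0, 1),
    ("Malayalam", 1, 1),
    ("Malayalam", 0, 0),
]
scores = per_language_accuracy(records)
```

Comparing the per-language scores side by side is one simple way to see whether a detector behaves similarly in both languages or favours one of them.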
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- Infusing Acoustic Pause Context into Text-Based Dementia Assessment [7.8642589679025034]
This work investigates the use of pause-enriched transcripts in language models to differentiate the cognitive states of subjects with no cognitive impairment, mild cognitive impairment, and Alzheimer's dementia based on their speech from a clinical assessment.
The performance is evaluated through experiments on a German Verbal Fluency Test and a Picture Description Test, comparing the model's effectiveness across different speech production contexts.
arXiv Detail & Related papers (2024-08-27T16:44:41Z)
- Speech-based Clinical Depression Screening: An Empirical Study [32.84863235794086]
This study investigates the utility of speech signals for AI-based depression screening across varied interaction scenarios.
Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital.
We extracted acoustic and deep speech features from each participant's segmented recordings.
arXiv Detail & Related papers (2024-06-05T09:43:54Z)
- When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection [17.018248242646365]
Depression is a critical concern in global mental health, prompting extensive research into AI-based detection methods.
Large Language Models (LLMs) stand out for their versatility in mental healthcare applications.
We present an innovative approach to integrating acoustic speech information into the LLMs framework for multimodal depression detection.
arXiv Detail & Related papers (2024-02-17T09:39:46Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model [7.825530847570242]
We identified 29 topics in 3919 smartphone-collected speech recordings from 265 participants.
Six topics with a median PHQ-8 greater than or equal to 10 were regarded as risk topics for depression.
The correlation between topic shifts and changes in depression severity over time was also investigated.
arXiv Detail & Related papers (2023-08-22T20:30:59Z)
- Semantic Coherence Markers for the Early Diagnosis of the Alzheimer Disease [0.0]
Perplexity was originally conceived as an information-theoretic measure to assess how much a given language model is suited to predict a text sequence.
We employed language models as diverse as N-grams, from 2-grams to 5-grams, and GPT-2, a transformer-based language model.
Best performing models achieved full accuracy and F-score (1.00 in both precision/specificity and recall/sensitivity) in categorizing subjects from both the AD class and control subjects.
arXiv Detail & Related papers (2023-02-02T11:40:16Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
- Automatic Dialect Density Estimation for African American English [74.44807604000967]
We explore automatic prediction of dialect density of the African American English (AAE) dialect.
Dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database.
arXiv Detail & Related papers (2022-04-03T01:34:48Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects on deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Pragmatic information in translation: a corpus-based study of tense and mood in English and German [70.3497683558609]
Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research.
We consider the correspondence between English and German tense and mood in translation.
Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical and neural machine translation.
arXiv Detail & Related papers (2020-07-10T08:15:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.