Integration of Text and Graph-based Features for Detecting Mental Health Disorders from Voice
- URL: http://arxiv.org/abs/2205.07006v1
- Date: Sat, 14 May 2022 08:37:19 GMT
- Title: Integration of Text and Graph-based Features for Detecting Mental Health Disorders from Voice
- Authors: Nasser Ghadiri, Rasoul Samani, Fahime Shahrokh
- Abstract summary: Two methods are used to enrich voice analysis for depression detection.
Results suggest that integrating text-based voice classification with learning from low-level and graph-based voice signal features can improve the detection of mental disorders such as depression.
- Score: 1.5469452301122175
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the availability of voice-enabled devices such as smartphones,
mental health disorders could be detected and treated earlier, particularly
post-pandemic. Current methods extract features directly from the audio signal.
In this paper, two methods are used to enrich voice analysis for depression
detection: graph transformation of the voice signal and natural language
processing of the transcript based on representation learning, fused together
to produce the final class labels. The results of experiments with the
DAIC-WOZ dataset suggest that integrating text-based voice classification
with learning from low-level and graph-based voice signal features can improve
the detection of mental disorders such as depression.
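To make the design above concrete, here is a minimal sketch, assuming a natural visibility graph as the signal-to-graph transformation and plain feature concatenation as the fusion step; the abstract does not specify either choice, so every function and the fusion scheme below are illustrative assumptions rather than the authors' pipeline.

```python
# Illustrative sketch only -- the graph construction (natural visibility
# graph) and late-fusion scheme are assumptions, not the paper's method.
import numpy as np
import networkx as nx

def visibility_graph(x: np.ndarray) -> nx.Graph:
    """Natural visibility graph: samples i and j are connected when every
    intermediate sample k lies strictly below the line joining them."""
    n = len(x)
    g = nx.empty_graph(n)
    for i in range(n):
        for j in range(i + 1, n):
            k = np.arange(i + 1, j)
            line = x[j] + (x[i] - x[j]) * (j - k) / (j - i)
            if k.size == 0 or np.all(x[k] < line):
                g.add_edge(i, j)
    return g

def graph_features(g: nx.Graph) -> np.ndarray:
    """A few global graph statistics as a fixed-size feature vector."""
    deg = np.array([d for _, d in g.degree()], dtype=float)
    return np.array([deg.mean(), deg.std(), nx.density(g),
                     nx.average_clustering(g)])

def fuse(frame: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Hypothetical late fusion: concatenate graph and text features for a
    downstream depression classifier (e.g., logistic regression)."""
    return np.concatenate([graph_features(visibility_graph(frame)), text_emb])

# Toy usage: one short signal frame plus a stand-in transcript embedding.
frame = np.random.randn(256)
text_emb = np.random.randn(768)      # e.g., a sentence-level BERT embedding
features = fuse(frame, text_emb)     # shape: (772,)
```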
Related papers
- Language-Agnostic Analysis of Speech Depression Detection [2.5764071253486636]
This work analyzes automatic speech-based depression detection across two languages, English and Malayalam.
A CNN model is trained to identify acoustic features associated with depression in speech, focusing on both languages.
Our findings and collected data could contribute to the development of language-agnostic speech-based depression detection systems.
arXiv Detail & Related papers (2024-09-23T07:35:56Z)
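A minimal sketch of the kind of model this entry describes, assuming log-mel spectrogram inputs and a small PyTorch CNN; the architecture, input shape, and class count are assumptions, since the summary gives no details.

```python
# Minimal sketch, not the paper's architecture: a small CNN classifying
# log-mel spectrogram patches as depressed vs. control.
import torch
import torch.nn as nn

class DepressionCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # pools away the mel and time dims
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, n_frames), e.g. 64 mel bands x 200 frames
        return self.classifier(self.features(x).flatten(1))

logits = DepressionCNN()(torch.randn(4, 1, 64, 200))  # 4 utterance patches
```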
- A Novel Labeled Human Voice Signal Dataset for Misbehavior Detection [0.7223352886780369]
This research highlights the significance of voice tone and delivery in automated machine-learning systems for voice analysis and recognition.
It contributes to the broader field of voice signal analysis by elucidating the impact of human behaviour on the perception and categorization of voice signals.
arXiv Detail & Related papers (2024-06-28T18:55:07Z)
- When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection [17.018248242646365]
Depression is a critical concern in global mental health, prompting extensive research into AI-based detection methods.
Large Language Models (LLMs) stand out for their versatility in mental healthcare applications.
We present an innovative approach to integrating acoustic speech information into the LLM framework for multimodal depression detection.
arXiv Detail & Related papers (2024-02-17T09:39:46Z)
- EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis [49.04496602282718]
We introduce Expresso, a high-quality expressive speech dataset for textless speech synthesis.
This dataset includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles.
We evaluate resynthesis quality with automatic metrics for different self-supervised discrete encoders.
arXiv Detail & Related papers (2023-08-10T17:41:19Z)
- Language-Guided Audio-Visual Source Separation via Trimodal Consistency [64.0580750128049]
A key challenge in this task is learning to associate the linguistic description of a sound-emitting object to its visual features and the corresponding components of the audio waveform.
We adapt off-the-shelf vision-language foundation models to provide pseudo-target supervision via two novel loss functions.
We demonstrate the effectiveness of our self-supervised approach on three audio-visual separation datasets.
arXiv Detail & Related papers (2023-03-28T22:45:40Z)
- Inner speech recognition through electroencephalographic signals [2.578242050187029]
This work focuses on inner speech recognition starting from EEG signals.
The decoding of the EEG into text should be understood as the classification of a limited number of words (commands).
Speech-related BCIs provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals.
arXiv Detail & Related papers (2022-10-11T08:29:12Z)
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts power-set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
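For context, power-set encoding maps each frame's set of active speakers to a single class index, turning overlapped speech into ordinary multi-class classification; the tiny sketch below illustrates the encoding only and is not SEND's implementation.

```python
# Sketch of power-set label encoding (illustration only, not SEND itself):
# each subset of speakers, including silence and overlaps, gets one class id.
from itertools import combinations

def powerset_classes(num_speakers: int) -> list[frozenset]:
    """All 2**num_speakers subsets; index 0 is silence (the empty set)."""
    classes = [frozenset()]
    for r in range(1, num_speakers + 1):
        classes += [frozenset(c) for c in combinations(range(num_speakers), r)]
    return classes

classes = powerset_classes(3)                 # 8 classes for 3 speakers
label = classes.index(frozenset({0, 2}))      # overlap of speakers 0 and 2
active = classes[label]                       # decoding is just a lookup
```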
- CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose CogAlign, an approach to integrate cognitive language processing signals into natural language processing models.
We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
More recently, deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
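Separation protocols commonly report the scale-invariant signal-to-noise ratio; the snippet below is a standard SI-SNR implementation added here purely for illustration, as the entry above does not state which metrics the paper's protocol uses.

```python
# Standard SI-SNR metric (illustrative; not necessarily this paper's protocol).
import numpy as np

def si_snr(est: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio in dB between an estimated and
    a reference waveform of equal length."""
    est = est - est.mean()
    ref = ref - ref.mean()
    s_target = (est @ ref) * ref / (ref @ ref + eps)   # projection onto ref
    e_noise = est - s_target
    return 10.0 * np.log10((s_target @ s_target) / (e_noise @ e_noise + eps))

ref = np.random.randn(16000)                  # 1 s of audio at 16 kHz
est = ref + 0.1 * np.random.randn(16000)      # noisy estimate of the source
print(f"SI-SNR: {si_snr(est, ref):.1f} dB")   # roughly 20 dB for this noise level
```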
- On the Mutual Information between Source and Filter Contributions for Voice Pathology Detection [11.481208551940998]
This paper addresses the problem of automatic detection of voice pathologies directly from the speech signal.
Three sets of features are proposed, depending on whether they relate to the speech signal, the glottal signal, or prosody.
arXiv Detail & Related papers (2020-01-02T10:04:37Z)
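To illustrate the kind of analysis the title suggests, the sketch below estimates the mutual information each feature group shares with pathology labels using scikit-learn; the feature groups and data here are synthetic placeholders, not the paper's features or estimator.

```python
# Hedged illustration: estimating mutual information between feature groups
# (speech, glottal, prosodic -- placeholders here) and pathology labels.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)          # 0 = healthy, 1 = pathological

groups = {
    "speech":  rng.normal(size=(n, 13)) + y[:, None] * 0.5,  # e.g., MFCCs
    "glottal": rng.normal(size=(n, 5)) + y[:, None] * 0.3,   # e.g., NAQ, QOQ
    "prosody": rng.normal(size=(n, 3)),                      # e.g., F0 stats
}
for name, X in groups.items():
    mi = mutual_info_classif(X, y, random_state=0)
    print(f"{name:8s} mean MI with label: {mi.mean():.3f} nats")
```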