Learning Co-Speech Gesture for Multimodal Aphasia Type Detection
- URL: http://arxiv.org/abs/2310.11710v2
- Date: Fri, 20 Oct 2023 05:43:28 GMT
- Title: Learning Co-Speech Gesture for Multimodal Aphasia Type Detection
- Authors: Daeun Lee, Sejung Son, Hyolim Jeon, Seungbae Kim, Jinyoung Han
- Abstract summary: Aphasia is a language disorder resulting from brain damage.
We propose a graph neural network for aphasia type detection using speech and corresponding gesture patterns.
- Score: 12.164549524639249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aphasia, a language disorder resulting from brain damage, requires accurate
identification of specific aphasia types, such as Broca's and Wernicke's
aphasia, for effective treatment. However, little attention has been paid to
developing methods to detect different types of aphasia. Recognizing the
importance of analyzing co-speech gestures for distinguishing aphasia types, we
propose a multimodal graph neural network for aphasia type detection using
speech and corresponding gesture patterns. By learning the correlation between
the speech and gesture modalities for each aphasia type, our model can generate
textual representations sensitive to gesture information, leading to accurate
aphasia type detection. Extensive experiments demonstrate the superiority of
our approach over existing methods, achieving state-of-the-art results (F1
84.2%). We also show that gesture features outperform acoustic features,
highlighting the significance of gesture expression in detecting aphasia types.
We provide the code for reproducibility.
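As a rough illustration of the approach described in the abstract, the sketch below fuses utterance-level text features with co-speech gesture features through a single graph-convolution step and classifies the aphasia type. This is not the authors' released code; the module names, dimensions, and the four-way label space are assumptions.

```python
import torch
import torch.nn as nn

class GestureSpeechGNN(nn.Module):
    """Toy gesture-aware text classifier: one graph-conv step over a graph
    whose nodes are utterance (text) and gesture-segment features."""
    def __init__(self, text_dim=768, gesture_dim=128, hidden=256, n_types=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.gesture_proj = nn.Linear(gesture_dim, hidden)
        self.gcn = nn.Linear(hidden, hidden)   # weights of one graph-conv layer
        self.classifier = nn.Linear(hidden, n_types)

    def forward(self, text_feats, gesture_feats, adj):
        # text_feats: (N, text_dim); gesture_feats: (N, gesture_dim);
        # adj: (2N, 2N) normalized adjacency linking each utterance node
        # to the gesture segments that co-occur with it.
        nodes = torch.cat([self.text_proj(text_feats),
                           self.gesture_proj(gesture_feats)], dim=0)
        nodes = torch.relu(self.gcn(adj @ nodes))   # aggregate, then transform
        n = text_feats.size(0)
        # Pool the now gesture-aware text nodes and predict the aphasia type.
        return self.classifier(nodes[:n].mean(dim=0, keepdim=True))

# Toy usage: 5 utterances, 5 gesture segments, identity adjacency.
logits = GestureSpeechGNN()(torch.randn(5, 768), torch.randn(5, 128), torch.eye(10))
```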
Related papers
- Infusing Acoustic Pause Context into Text-Based Dementia Assessment [7.8642589679025034]
This work investigates the use of pause-enriched transcripts in language models to differentiate subjects with no cognitive impairment, mild cognitive impairment, and Alzheimer's dementia, based on their speech from a clinical assessment.
The performance is evaluated through experiments on a German Verbal Fluency Test and a Picture Description Test, comparing the model's effectiveness across different speech production contexts.
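A minimal sketch of the pause-infusion idea, assuming pause durations come from a forced aligner; the special tokens and thresholds below are illustrative, not the paper's:

```python
# Interleave pause tokens with the words so a text-only language model
# can condition on acoustic pause context. Thresholds are assumptions.
def enrich_with_pauses(words, gaps_sec):
    """words: n tokens; gaps_sec: the n-1 silences between adjacent words."""
    out = [words[0]]
    for gap, word in zip(gaps_sec, words[1:]):
        if gap >= 2.0:
            out.append("<pause_long>")
        elif gap >= 0.5:
            out.append("<pause_short>")
        out.append(word)
    return " ".join(out)

print(enrich_with_pauses(["the", "uh", "dog"], [0.8, 2.4]))
# -> the <pause_short> uh <pause_long> dog
```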
arXiv Detail & Related papers (2024-08-27T16:44:41Z)
- Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models [10.131053400122308]
Aphasia is a language disorder that can lead to speech errors known as paraphasias.
We present novel approaches that use a generative pretrained transformer (GPT) to identify paraphasias from transcripts.
We demonstrate that a single sequence model outperforms GPT baselines for multiclass paraphasia detection.
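For intuition, a hedged sketch of what prompting a GPT for multiclass paraphasia detection could look like; the label set and prompt wording are assumptions, not taken from the paper:

```python
# Hypothetical prompt builder for transcript-level paraphasia labeling.
PARAPHASIA_LABELS = ["none", "phonemic", "semantic", "neologistic"]

def build_prompt(transcript):
    return (
        "Label each word in the transcript with one of: "
        + ", ".join(PARAPHASIA_LABELS) + ".\n"
        + "Transcript: " + transcript + "\n"
        + "Labels:"
    )

# The prompt would be sent to a generative pretrained transformer; per the
# summary, a single end-to-end sequence model outperforms this baseline.
print(build_prompt("the dat sat on the mat"))
```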
arXiv Detail & Related papers (2024-07-16T03:24:51Z)
- Impact of Speech Mode in Automatic Pathological Speech Detection [14.011517808456892]
This paper analyzes the influence of speech mode on pathological speech detection approaches.
It examines two categories of approaches, i.e., classical machine learning and deep learning.
Results indicate that classical approaches may struggle to capture pathology-discriminant cues in spontaneous speech.
In contrast, deep learning approaches demonstrate superior performance, managing to extract additional cues that were previously inaccessible in non-spontaneous speech.
arXiv Detail & Related papers (2024-06-14T12:19:18Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods, such as typos and word-order shuffling, that resonate with human cognitive patterns and allow perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
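The perturbations mentioned above are easy to make concrete; the functions below are illustrative guesses at typo and word-order perturbations, not the paper's exact procedures:

```python
import random

def typo(sentence, rate=0.1, seed=0):
    # Randomly substitute letters to mimic visually plausible typos.
    rng = random.Random(seed)
    chars = list(sentence)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

def shuffle_words(sentence, seed=0):
    # Word-order shuffling keeps the bag of words but breaks syntax.
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

s = "gesture patterns help detect aphasia types"
print(typo(s), "|", shuffle_words(s))
```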
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Seq2seq for Automatic Paraphasia Detection in Aphasic Speech [14.686874756530322]
Paraphasias are speech errors that are characteristic of aphasia and represent an important signal in assessing disease severity and subtype.
Traditionally, clinicians manually identify paraphasias by transcribing and analyzing speech-language samples.
We propose a novel, sequence-to-sequence (seq2seq) model that is trained end-to-end (E2E) to perform both ASR and paraphasia detection tasks.
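One common way a single seq2seq model handles both tasks is to have the decoder interleave each recognized word with a paraphasia tag; the sketch below shows that joint target format under assumed tag names (the paper's exact output scheme may differ):

```python
# Build a joint ASR + detection target: word, tag, word, tag, ...
def make_joint_target(words, is_paraphasia):
    target = []
    for word, flagged in zip(words, is_paraphasia):
        target.append(word)
        target.append("<par>" if flagged else "<ok>")
    return target

print(make_joint_target(["the", "dat", "sat"], [False, True, False]))
# -> ['the', '<ok>', 'dat', '<par>', 'sat', '<ok>']
```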
arXiv Detail & Related papers (2023-12-16T18:22:37Z)
- Multimodal Modeling For Spoken Language Identification [57.94119986116947]
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
We propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification.
arXiv Detail & Related papers (2023-09-19T12:21:39Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Careful Whisper -- leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification [0.0]
This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments.
By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate rich acoustic and clean transcripts.
We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech.
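A minimal sketch of the healthy-speech prototype idea, using two invented transcript features (type-token ratio and mean word length); the paper's actual feature set is far richer:

```python
import math

def features(transcript):
    words = transcript.split()
    ttr = len(set(words)) / max(len(words), 1)        # type-token ratio
    mean_len = sum(map(len, words)) / max(len(words), 1)
    return [ttr, mean_len]

def prototype(healthy_transcripts):
    # Average the feature vectors of healthy speakers into one centroid.
    vecs = [features(t) for t in healthy_transcripts]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def anomaly_score(transcript, proto):
    # Euclidean distance from the healthy prototype.
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(features(transcript), proto)))

proto = prototype(["the cat sat on the mat", "a quick brown fox"])
print(anomaly_score("the the the ball", proto))
```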
arXiv Detail & Related papers (2023-08-02T15:53:59Z)
- Towards Intrinsic Common Discriminative Features Learning for Face Forgery Detection using Adversarial Learning [59.548960057358435]
We propose a novel method which utilizes adversarial learning to eliminate the negative effect of different forgery methods and facial identities.
Our face forgery detection model learns to extract common discriminative features through eliminating the effect of forgery methods and facial identities.
arXiv Detail & Related papers (2022-07-08T09:23:59Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose a CogAlign approach to integrate cognitive language processing signals into natural language processing models.
We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z)