Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews
- URL: http://arxiv.org/abs/2010.16131v2
- Date: Thu, 5 Nov 2020 08:09:11 GMT
- Title: Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews
- Authors: Rachid Riad and Hadrien Titeux and Laurie Lemoine and Justine
Montillot and Agnes Sliwinski and Jennifer Hamet Bagnou and Xuan Nga Cao and
Anne-Catherine Bachoud-Lévi and Emmanuel Dupoux
- Abstract summary: We train end-to-end neural network architectures to adapt to each task and evaluate each approach under the same metric.
Results do not depend on the demographics of the Interviewee, highlighting the clinical relevance of our methods.
- Score: 9.728371067160941
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Conversations between a clinician and a patient, in natural conditions, are
valuable sources of information for medical follow-up. The automatic analysis
of these dialogues could help extract new language markers and speed up the
clinicians' reports. Yet, it is not clear which speech processing pipeline
performs best at detecting and identifying speaker turns, especially for
individuals with speech and language disorders. Here, we propose a data split
that allows a comparative evaluation of speaker role recognition and speaker
enrollment methods on this task. We trained end-to-end neural network
architectures adapted to each task and evaluated each approach under the same
metric. Experimental results are reported on naturalistic clinical
conversations between a Neuropsychologist and Interviewees at different stages
of Huntington's disease. We found that our Speaker Role Recognition model gave
the best performance. In addition, our study underlined the importance of
retraining models with in-domain data. Finally, we observed that results do not
depend on the demographics of the Interviewee, highlighting the clinical
relevance of our methods.
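The shared metric is not named in this summary; for who-spoke-when tasks, a frame-level identification error rate is one common choice. A minimal sketch in Python (the function name, label scheme, and frame alignment are illustrative assumptions, not details taken from the paper):

```python
def identification_error_rate(reference, hypothesis):
    """Fraction of frames whose hypothesised speaker label differs from
    the reference. Both inputs are equal-length sequences of per-frame
    labels, e.g. 'interviewer' / 'interviewee'."""
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must be frame-aligned")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)

# One hypothesis turn boundary is off by a single frame: 1 error in 10.
ref = ["interviewer"] * 4 + ["interviewee"] * 6
hyp = ["interviewer"] * 5 + ["interviewee"] * 5
print(identification_error_rate(ref, hyp))  # 0.1
```

Scoring both a role-recognition system and an enrollment-based system with one such function is what makes their outputs directly comparable.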
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- A Comparative Study on Patient Language across Therapeutic Domains for Effective Patient Voice Classification in Online Health Discussions [0.48124799513933847]
In this study, we analyse the importance of linguistic characteristics in accurately classifying patient voices.
We fine-tuned a pre-trained Language Model on the combined datasets with similar linguistic patterns, resulting in a highly accurate automatic patient voice classification.
Being the pioneering study on the topic, our focus on extracting authentic patient experiences from social media stands as a crucial step towards advancing healthcare standards.
arXiv Detail & Related papers (2024-07-23T15:51:46Z)
- Speech-based Clinical Depression Screening: An Empirical Study [32.84863235794086]
This study investigates the utility of speech signals for AI-based depression screening across varied interaction scenarios.
Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital.
We extracted acoustic and deep speech features from each participant's segmented recordings.
arXiv Detail & Related papers (2024-06-05T09:43:54Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- Identification of Cognitive Decline from Spoken Language through Feature Selection and the Bag of Acoustic Words Model [0.0]
The early identification of symptoms of memory disorders plays a significant role in ensuring the well-being of populations.
The lack of standardized speech tests in clinical settings has led to a growing emphasis on developing automatic machine learning techniques for analyzing naturally spoken language.
The work presents an approach related to feature selection, allowing for the automatic selection of the essential features required for diagnosis from the Geneva minimalistic acoustic parameter set and relative speech pauses.
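The summary above does not say how features are ranked; one simple, commonly used separability criterion is a Cohen's-d-style statistic per feature. A minimal sketch (the scoring rule, function names, and toy data are assumptions for illustration, not the paper's actual selection method):

```python
from statistics import mean, stdev

def rank_features(samples, labels):
    """Rank feature indices by a crude two-class separability score:
    |difference of class means| / pooled standard deviation.
    `samples` is a list of equal-length feature vectors; `labels` are 0/1."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        a = [s[j] for s, y in zip(samples, labels) if y == 0]
        b = [s[j] for s, y in zip(samples, labels) if y == 1]
        pooled = (stdev(a) + stdev(b)) / 2 or 1e-9
        scores.append((abs(mean(a) - mean(b)) / pooled, j))
    return [j for _, j in sorted(scores, reverse=True)]

# Toy example: feature 1 separates the two classes, feature 0 does not.
X = [[0.1, 5.0], [0.2, 5.1], [0.1, 1.0], [0.2, 1.1]]
y = [0, 0, 1, 1]
print(rank_features(X, y))  # feature 1 ranked first
```

Keeping only the top-ranked features then plays the role of the "essential features required for diagnosis" in the blurb above.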
arXiv Detail & Related papers (2024-02-02T17:06:03Z)
- Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z)
- A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning [29.916793641951507]
This paper presents a new benchmark for Aphasia speech recognition using state-of-the-art speech recognition techniques.
We introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously.
Our system achieves state-of-the-art speaker-level detection accuracy (97.3%), and a relative WER reduction of 11% for moderate Aphasia patients.
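Note that the 11% figure above is a relative WER reduction, not an absolute one. The arithmetic, with illustrative numbers (the 40% baseline is an assumption, not a figure from the paper):

```python
def relative_wer_reduction(wer_baseline, wer_system):
    """Fraction of the baseline word error rate that was removed."""
    return (wer_baseline - wer_system) / wer_baseline

# e.g. a baseline WER of 40% improved to 35.6% is an 11% relative reduction
print(round(relative_wer_reduction(0.40, 0.356), 2))  # 0.11
```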
arXiv Detail & Related papers (2023-05-19T15:10:36Z)
- Domain-specific Language Pre-training for Dialogue Comprehension on Clinical Inquiry-Answering Conversations [28.567701055153385]
Recent developments in natural language processing suggest that large-scale pre-trained language backbones could be leveraged for machine comprehension and information extraction tasks.
Yet, due to the gap between pre-training and downstream clinical domains, it remains challenging to exploit the generic backbones for domain-specific applications.
We propose a domain-specific language pre-training, to improve performance on downstream tasks like dialogue comprehension.
arXiv Detail & Related papers (2022-06-06T08:45:03Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-network-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
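One simple way to realise the idea above is to blend acoustic similarity with a lexical turn cue when scoring whether two adjacent segments share a speaker. A minimal sketch (the linear blend, `weight` parameter, and function names are assumptions for illustration, not the paper's actual formulation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def combined_affinity(emb_i, emb_j, p_turn, weight=0.5):
    """Affinity between two adjacent segments for speaker clustering:
    high embedding similarity and a low word-level speaker-turn
    probability both argue for the same speaker."""
    acoustic = cosine(emb_i, emb_j)   # in [-1, 1]
    lexical = 1.0 - p_turn            # in [0, 1]
    return weight * acoustic + (1 - weight) * lexical

# Same-direction embeddings and no predicted turn -> maximal affinity.
print(combined_affinity([1.0, 0.0], [1.0, 0.0], p_turn=0.0))  # 1.0
```

Such pairwise affinities would then feed an ordinary clustering step in place of purely acoustic similarities.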
arXiv Detail & Related papers (2020-04-13T17:16:56Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.