Speech Detection For Child-Clinician Conversations In Danish For
Low-Resource In-The-Wild Conditions: A Case Study
- URL: http://arxiv.org/abs/2204.11550v1
- Date: Mon, 25 Apr 2022 10:51:54 GMT
- Title: Speech Detection For Child-Clinician Conversations In Danish For
Low-Resource In-The-Wild Conditions: A Case Study
- Authors: Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen
- Abstract summary: We study the performance of a pre-trained speech model on a dataset comprising child-clinician conversations in Danish.
We learned that the model with the default classification threshold performs worse on children from the patient group.
Our study on few-instance adaptation shows that three minutes of clinician-child conversation is sufficient to obtain the optimum classification threshold.
- Score: 6.4461798613033405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Use of speech models for automatic speech processing tasks can improve
efficiency in the screening, analysis, diagnosis and treatment in medicine and
psychiatry. However, the performance of pre-processing speech tasks like
segmentation and diarization can drop considerably on in-the-wild clinical
data, specifically when the target dataset comprises of atypical speech. In
this paper we study the performance of a pre-trained speech model on a dataset
comprising child-clinician conversations in Danish with respect to the
classification threshold. Since we do not have access to sufficient labelled
data, we propose few-instance threshold adaptation, wherein we employ the first
minutes of the speech conversation to obtain the optimum classification
threshold. Through our work in this paper, we learned that the model with the
default classification threshold performs worse on children from the patient
group. Furthermore, the error rates of the model are directly correlated with
the severity of diagnosis in the patients. Lastly, our study on few-instance
adaptation shows that three minutes of clinician-child conversation is
sufficient to obtain the optimum classification threshold.
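A minimal sketch of the few-instance threshold adaptation described in the abstract, assuming frame-level speech probabilities from a pre-trained speech-detection model and reference speech/non-speech labels for the first minutes of a conversation. The frame rate, candidate grid, and frame-error criterion below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of few-instance threshold adaptation: sweep candidate
# classification thresholds on a short labelled adaptation window and keep
# the one with the lowest frame-level error rate.
import numpy as np

def adapt_threshold(probs: np.ndarray, labels: np.ndarray,
                    candidates: np.ndarray = np.linspace(0.05, 0.95, 19)) -> float:
    """Return the threshold minimizing frame error on the adaptation window.

    probs  : speech probabilities per frame from the pre-trained detector
    labels : reference speech (1) / non-speech (0) labels for the same frames
    """
    best_thr, best_err = 0.5, float("inf")
    for thr in candidates:
        preds = (probs >= thr).astype(int)
        err = float(np.mean(preds != labels))  # frame-level error rate
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr

# Example: adapt on the first three minutes at an assumed 100 frames/second.
# window = 3 * 60 * 100
# threshold = adapt_threshold(probs[:window], labels[:window])
```

Under this reading, "three minutes is sufficient" means a threshold selected from a three-minute labelled window already matches the optimum found with more data.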
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z) - Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models [7.774205081900019]
Head and Neck Cancers (HNC) significantly impact patients' ability to speak, affecting their quality of life.
This study proposes a self-supervised Wav2Vec2-based model for phone classification with HNC patients, aiming to improve accuracy and sharpen the discrimination of phonetic features for subsequent interpretability purposes.
arXiv Detail & Related papers (2024-06-07T08:51:52Z) - Identification of Cognitive Decline from Spoken Language through Feature
Selection and the Bag of Acoustic Words Model [0.0]
The early identification of symptoms of memory disorders plays a significant role in ensuring the well-being of populations.
The lack of standardized speech tests in clinical settings has led to a growing emphasis on developing automatic machine learning techniques for analyzing naturally spoken language.
The work presents an approach related to feature selection, allowing for the automatic selection of the essential features required for diagnosis from the Geneva minimalistic acoustic parameter set and relative speech pauses.
arXiv Detail & Related papers (2024-02-02T17:06:03Z) - A study on the impact of Self-Supervised Learning on automatic dysarthric speech assessment [6.284142286798582]
We show that HuBERT is the most versatile feature extractor across dysarthria classification, word recognition, and intelligibility classification, achieving respectively +24.7%, +61%, and +7.2% accuracy compared to classical acoustic features.
arXiv Detail & Related papers (2023-06-07T11:04:02Z) - Analysing the Impact of Audio Quality on the Use of Naturalistic
Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z) - Vision-Language Modelling For Radiological Imaging and Reports In The
Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Leveraging Pretrained Representations with Task-related Keywords for
Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z) - The Far Side of Failure: Investigating the Impact of Speech Recognition
Errors on Subsequent Dementia Classification [8.032686410648274]
Linguistic anomalies detectable in spontaneous speech have shown promise for various clinical applications including screening for dementia and other forms of cognitive impairment.
The impressive performance of self-supervised learning (SSL) automatic speech recognition (ASR) models with curated speech data is not apparent with challenging speech samples from clinical settings.
One of our key findings is that, paradoxically, ASR systems with relatively high error rates can produce transcripts that result in better downstream classification accuracy than classification based on verbatim transcripts.
arXiv Detail & Related papers (2022-11-11T17:06:45Z) - Multi-class versus One-class classifier in spontaneous speech analysis
oriented to Alzheimer Disease diagnosis [58.720142291102135]
The aim of our project is to contribute to earlier diagnosis of AD and better estimates of its severity through automatic analysis of new biomarkers extracted from the speech signal.
The use of information about outlier and Fractal Dimension features improves the system performance.
arXiv Detail & Related papers (2022-03-21T09:57:20Z) - Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered in this work are self-normalized and there is no need to further conduct a correction step.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z) - Comparison of Speaker Role Recognition and Speaker Enrollment Protocol
for conversational Clinical Interviews [9.728371067160941]
We train end-to-end neural network architectures to adapt to each task and evaluate each approach under the same metric.
Results do not depend on the demographics of the interviewee, highlighting the clinical relevance of our methods.
arXiv Detail & Related papers (2020-10-30T09:07:37Z)