Automatically measuring speech fluency in people with aphasia: first
achievements using read-speech data
- URL: http://arxiv.org/abs/2308.04763v1
- Date: Wed, 9 Aug 2023 07:51:40 GMT
- Title: Automatically measuring speech fluency in people with aphasia: first
achievements using read-speech data
- Authors: Lionel Fontan, Typhanie Prince (Praxiling, LNPL), Aleksandra
Nowakowska (Praxiling), Halima Sahraoui (LNPL), Silvia Martinez-Ferreiro
- Abstract summary: This study aims at assessing the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
- Score: 55.84746218227712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Speech and language pathologists (SLPs) often rely on judgements
of speech fluency for diagnosing or monitoring patients with aphasia. However,
such subjective methods have been criticised for their lack of reliability and
their clinical cost in terms of time. Aims: This study aims at assessing the
relevance of a signal processing algorithm, initially developed in the field of
language acquisition, for the automatic measurement of speech fluency in people
with aphasia (PWA). Methods & Procedures: Twenty-nine PWA and five control
participants were recruited via non-profit organizations and SLP networks. All
participants were recorded while reading out loud a set of sentences taken from
the French version of the Boston Diagnostic Aphasia Examination. Three trained
SLPs assessed the fluency of each sentence on a five-point qualitative scale. A
forward-backward divergence segmentation and a clustering algorithm were used
to compute, for each sentence, four automatic predictors of speech fluency:
pseudo-syllable rate, speech ratio, rate of silent breaks, and standard
deviation of pseudo-syllable length. The four predictors were finally combined
into multivariate regression models (a multiple linear regression - MLR, and two
non-linear models) to predict the average SLP ratings of speech fluency, using
a leave-one-speaker-out validation scheme. Outcomes & Results: All models
achieved accurate predictions of speech fluency ratings, with average
root-mean-square errors as low as 0.5. The MLR yielded a correlation
coefficient of 0.87 with reference ratings at the sentence level, and of 0.93
when aggregating the data for each participant. The inclusion of an additional
predictor sensitive to repetitions further improved the predictions, with a
correlation coefficient of 0.91 at the sentence level, and of 0.96 at the
participant level. Conclusions: The algorithms used in this study can
constitute a cost-effective and reliable tool for the assessment of the speech
fluency of patients with aphasia in read-aloud tasks. Perspectives for the
assessment of spontaneous speech are discussed.
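
The evaluation pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the formulas in the hypothetical helper `fluency_predictors` are assumed definitions of the four named predictors (the abstract does not give them), the segment boundaries are taken as already available rather than produced by forward-backward divergence segmentation, and the data are synthetic stand-ins for the SLP ratings. The hypothetical function `loso_mlr` fits a multiple linear regression by ordinary least squares, holding out one speaker at a time.

```python
import numpy as np

def fluency_predictors(syllables, pauses, duration):
    """Four sentence-level predictors from (start, end) times in seconds.

    The exact formulas are assumptions; the abstract only names the predictors.
    """
    syl_len = np.array([end - start for start, end in syllables])
    pause_time = sum(end - start for start, end in pauses)
    return np.array([
        len(syllables) / duration,           # pseudo-syllable rate (per second)
        (duration - pause_time) / duration,  # speech ratio (share of non-silent time)
        len(pauses) / duration,              # rate of silent breaks (per second)
        syl_len.std(),                       # SD of pseudo-syllable length (seconds)
    ])

def loso_mlr(X, y, speakers):
    """Predict each speaker's ratings with an MLR fitted on all other speakers."""
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    pred = np.empty(len(y), dtype=float)
    for spk in np.unique(speakers):
        held_out = speakers == spk
        # Ordinary least squares on the remaining speakers only
        coef, *_ = np.linalg.lstsq(X1[~held_out], y[~held_out], rcond=None)
        pred[held_out] = X1[held_out] @ coef
    return pred

# Synthetic stand-in data: 6 speakers x 5 sentences, 4 predictors per sentence
rng = np.random.default_rng(0)
speakers = np.repeat(np.arange(6), 5)
X = rng.uniform(size=(30, 4))
y = 1.0 + X @ np.array([1.5, 2.0, -1.0, -0.5]) + rng.normal(scale=0.1, size=30)

pred = loso_mlr(X, y, speakers)
rmse = np.sqrt(np.mean((pred - y) ** 2))  # root-mean-square error
r = np.corrcoef(pred, y)[0, 1]            # sentence-level Pearson correlation
print(f"RMSE = {rmse:.2f}, r = {r:.2f}")
```

A participant-level correlation, as reported in the abstract, would follow by averaging `pred` and `y` within each speaker before calling `np.corrcoef`.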
Related papers
- Detecting Speech Abnormalities with a Perceiver-based Sequence
Classifier that Leverages a Universal Speech Model [4.503292461488901]
We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders.
We combine this classifier with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings.
Our model outperforms standard transformer (80.9%) and perceiver (81.8%) models and achieves an average accuracy of 83.1%.
arXiv Detail & Related papers (2023-10-16T21:07:12Z)
- Phonetic and Prosody-aware Self-supervised Learning Approach for
Non-native Fluency Scoring [13.817385516193445]
Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features.
Deep neural networks are commonly trained to map fluency-related features into the human scores.
We introduce a self-supervised learning (SSL) approach that takes into account phonetic and prosody awareness for fluency scoring.
arXiv Detail & Related papers (2023-05-19T05:39:41Z)
- Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
Mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
arXiv Detail & Related papers (2022-11-22T11:35:14Z)
- Disentangled Latent Speech Representation for Automatic Pathological
Intelligibility Assessment [10.93598143328628]
Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment.
arXiv Detail & Related papers (2022-04-08T12:02:14Z)
- On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and
Elderly Speech Recognition [53.17176024917725]
Scarcity of speaker-level data limits the practical use of data-intensive model based speaker adaptation methods.
This paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods.
arXiv Detail & Related papers (2022-03-28T09:12:24Z)
- Prediction of Depression Severity Based on the Prosodic and Semantic
Features with Bidirectional LSTM and Time Distributed CNN [14.994852548758825]
We propose an attention-based multimodality speech and text representation for depression prediction.
Our model is trained to estimate the depression severity of participants using the Distress Analysis Interview Corpus-Wizard of Oz dataset.
Experiments show statistically significant improvements over previous works.
arXiv Detail & Related papers (2022-02-25T01:42:29Z)
- Continuous Speech for Improved Learning Pathological Voice Disorders [12.867900671251395]
This study proposes a novel approach, using continuous Mandarin speech instead of a single vowel, to classify four common voice disorders.
In the proposed framework, acoustic signals are transformed into mel-frequency cepstral coefficients, and a bidirectional long short-term memory network (BiLSTM) is adopted to model the sequential features.
arXiv Detail & Related papers (2022-02-22T09:58:31Z)
- Investigation of Data Augmentation Techniques for Disordered Speech
Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered in this work are self-normalized and there is no need to further conduct a correction step.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's performance accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)