Speech Corpus for Korean Children with Autism Spectrum Disorder: Towards
Automatic Assessment Systems
- URL: http://arxiv.org/abs/2402.15539v1
- Date: Fri, 23 Feb 2024 07:32:54 GMT
- Authors: Seonwoo Lee, Jihyun Mun, Sunhee Kim, Minhwa Chung
- Abstract summary: This paper introduces a speech corpus specifically designed for Korean children with ASD.
Three speech and language pathologists rated recordings for social communication severity (SCS) and pronunciation proficiency (PP) using a 3-point Likert scale.
The paper also analyzes acoustic and linguistic features extracted from the speech data of 73 children with ASD and 9 TD children for whom collection and annotation are complete.
- Score: 7.153773998764661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the growing demand for digital therapeutics for children with Autism
Spectrum Disorder (ASD), there is currently no speech corpus available for
Korean children with ASD. This paper introduces a speech corpus specifically
designed for Korean children with ASD, aiming to advance speech technologies
such as pronunciation and severity evaluation. Speech recordings from speech
and language evaluation sessions were transcribed and annotated for
articulatory and linguistic characteristics. Three speech and language
pathologists rated these recordings for social communication severity (SCS) and
pronunciation proficiency (PP) using a 3-point Likert scale. The completed
corpus will comprise 300 children with ASD and 50 typically developing (TD)
children. The paper also analyzes acoustic and linguistic features extracted
from the speech data of the 73 children with ASD and 9 TD children for whom
collection and annotation are complete, to investigate the characteristics of
children with ASD and to identify features that correlate significantly with
the clinical scores. The results reveal speech and linguistic characteristics
of children with ASD that differ from those of TD children, or of other ASD
subgroups categorized by clinical scores, demonstrating the potential for
developing automatic assessment systems for SCS and PP.
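The abstract does not describe the correlation analysis itself; since SCS and PP are 3-point ordinal Likert ratings, a rank correlation is one plausible choice. A minimal NumPy sketch under that assumption, with entirely synthetic data (the feature values and ratings below are illustrative, not from the corpus):

```python
import numpy as np

def average_ranks(v):
    """Midranks: tied values share the average of their positions."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v)
    ranks = np.empty(len(v))
    ranks[order] = np.arange(len(v), dtype=float)
    for val in np.unique(v):      # Likert scores produce many ties
        mask = v == val
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(feature, scores):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    return float(np.corrcoef(average_ranks(feature), average_ranks(scores))[0, 1])

# Synthetic example: one acoustic feature per child vs. 3-point SCS ratings
scs = np.array([1, 1, 2, 2, 3, 3])
feature = np.array([0.11, 0.15, 0.24, 0.30, 0.41, 0.47])
rho = spearman(feature, scs)
```

Midranks are used rather than plain ordinal ranks because a 3-point scale over dozens of children guarantees heavy tying in the ratings.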
Related papers
- Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling [30.099739460287566]
Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by challenges in social communication, repetitive behavior, and sensory processing.
One important research area in ASD is evaluating children's behavioral changes over time during treatment.
A fundamental aspect of understanding children's behavior in these interactions is automatic speech understanding.
arXiv Detail & Related papers (2024-09-14T07:03:08Z)
- Developing an End-to-End Framework for Predicting the Social Communication Severity Scores of Children with Autism Spectrum Disorder [6.197934754799159]
This paper proposes an end-to-end framework for automatically predicting the social communication severity of children with ASD from raw speech data.
Achieving a Pearson Correlation Coefficient of 0.6566 with human-rated scores, the proposed method showcases its potential as an accessible and objective tool for the assessment of ASD.
arXiv Detail & Related papers (2024-08-30T14:43:58Z)
- Modality-Order Matters! A Novel Hierarchical Feature Fusion Method for CoSAm: A Code-Switched Autism Corpus [3.06952918690254]
This study introduces a novel hierarchical feature fusion method aimed at enhancing the early detection of ASD in children.
The methodology involves collecting a code-switched speech corpus, CoSAm, from children diagnosed with ASD and a matched control group.
The dataset comprises 61 voice recordings from 30 children diagnosed with ASD and 31 from neurotypical children, aged between 3 and 13 years.
arXiv Detail & Related papers (2024-07-19T14:06:01Z)
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z)
- Exploring Speech Pattern Disorders in Autism using Machine Learning [12.469348589699766]
This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs) and balance.
The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%.
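That paper's extraction pipeline is not reproduced here; as a rough illustration of three of the named feature families (zero-crossing rate, energy, spectral centroid), here is a frame-by-frame NumPy sketch on a synthetic signal — the frame length, hop size, and sample rate are assumptions, not values from the study:

```python
import numpy as np

def frame_features(signal, sr=16000, frame_len=400, hop=160):
    """Per-frame zero-crossing rate, energy, and spectral centroid."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # fraction of sample pairs where the signal changes sign
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
        energy = np.mean(frame ** 2)
        # magnitude-weighted mean frequency of the frame's spectrum
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
        feats.append((zcr, energy, centroid))
    return np.array(feats)

# 1 second of a 440 Hz tone as a stand-in for a speech recording
t = np.arange(16000) / 16000
feats = frame_features(np.sin(2 * np.pi * 440 * t))
```

In practice such frame-level features are typically summarized per recording (mean, variance, etc.) before being fed to a classifier.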
arXiv Detail & Related papers (2024-05-03T02:59:15Z)
- Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings [26.703275678213135]
Natural Language Sample (NLS) analysis has gained attention as a promising complement to traditional methods.
This paper proposes applications of speech processing technologies in support of automated assessment of children's spoken language development.
arXiv Detail & Related papers (2023-05-23T14:39:49Z)
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
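The exact contrastive objective is not given in this summary; a CLIP-style InfoNCE loss, where each brain window should match its own speech segment against the other segments in the batch, is one common formulation. A minimal NumPy sketch under that assumption (embedding shapes and the temperature are illustrative):

```python
import numpy as np

def info_nce(brain_emb, speech_emb, temperature=0.1):
    """CLIP-style contrastive loss over a batch of (brain, speech) pairs."""
    # cosine similarity: normalize rows, then take the inner-product matrix
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    logits = b @ s.T / temperature                  # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # matched pairs sit on the diagonal; minimize their negative log-probability
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
aligned_loss = info_nce(emb, emb)                # matched pairs agree
shuffled_loss = info_nce(emb, emb[::-1].copy())  # matched pairs scrambled
```

Minimizing this loss pulls each brain embedding toward its own speech segment's embedding, which is what makes the segment-identification evaluation above possible.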
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments on three tasks suggested that incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Emotional Voice Conversion: Theory, Databases and ESD [84.62083515557886]
We motivate the development of a novel emotional speech database (ESD).
The ESD database consists of 350 parallel utterances spoken by 10 native English and 10 native Chinese speakers.
The database is suitable for multi-speaker and cross-lingual emotional voice conversion studies.
arXiv Detail & Related papers (2021-05-31T07:48:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.