Modality-Order Matters! A Novel Hierarchical Feature Fusion Method for CoSAm: A Code-Switched Autism Corpus
- URL: http://arxiv.org/abs/2407.14328v2
- Date: Tue, 23 Jul 2024 11:56:22 GMT
- Title: Modality-Order Matters! A Novel Hierarchical Feature Fusion Method for CoSAm: A Code-Switched Autism Corpus
- Authors: Mohd Mujtaba Akhtar, Girish, Muskaan Singh, Orchid Chetia Phukan,
- Abstract summary: This study introduces a novel hierarchical feature fusion method aimed at enhancing the early detection of ASD in children.
The methodology involves collecting a code-switched speech corpus, CoSAm, from children diagnosed with ASD and a matched control group.
The dataset comprises 61 voice recordings from 30 children diagnosed with ASD and 31 from neurotypical children, aged between 3 and 13 years.
- Score: 3.06952918690254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autism Spectrum Disorder (ASD) is a complex neuro-developmental challenge, presenting a spectrum of difficulties in social interaction, communication, and the expression of repetitive behaviors in different situations. This increasing prevalence underscores the importance of ASD as a major public health concern and the need for comprehensive research initiatives to advance our understanding of the disorder and its early detection methods. This study introduces a novel hierarchical feature fusion method aimed at enhancing the early detection of ASD in children through the analysis of code-switched speech (English and Hindi). Employing advanced audio processing techniques, the research integrates acoustic, paralinguistic, and linguistic information using Transformer Encoders. This innovative fusion strategy is designed to improve classification robustness and accuracy, crucial for early and precise ASD identification. The methodology involves collecting a code-switched speech corpus, CoSAm, from children diagnosed with ASD and a matched control group. The dataset comprises 61 voice recordings from 30 children diagnosed with ASD and 31 from neurotypical children, aged between 3 and 13 years, resulting in a total of 159.75 minutes of voice recordings. The feature analysis focuses on MFCCs and extensive statistical attributes to capture speech pattern variability and complexity. The best model performance is achieved using a hierarchical fusion technique with an accuracy of 98.75% using a combination of acoustic and linguistic features first, followed by paralinguistic features in a hierarchical manner.
Related papers
- Exploring Speech Pattern Disorders in Autism using Machine Learning [12.469348589699766]
This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs) and balance.
The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%.
arXiv Detail & Related papers (2024-05-03T02:59:15Z) - MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z) - Early Autism Diagnosis based on Path Signature and Siamese Unsupervised Feature Compressor [15.39635888144281]
We resort to a novel deep learning-based method to extract key features from the inherently scarce, class-imbalanced, and heterogeneous structural MR images for early autism diagnosis.
Specifically, we propose a Siamese verification framework to extend the scarce data, and an unsupervised compressor to alleviate data imbalance.
arXiv Detail & Related papers (2023-07-12T22:08:22Z) - MMASD: A Multimodal Dataset for Autism Intervention Analysis [2.0731167087748994]
This work presents a novel privacy-preserving open-source dataset, MMASD as a MultiModal ASD benchmark dataset.
MMASD includes data from 32 children with ASD, and 1,315 data samples segmented from over 100 hours of intervention recordings.
MMASD aims to assist researchers and therapists in understanding children's cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly.
arXiv Detail & Related papers (2023-06-14T05:04:11Z) - Leveraging Pretrained Representations with Task-related Keywords for
Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z) - Comparison of Probabilistic Deep Learning Methods for Autism Detection [0.0]
Autism Spectrum Disorder (ASD) is one neuro developmental disorder that is now widespread in the world.
Early detection of the disorder helps in the onset treatment and helps one to lead a normal life.
arXiv Detail & Related papers (2023-03-09T17:49:37Z) - Exploring linguistic feature and model combination for speech
recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and Roberta pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z) - Conformer Based Elderly Speech Recognition System for Alzheimer's
Disease Detection [62.23830810096617]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression.
This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection.
arXiv Detail & Related papers (2022-06-23T12:50:55Z) - Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging
Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z) - Hybrid Attention for Automatic Segmentation of Whole Fetal Head in
Prenatal Ultrasound Volumes [52.53375964591765]
We propose the first fully-automated solution to segment the whole fetal head in US volumes.
The segmentation task is firstly formulated as an end-to-end volumetric mapping under an encoder-decoder deep architecture.
We then combine the segmentor with a proposed hybrid attention scheme (HAS) to select discriminative features and suppress the non-informative volumetric features.
arXiv Detail & Related papers (2020-04-28T14:43:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.