Speech motion anomaly detection via cross-modal translation of 4D motion
fields from tagged MRI
- URL: http://arxiv.org/abs/2402.06984v1
- Date: Sat, 10 Feb 2024 16:16:24 GMT
- Authors: Xiaofeng Liu, Fangxu Xing, Jiachen Zhuo, Maureen Stone, Jerry L.
Prince, Georges El Fakhri, Jonghye Woo
- Abstract summary: We aim to develop a framework for detecting speech motion anomalies in conjunction with their corresponding speech acoustics.
This is achieved through the use of a deep cross-modal translator trained on data from healthy individuals only.
A one-class SVM is then used to distinguish the spectrograms of healthy individuals from those of patients.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the relationship between tongue motion patterns during speech
and their resulting speech acoustic outcomes -- i.e., articulatory-acoustic
relation -- is of great importance in assessing speech quality and developing
innovative treatment and rehabilitative strategies. This is especially
important when evaluating and detecting abnormal articulatory features in
patients with speech-related disorders. In this work, we aim to develop a
framework for detecting speech motion anomalies in conjunction with their
corresponding speech acoustics. This is achieved through the use of a deep
cross-modal translator trained on data from healthy individuals only, which
bridges the gap between 4D motion fields obtained from tagged MRI and 2D
spectrograms derived from speech acoustic data. The trained translator is used
as an anomaly detector, by measuring the spectrogram reconstruction quality on
healthy individuals or patients. In particular, the cross-modal translator is
expected to generalize poorly to patient data, which contains unseen
out-of-distribution patterns, and therefore yields subpar reconstruction
performance compared with that on healthy individuals. A one-class SVM is then
used to distinguish the spectrograms of healthy individuals from those of
patients. To validate our framework, we collected a total of 39 paired tagged
MRI and speech waveforms, consisting of data from 36 healthy individuals and 3
tongue cancer patients. We used both 3D convolutional and transformer-based
deep translation models, training them on the healthy training set and then
applying them to both the healthy and patient testing sets. Our framework
demonstrates a capability to detect abnormal patient data, thereby illustrating
its potential in enhancing the understanding of the articulatory-acoustic
relation for both healthy individuals and patients.
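As a rough illustration of the detection stage described above, the sketch below scores each subject by how well the trained translator reconstructs the measured spectrogram, then fits a one-class SVM on healthy subjects only. This is a minimal sketch, not the paper's implementation: the `translator` callable, the per-frequency error summary, and the SVM hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def recon_errors(translator, pairs):
    """Per-frequency reconstruction-error features for (motion, spectrogram) pairs.

    `translator` stands in for the trained motion-to-spectrogram model
    (3D CNN or transformer); its output shape matches each spectrogram.
    """
    feats = []
    for motion_field, spectrogram in pairs:
        pred = translator(motion_field)
        feats.append(np.abs(pred - spectrogram).mean(axis=1))  # average over time
    return np.stack(feats)

def fit_anomaly_detector(translator, healthy_pairs):
    """Fit a one-class SVM on reconstruction errors from healthy subjects only."""
    return OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(
        recon_errors(translator, healthy_pairs))

def detect(detector, translator, test_pairs):
    """Return +1 (healthy-like) or -1 (anomalous) for each test subject."""
    return detector.predict(recon_errors(translator, test_pairs))
```

The `nu` and `gamma` values are illustrative defaults; in practice they would be tuned on held-out healthy data.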
Related papers
- Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
Speaker-level variance-regularized spectral basis embedding (VR-SBE) features exploit a special regularization term to enforce homogeneity of speaker features during adaptation.
Feature-based learning hidden unit contributions (f-LHUC) transforms are conditioned on VR-SBE features and are shown to be insensitive to speaker-level data quantity in test-time adaptation.
arXiv Detail & Related papers (2024-07-08T18:20:24Z)
- Exploring Speech Pattern Disorders in Autism using Machine Learning [12.469348589699766]
This study presents a comprehensive approach to identifying distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel-frequency cepstral coefficients (MFCCs), and balance (a minimal feature-extraction sketch follows this entry).
The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%.
arXiv Detail & Related papers (2024-05-03T02:59:15Z)
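The entry above groups its 40 features into frequency, zero-crossing rate, energy, spectral, MFCC, and balance categories; the exact recipe is not given there, so the sketch below extracts a smaller, comparable vector with librosa under assumed settings (13 MFCC means, a 16 kHz sample rate).

```python
import numpy as np
import librosa

def speech_features(wav_path, sr=16000):
    """Illustrative feature vector: 13 MFCC means plus ZCR, energy, and centroid."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    zcr = librosa.feature.zero_crossing_rate(y).mean()       # voicing/noisiness cue
    energy = librosa.feature.rms(y=y).mean()                 # loudness cue
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()  # brightness
    return np.concatenate([mfcc, [zcr, energy, centroid]])
```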
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings (a generic contrastive-loss sketch follows this entry).
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
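The decoding entry above is trained with contrastive learning between brain and speech embeddings. Below is a generic symmetric InfoNCE-style loss in PyTorch, offered only as a sketch of the idea rather than the paper's specific objective.

```python
import torch
import torch.nn.functional as F

def info_nce(brain_emb, speech_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Matched (brain, speech) pairs lie on the diagonal of the similarity
    matrix and are pushed above all mismatched pairs in the batch.
    """
    b = F.normalize(brain_emb, dim=-1)
    s = F.normalize(speech_emb, dim=-1)
    logits = b @ s.T / temperature
    targets = torch.arange(b.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2
```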
- Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE [2.771610203951056]
This study examines how articulatory information can be used for discovering speech units in a self-supervised setting.
We used vector-quantized variational autoencoders (VQ-VAE) to learn discrete representations from articulatory and acoustic speech data; the core codebook lookup is sketched after this entry.
Experiments were conducted on three different corpora in English and French.
arXiv Detail & Related papers (2022-06-17T14:04:24Z)
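For the VQ-VAE entry above, the core quantization step is a nearest-neighbor codebook lookup; a minimal NumPy sketch follows, with frame and codebook shapes assumed for illustration.

```python
import numpy as np

def quantize(frames, codebook):
    """Map each feature frame to its nearest codebook entry.

    frames: (T, D) articulatory/acoustic features; codebook: (K, D) learned codes.
    Returns discrete unit ids of shape (T,) and the quantized frames (T, D).
    """
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    ids = dists.argmin(axis=1)
    return ids, codebook[ids]
```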
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks showed that systems incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- The effect of speech pathology on automatic speaker verification -- a large-scale study [6.468412158245622]
Pathological speech faces heightened privacy-breach risks compared to healthy speech.
Adults with dysphonia are at heightened re-identification risk, whereas conditions like dysarthria yield results comparable to those of healthy speakers.
Merging data across pathological types led to a marked EER decrease, suggesting potential benefits of pathological diversity in automatic speaker verification (an EER computation sketch follows this entry).
arXiv Detail & Related papers (2022-04-13T15:17:00Z)
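The equal error rate (EER) referenced above is the operating point where the false-acceptance and false-rejection rates coincide. A standard way to compute it from verification scores, using scikit-learn's ROC utilities, is sketched below.

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """labels: 1 for same-speaker trials, 0 for impostor trials;
    scores: higher means more likely the same speaker."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # threshold where FPR and FNR cross
    return (fpr[idx] + fnr[idx]) / 2
```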
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction; a minimal WER implementation follows this entry.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
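For context on the figure above: an absolute WER reduction of 2.92% means, for example, dropping from 30.00% to 27.08% WER. WER itself is the word-level Levenshtein distance (substitutions, insertions, deletions) normalized by the reference length; a minimal implementation follows.

```python
def wer(ref, hyp):
    """Word error rate between a reference and a hypothesis transcript."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i reference and j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)
```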
- A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC).
The poor quality of dysarthric speech can be greatly improved by statistical VC.
However, because the normal speech utterances of a patient with dysarthria are nearly impossible to collect, previous work failed to recover the individuality of the patient.
arXiv Detail & Related papers (2021-06-02T18:41:03Z)
- Extracting the Locus of Attention at a Cocktail Party from Single-Trial EEG using a Joint CNN-LSTM Model [0.1529342790344802]
The human brain performs remarkably well at segregating a particular speaker from interfering speakers in a multi-speaker scenario.
We present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer auditory attention; a minimal architecture sketch follows this entry.
arXiv Detail & Related papers (2021-02-08T01:06:48Z)
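As a rough sketch of the joint CNN-LSTM idea in the last entry, the PyTorch module below convolves over EEG channels and time, lets an LSTM summarize the sequence, and classifies the attended speaker. The channel count, layer sizes, and two-speaker output are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Joint CNN-LSTM auditory-attention classifier (hypothetical sizes)."""
    def __init__(self, eeg_channels=64, hidden=128, n_speakers=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(eeg_channels, 64, kernel_size=7, padding=3),  # temporal filters
            nn.ReLU(),
            nn.MaxPool1d(2),  # downsample in time
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_speakers)

    def forward(self, x):                  # x: (batch, eeg_channels, time)
        h = self.conv(x).transpose(1, 2)   # -> (batch, time, 64)
        _, (hn, _) = self.lstm(h)
        return self.head(hn[-1])           # attended-speaker logits
```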