Bilingual Dual-Head Deep Model for Parkinson's Disease Detection from Speech
- URL: http://arxiv.org/abs/2503.10301v1
- Date: Thu, 13 Mar 2025 12:23:11 GMT
- Title: Bilingual Dual-Head Deep Model for Parkinson's Disease Detection from Speech
- Authors: Moreno La Quatra, Juan Rafael Orozco-Arroyave, Marco Sabato Siniscalchi
- Abstract summary: This work aims to tackle the Parkinson's disease (PD) detection problem from the speech signal in a bilingual setting by proposing an ad-hoc dual-head deep neural architecture for type-based binary classification. One head is specialized for diadochokinetic patterns. The other head looks for natural speech patterns present in continuous spoken utterances.
- Score: 8.852211803870016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work aims to tackle the Parkinson's disease (PD) detection problem from the speech signal in a bilingual setting by proposing an ad-hoc dual-head deep neural architecture for type-based binary classification. One head is specialized for diadochokinetic patterns. The other head looks for natural speech patterns present in continuous spoken utterances. Only one of the two heads is operative, according to the nature of the input. Speech representations are extracted from self-supervised learning (SSL) models and wavelet transforms. Adaptive layers, convolutional bottlenecks, and contrastive learning are exploited to reduce variations across languages. Our solution is assessed on two distinct datasets, EWA-DB and PC-GITA, which cover Slovak and Spanish, respectively. Results indicate that conventional models trained on a single-language dataset struggle with cross-linguistic generalization, and naive combinations of datasets are suboptimal. In contrast, our model improves generalization on both languages simultaneously.
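The routing idea in the abstract, where only one of the two heads is operative depending on the type of input, can be sketched roughly as follows. This is a minimal pure-Python illustration and not the authors' implementation: the class name `DualHeadClassifier`, the type labels `"ddk"` and `"continuous"`, the layer sizes, and the random initialization are all assumptions made for the example. The SSL and wavelet feature extraction, adaptive layers, convolutional bottlenecks, and contrastive objective mentioned in the abstract are omitted.

```python
import math
import random


def linear(weights, bias, x):
    """Compute W @ x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class DualHeadClassifier:
    """Shared encoder followed by two binary heads (hypothetical sketch).

    Exactly one head is applied per input, selected by the speech type.
    """

    # Assumed labels: diadochokinetic tasks vs. continuous speech.
    HEAD_NAMES = ("ddk", "continuous")

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # Shared layer used by both heads.
        self.shared_w = init(hidden_dim, in_dim)
        self.shared_b = [0.0] * hidden_dim
        # One independent output layer per speech type.
        self.heads = {name: (init(1, hidden_dim), [0.0])
                      for name in self.HEAD_NAMES}

    def forward(self, features, speech_type):
        """Return a PD probability using only the head for `speech_type`."""
        if speech_type not in self.heads:
            raise ValueError(f"unknown speech type: {speech_type!r}")
        # Shared representation with a ReLU non-linearity.
        hidden = [max(0.0, h)
                  for h in linear(self.shared_w, self.shared_b, features)]
        # Route to the single operative head; the other head is not evaluated.
        head_w, head_b = self.heads[speech_type]
        logit = linear(head_w, head_b, hidden)[0]
        return 1.0 / (1.0 + math.exp(-logit))


# Toy feature vector standing in for pooled speech representations.
model = DualHeadClassifier(in_dim=4, hidden_dim=8)
features = [0.2, -0.1, 0.5, 0.3]
p_ddk = model.forward(features, "ddk")
p_cont = model.forward(features, "continuous")
print(p_ddk, p_cont)
```

Because the heads share the encoder but have separate output weights, changing one head leaves the other head's predictions untouched, which is the property that lets each head specialize in its own speech type.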
Related papers
- Language Barriers: Evaluating Cross-Lingual Performance of CNN and Transformer Architectures for Speech Quality Estimation [9.286959744769792]
Cross-lingual generalization of objective speech quality models is a major challenge. Models trained primarily on English data may struggle to generalize to languages with different phonetic, tonal, and prosodic characteristics. This study investigates the cross-lingual performance of two speech quality models: NISQA, a CNN-based model, and a Transformer-based Audio Spectrogram Transformer (AST) model.
arXiv Detail & Related papers (2025-02-18T16:22:43Z)
- SpeechAlign: Aligning Speech Generation to Human Preferences [51.684183257809075]
We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences.
We show that SpeechAlign can bridge the distribution gap and facilitate continuous self-improvement of the speech language model.
arXiv Detail & Related papers (2024-04-08T15:21:17Z)
- Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is a better-performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- Do self-supervised speech and language models extract similar representations as human brain? [2.390915090736061]
Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception.
We evaluate the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2.
arXiv Detail & Related papers (2023-10-07T01:39:56Z)
- Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer [53.72998363956454]
Direct speech-to-speech translation (S2ST) with discrete self-supervised representations has achieved remarkable accuracy.
The scarcity of high-quality speaker-parallel data poses a challenge for learning style transfer during translation.
We design an S2ST pipeline with style-transfer capability on the basis of discrete self-supervised speech representations and timbre units.
arXiv Detail & Related papers (2023-09-14T09:52:08Z)
- LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure.
We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z)
- Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models [28.036233760742125]
We causally probe multilingual language models (XGLM and multilingual BERT) across various languages.
We find significant neuron overlap across languages in autoregressive multilingual language models, but not masked language models.
arXiv Detail & Related papers (2022-10-25T20:43:36Z)
- GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models [7.8430387435520625]
We propose a novel method by which a Transformer DL model (GPT-2) pre-trained on general English text is paired with an artificially degraded version of itself (GPT-D).
This technique approaches state-of-the-art performance on text data from a widely used "Cookie Theft" picture description task, and unlike established alternatives also generalizes well to spontaneous conversations.
Our study is a step toward better understanding of the relationships between the inner workings of generative neural language models, the language that they produce, and the deleterious effects of dementia on human speech and language characteristics.
arXiv Detail & Related papers (2022-03-25T00:25:42Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)
- Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets [0.0]
The present study investigates the behaviour of two predictive coding models, Autoregressive Predictive Coding and Contrastive Predictive Coding, in a phoneme discrimination task.
Our experiments show a strong correlation between the autoregressive loss and the phoneme discrimination scores with the two datasets.
The CPC model shows rapid convergence already after one pass over the training data, and, on average, its representations outperform those of APC on both languages.
arXiv Detail & Related papers (2020-07-08T15:46:13Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.