Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies
- URL: http://arxiv.org/abs/2509.03525v2
- Date: Tue, 07 Oct 2025 06:46:04 GMT
- Title: Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies
- Authors: Fatemeh Taherinezhad, Mohamad Javad Momeni Nezhad, Sepehr Karimi, Sina Rashidi, Ali Zolnour, Maryam Dadkhah, Yasaman Haghbin, Hossein AzadMaleki, Maryam Zolnoori
- Abstract summary: Over half of US adults with Alzheimer disease and related dementias remain undiagnosed. We compared large language model adaptation strategies for dementia detection using the DementiaBank speech corpus.
- Score: 1.6807214708043468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over half of US adults with Alzheimer disease and related dementias remain undiagnosed, and speech-based screening offers a scalable detection approach. We compared large language model adaptation strategies for dementia detection using the DementiaBank speech corpus, evaluating nine text-only models and three multimodal audio-text models on its recordings. Adaptations included in-context learning with different demonstration selection policies, reasoning-augmented prompting, parameter-efficient fine-tuning, and multimodal integration. Results showed that class-centroid demonstrations achieved the highest in-context learning performance, reasoning improved smaller models, and token-level fine-tuning generally produced the best scores. Adding a classification head substantially improved underperforming models. Among multimodal models, fine-tuned audio-text systems performed well but did not surpass the top text-only models. These findings highlight that model adaptation strategies, including demonstration selection, reasoning design, and tuning method, critically influence speech-based dementia detection, and that properly adapted open-weight models can match or exceed commercial systems.
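The abstract names class-centroid demonstrations as the best-performing in-context selection policy but does not spell out the procedure. The following is a minimal sketch of one plausible reading, in which the labeled examples closest to each class's mean embedding are chosen as demonstrations. The function names, the Euclidean distance, and the toy 2-D embeddings are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_centroid_demos(examples, k=1):
    """Pick, per class, the k examples closest to that class's centroid.

    `examples` is a list of (text, label, embedding) tuples. The embeddings
    are assumed to come from some sentence encoder (hypothetical here).
    """
    by_label = defaultdict(list)
    for text, label, emb in examples:
        by_label[label].append((text, emb))

    demos = {}
    for label, items in by_label.items():
        center = centroid([emb for _, emb in items])
        ranked = sorted(items, key=lambda item: euclidean(item[1], center))
        demos[label] = [text for text, _ in ranked[:k]]
    return demos


# Toy data: hypothetical transcripts with made-up 2-D embeddings.
toy = [
    ("transcript A", "control", [0.0, 0.0]),
    ("transcript B", "control", [1.0, 0.0]),
    ("transcript C", "control", [5.0, 0.0]),
    ("transcript D", "dementia", [0.0, 4.0]),
    ("transcript E", "dementia", [0.0, 6.0]),
    ("transcript F", "dementia", [0.0, 11.0]),
]
demos = select_centroid_demos(toy, k=1)
```

The selected texts would then be placed in the prompt as labeled demonstrations ahead of the transcript to classify. A real pipeline might use cosine similarity or a different value of k.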
Related papers
- Linguistic and Audio Embedding-Based Machine Learning for Alzheimer's Dementia and Mild Cognitive Impairment Detection: Insights from the PROCESS Challenge [0.0]
Speech, encompassing both acoustic and linguistic dimensions, offers a promising non-invasive biomarker for cognitive decline. We present a machine learning framework for the PROCESS Challenge, leveraging both audio embeddings and linguistic features derived from spontaneous speech recordings.
arXiv Detail & Related papers (2025-10-02T06:54:55Z)
- Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder [7.585589727435719]
We investigate whether smaller neural language models can serve as effective alternatives for detecting positive formal thought disorder. Surprisingly, our results show that smaller models are more sensitive to linguistic differences associated with formal thought disorder than their larger counterparts.
arXiv Detail & Related papers (2025-03-25T22:55:58Z)
- End-to-End Speech Recognition and Disfluency Removal with Acoustic Language Model Pretraining [0.0]
We revisit the performance comparison between two-stage and end-to-end models.
We find that audio-based language models pretrained using weak self-supervised objectives match or exceed the performance of similarly trained two-stage models.
arXiv Detail & Related papers (2023-09-08T17:12:14Z)
- SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation [54.66399120084227]
Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue.
For the persona-based dialogue generation task, consistency and coherence are great challenges for language models.
A two-stage SimOAP strategy is proposed, i.e., over-sampling and post-evaluation.
arXiv Detail & Related papers (2023-05-18T17:23:00Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement [20.329872147913584]
We compare different methods of incorporating phonetic information in a speech enhancement model.
We observe the influence of different phonetic content models as well as various feature-injection techniques on enhancement performance.
arXiv Detail & Related papers (2022-06-22T12:00:50Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.