A Tale of Two Perplexities: Sensitivity of Neural Language Models to
Lexical Retrieval Deficits in Dementia of the Alzheimer's Type
- URL: http://arxiv.org/abs/2005.03593v2
- Date: Sun, 28 Jun 2020 19:59:48 GMT
- Title: A Tale of Two Perplexities: Sensitivity of Neural Language Models to
Lexical Retrieval Deficits in Dementia of the Alzheimer's Type
- Authors: Trevor Cohen and Serguei Pakhomov
- Abstract summary: In recent years there has been a burgeoning interest in the use of computational methods to distinguish between elicited speech samples produced by patients with dementia, and those from healthy controls.
The difference between perplexity estimates from two neural language models (LMs) has been shown to produce state-of-the-art performance.
We find that perplexity of neural LMs is strongly and differentially associated with lexical frequency, and that a mixture model resulting from interpolating control and dementia LMs improves upon the current state-of-the-art for models trained on transcript text exclusively.
- Score: 10.665308703417665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years there has been a burgeoning interest in the use of
computational methods to distinguish between elicited speech samples produced
by patients with dementia, and those from healthy controls. The difference
between perplexity estimates from two neural language models (LMs) - one
trained on transcripts of speech produced by healthy participants and the other
trained on transcripts from patients with dementia - as a single feature for
diagnostic classification of unseen transcripts has been shown to produce
state-of-the-art performance. However, little is known about why this approach
is effective, and on account of the lack of case/control matching in the most
widely-used evaluation set of transcripts (DementiaBank), it is unclear if
these approaches are truly diagnostic, or are sensitive to other variables. In
this paper, we interrogate neural LMs trained on participants with and without
dementia using synthetic narratives previously developed to simulate
progressive semantic dementia by manipulating lexical frequency. We find that
perplexity of neural LMs is strongly and differentially associated with lexical
frequency, and that a mixture model resulting from interpolating control and
dementia LMs improves upon the current state-of-the-art for models trained on
transcript text exclusively.
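As a concrete illustration of the two ideas in the abstract (the paired-perplexity feature and the interpolated control/dementia mixture), here is a minimal sketch. Tiny unigram tables stand in for the trained neural LMs; every probability, token, and the interpolation weight alpha below are invented for illustration, and probability-level mixing is assumed (the abstract does not specify the exact form of interpolation).

```python
import math

# Minimal sketch of the paired-perplexity feature and the interpolated
# ("mixture") scoring described in the abstract. Unigram tables stand in
# for the trained neural LMs; all values here are invented.
CONTROL_LM = {"the": 0.15, "boy": 0.05, "cookie": 0.04, "jar": 0.03, "thing": 0.005}
DEMENTIA_LM = {"the": 0.18, "boy": 0.02, "cookie": 0.01, "jar": 0.005, "thing": 0.05}
OOV_PROB = 1e-4  # fallback probability for out-of-vocabulary tokens


def perplexity(tokens, lm):
    """Perplexity = exp of the mean negative log-probability per token."""
    nll = [-math.log(lm.get(t, OOV_PROB)) for t in tokens]
    return math.exp(sum(nll) / len(nll))


def mixture_perplexity(tokens, lm_a, lm_b, alpha):
    """Score under a token-level interpolation of the two LMs."""
    nll = [
        -math.log(alpha * lm_a.get(t, OOV_PROB) + (1 - alpha) * lm_b.get(t, OOV_PROB))
        for t in tokens
    ]
    return math.exp(sum(nll) / len(nll))


transcript = "the boy took the thing from the thing".split()

# The single diagnostic feature: difference between the two perplexities.
# Transcripts that look more typical to the dementia LM than to the
# control LM push this value up.
feature = perplexity(transcript, CONTROL_LM) - perplexity(transcript, DEMENTIA_LM)
print(f"paired-perplexity feature: {feature:.2f}")
print(f"mixture perplexity (alpha=0.5): "
      f"{mixture_perplexity(transcript, CONTROL_LM, DEMENTIA_LM, 0.5):.2f}")
```

In the paper's actual setting the two scorers are neural LMs trained on transcripts from healthy participants and from patients with dementia, respectively, and both the interpolation weight and the classification threshold over the difference feature would be tuned on held-out data.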
Related papers
- A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, an LLM's generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Zero-Resource Hallucination Prevention for Large Language Models [45.4155729393135]
"Hallucination" refers to instances where large language models (LLMs) generate factually inaccurate or ungrounded information.
We introduce a novel pre-detection self-evaluation technique, referred to as SELF-FAMILIARITY, which focuses on evaluating the model's familiarity with the concepts present in the input instruction.
We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-09-06T01:57:36Z)
- Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
- Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in the human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models [7.8430387435520625]
We propose a novel method by which a Transformer DL model (GPT-2) pre-trained on general English text is paired with an artificially degraded version of itself (GPT-D); a minimal sketch of this pairing appears after this list.
This technique approaches state-of-the-art performance on text data from a widely used "Cookie Theft" picture description task, and unlike established alternatives also generalizes well to spontaneous conversations.
Our study is a step toward better understanding of the relationships between the inner workings of generative neural language models, the language that they produce, and the deleterious effects of dementia on human speech and language characteristics.
arXiv Detail & Related papers (2022-03-25T00:25:42Z)
- Detecting Dementia from Speech and Transcripts using Transformers [0.0]
Alzheimer's disease (AD) is a neurodegenerative disease with serious consequences for people's everyday lives if it is not diagnosed early, since there is no available cure.
Recent work has focused on diagnosing dementia from spontaneous speech.
arXiv Detail & Related papers (2021-10-27T21:00:01Z)
- Multi-Modal Detection of Alzheimer's Disease from Speech and Text [3.702631194466718]
We propose a deep learning method that utilizes speech and the corresponding transcript simultaneously to detect Alzheimer's disease (AD).
The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the DementiaBank Pitt corpus.
arXiv Detail & Related papers (2020-11-30T21:18:17Z)
- Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous Speech [1.2805268849262246]
Alzheimer's Dementia (AD) is an incurable, debilitating, and progressive neurodegenerative condition that affects cognitive function.
The Alzheimer's Dementia Recognition through Spontaneous Speech task offers acoustically pre-processed and balanced datasets for the classification and prediction of AD.
arXiv Detail & Related papers (2020-06-12T17:51:16Z)
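The GPT-D entry above is the closest relative of the present paper, so here is the promised minimal sketch of pairing a pretrained GPT-2 with a deliberately degraded copy of itself. The specific degradation (zeroing half the attention heads in every layer via Hugging Face's head_mask argument) and the perplexity-ratio score are illustrative assumptions, not necessarily the impairment scheme used in the GPT-D paper.

```python
# Hedged sketch of pairing a pretrained LM with a deliberately degraded
# copy of itself, in the spirit of GPT-D. Degradation here = masking
# attention heads through the head_mask argument; this is an illustrative
# choice, not necessarily the impairment used in the GPT-D paper.
from typing import Optional

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


def perplexity(text: str, head_mask: Optional[torch.Tensor] = None) -> float:
    """Perplexity of `text`; passing head_mask simulates a degraded model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids, head_mask=head_mask)
    return torch.exp(out.loss).item()


# Zero out half of the attention heads in every layer (arbitrary severity).
cfg = model.config
mask = torch.ones(cfg.n_layer, cfg.n_head)
mask[:, : cfg.n_head // 2] = 0.0

text = "the boy is taking a cookie from the cookie jar"
ratio = perplexity(text, head_mask=mask) / perplexity(text)
print(f"degraded/intact perplexity ratio: {ratio:.2f}")
```

Like the interpolation sketch after the abstract, this replaces separately trained control and dementia LMs with a single pretrained model and its impaired copy; how well the impairment mimics dementia-related anomalies is exactly the question the GPT-D paper investigates.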
This list is automatically generated from the titles and abstracts of the papers in this site.