Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies
- URL: http://arxiv.org/abs/2406.02830v1
- Date: Wed, 5 Jun 2024 00:31:50 GMT
- Title: Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies
- Authors: Changye Li, Zhecheng Sheng, Trevor Cohen, Serguei Pakhomov
- Abstract summary: We show that larger GPT-2 models require a disproportionately larger share of attention heads to be masked/ablated to display degradation of a magnitude similar to that seen in smaller models.
These results suggest that the attention mechanism in transformer models may present an analogue to the notions of cognitive and brain reserve.
- Score: 7.21603206617401
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As artificial neural networks grow in complexity, understanding their inner workings becomes increasingly challenging, which is particularly important in healthcare applications. The intrinsic evaluation metric of autoregressive neural language models (NLMs), perplexity (PPL), can reflect how "surprised" an NLM is at novel input. PPL has been widely used to understand the behavior of NLMs. Previous findings show that changes in PPL when masking attention layers in pre-trained transformer-based NLMs reflect linguistic anomalies associated with Alzheimer's disease dementia. Building upon this, we explore a novel bidirectional attention head ablation method that exhibits properties attributed to the concepts of cognitive and brain reserve in human brain studies, which postulate that people with more neurons in the brain and more efficient processing are more resilient to neurodegeneration. Our results show that larger GPT-2 models require a disproportionately larger share of attention heads to be masked/ablated to display degradation of similar magnitude to masking in smaller models. These results suggest that the attention mechanism in transformer models may present an analogue to the notions of cognitive and brain reserve and could potentially be used to model certain aspects of the progression of neurodegenerative disorders and aging.
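To make the core measurement concrete, the following is a minimal sketch of head ablation with a perplexity readout, assuming the Hugging Face transformers GPT-2 implementation. The random masking policy, the 25% ablation fraction, and the sample sentence are illustrative assumptions, not the paper's exact bidirectional ablation protocol.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")  # smallest GPT-2 variant
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str, head_mask: torch.Tensor) -> float:
    """PPL = exp(mean token cross-entropy), with selected heads zeroed out."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"], head_mask=head_mask)
    return torch.exp(out.loss).item()

n_layers, n_heads = model.config.n_layer, model.config.n_head  # 12 x 12 for base GPT-2

# Ablate a fixed fraction of heads chosen uniformly at random (one possible
# policy; the paper ablates heads bidirectionally across layers).
frac = 0.25
mask = torch.ones(n_layers, n_heads)
ablate = torch.randperm(n_layers * n_heads)[: int(frac * n_layers * n_heads)]
mask.view(-1)[ablate] = 0.0

text = "The quick brown fox jumps over the lazy dog."
print("intact PPL: ", perplexity(text, torch.ones(n_layers, n_heads)))
print("ablated PPL:", perplexity(text, mask))
```

On this view, the reserve analogy amounts to asking how large the ablated fraction must be before PPL degrades by a given amount, and how that threshold scales with model size.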
Related papers
- Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI).
Recent neuroimaging studies provide compelling evidence of alignment between the computational representations of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli.
In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs).
This framework links the AN sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z) - Contrastive Learning in Memristor-based Neuromorphic Systems [55.11642177631929]
Spiking neural networks have become an important family of neuron-based models that sidestep many of the key limitations facing modern-day backpropagation-trained deep networks.
In this work, we design and investigate a proof-of-concept instantiation of contrastive-signal-dependent plasticity (CSDP), a neuromorphic form of forward-forward-based, backpropagation-free learning.
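As background for the summary above, here is a generic, layer-local forward-forward-style update in the spirit of the family CSDP belongs to. This is a sketch of Hinton-style forward-forward learning on a dense layer, not the paper's spiking, memristor-based CSDP rule; the threshold and random data are placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
layer = torch.nn.Linear(784, 256)
opt = torch.optim.SGD(layer.parameters(), lr=0.03)
theta = 2.0  # goodness threshold (illustrative choice)

def goodness(x: torch.Tensor) -> torch.Tensor:
    # "Goodness" = mean squared ReLU activation of this single layer.
    return F.relu(layer(x)).pow(2).mean(dim=1)

x_pos = torch.randn(32, 784)  # stand-in for real ("positive") data
x_neg = torch.randn(32, 784)  # stand-in for corrupted ("negative") data

# Layer-local objective: push positive goodness above theta, negative below.
loss = (F.softplus(theta - goodness(x_pos)) +
        F.softplus(goodness(x_neg) - theta)).mean()
opt.zero_grad()
loss.backward()  # gradients stay within this one layer; no cross-layer backprop
opt.step()
```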
arXiv Detail & Related papers (2024-09-17T04:48:45Z) - Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder [53.575426835313536]
This paper explores language-related functional changes in older adults with NCD using LLM-based fMRI encoding and brain scores.
We analyze the correlation between brain scores and cognitive scores at both whole-brain and language-related ROI levels.
Our findings reveal that higher cognitive abilities correspond to better brain scores, with correlations peaking in the middle temporal gyrus.
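For orientation, a "brain score" in this sense can be sketched as a cross-validated encoding model from LLM embeddings to fMRI responses, scored by per-voxel correlation. The array shapes, the ridge penalty, and the random data below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 768))   # LLM-layer embeddings, one row per stimulus
Y = rng.standard_normal((500, 1000))  # fMRI responses, one column per voxel

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
Y_hat = Ridge(alpha=100.0).fit(X_tr, Y_tr).predict(X_te)

def pearson_per_voxel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Pearson r between corresponding columns of a and b.
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

brain_scores = pearson_per_voxel(Y_hat, Y_te)
print("mean brain score over voxels:", brain_scores.mean())
```

Correlating such per-ROI brain scores with behavioral cognitive scores across subjects then yields the relationship the summary describes.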
arXiv Detail & Related papers (2024-07-15T01:09:08Z) - Coupling Artificial Neurons in BERT and Biological Neurons in the Human Brain [9.916033214833407]
This study introduces a novel, general, and effective framework to link transformer-based NLP models and neural activities in response to language.
Our experimental results demonstrate 1) The activations of ANs and BNs are significantly synchronized; 2) the ANs carry meaningful linguistic/semantic information and anchor to their BN signatures; 3) the anchored BNs are interpretable in a neurolinguistic context.
arXiv Detail & Related papers (2023-03-27T01:41:48Z) - A Comprehensive Comparison of Neural Networks as Cognitive Models of Inflection [20.977461918631928]
We study the correlation between human judgments and neural network probabilities for unknown word inflections.
We find evidence that the Transformer may be a better account of human behavior than LSTMs.
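The underlying comparison can be sketched as scoring candidate inflected forms with a causal LM and correlating the log-probabilities with human ratings. The novel-verb items and ratings below are hypothetical placeholders, not the study's stimuli, and GPT-2 stands in for the models it compares.

```python
import torch
from scipy.stats import spearmanr
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")

def sequence_logprob(text: str) -> float:
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean NLL over the n-1 predicted tokens;
    # rescale it to a sequence-level log-probability.
    n = enc["input_ids"].shape[1]
    return -out.loss.item() * (n - 1)

# Hypothetical past-tense candidates for a novel verb, with made-up ratings.
items = [("Yesterday he splinged.", 5.1),
         ("Yesterday he splinked.", 4.3),
         ("Yesterday he spling.", 1.9)]
model_scores = [sequence_logprob(s) for s, _ in items]
human_scores = [r for _, r in items]
rho, p = spearmanr(model_scores, human_scores)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```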
arXiv Detail & Related papers (2022-10-22T00:59:40Z) - Adapting Brain-Like Neural Networks for Modeling Cortical Visual Prostheses [68.96380145211093]
Cortical prostheses are devices implanted in the visual cortex that attempt to restore lost vision by electrically stimulating neurons.
Currently, the vision provided by these devices is limited, and accurately predicting the visual percepts resulting from stimulation is an open challenge.
We propose to address this challenge by utilizing 'brain-like' convolutional neural networks (CNNs), which have emerged as promising models of the visual system.
arXiv Detail & Related papers (2022-09-27T17:33:19Z) - Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z) - Mesoscopic modeling of hidden spiking neurons [3.6868085124383616]
We use coarse-graining and mean-field approximations to derive a bottom-up, neuronally grounded latent variable model (neuLVM).
The neuLVM can be explicitly mapped to a recurrent, multi-population spiking neural network (SNN).
We show, on synthetic spike trains, that a few observed neurons are sufficient for neuLVM to perform efficient model inversion of large SNNs.
arXiv Detail & Related papers (2022-05-26T17:04:39Z) - Learning by Active Forgetting for Neural Networks [36.47528616276579]
Remembering and forgetting mechanisms are two sides of the same coin in a human learning-memory system.
Modern machine learning systems have been working to endow machines with lifelong learning capability through better remembering.
This paper presents a learning model with an active forgetting mechanism in artificial neural networks.
arXiv Detail & Related papers (2021-11-21T14:55:03Z) - On-the-Fly Attention Modularization for Neural Generation [54.912042110885366]
We show that generated text is repetitive, generic, self-inconsistent, and lacking commonsense.
Our findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention during inference.
arXiv Detail & Related papers (2021-01-02T05:16:46Z)