An information-theoretic model of shallow and deep language comprehension
- URL: http://arxiv.org/abs/2405.08223v1
- Date: Mon, 13 May 2024 22:29:33 GMT
- Title: An information-theoretic model of shallow and deep language comprehension
- Authors: Jiaxuan Li, Richard Futrell
- Abstract summary: We use information theory to formulate a model of language comprehension as an optimal trade-off between accuracy and processing depth.
By quantifying the timecourse of language processing as it proceeds from shallow to deep, our model provides a unified framework to explain behavioral and neural signatures of language comprehension.
- Score: 7.760815504640362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A large body of work in psycholinguistics has focused on the idea that online language comprehension can be shallow or 'good enough': given constraints on time or available computation, comprehenders may form interpretations of their input that are plausible but inaccurate. However, this idea has not yet been linked with formal theories of computation under resource constraints. Here we use information theory to formulate a model of language comprehension as an optimal trade-off between accuracy and processing depth, formalized as bits of information extracted from the input, which increases with processing time. The model provides a measure of processing effort as the change in processing depth, which we link to EEG signals and reading times. We validate our theory against a large-scale dataset of garden-path sentence reading times, and against EEG experiments featuring N400, P600 and biphasic ERP effects. By quantifying the timecourse of language processing as it proceeds from shallow to deep, our model provides a unified framework to explain behavioral and neural signatures of language comprehension.
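To make the accuracy-depth trade-off concrete, here is a minimal numerical sketch. It assumes, purely for illustration, that the depth-limited interpretation distribution can be modeled as a tempered posterior over candidate parses, with depth measured as bits of KL divergence from the prior and effort as the change in depth; the variable names and numbers are hypothetical, not the paper's exact formulation.

```python
import numpy as np

def tempered_posterior(prior, lik, beta):
    """Interpretation distribution after partial processing: interpolates
    between the prior (beta=0, shallow) and the full posterior (beta=1, deep)."""
    unnorm = prior * lik ** beta
    return unnorm / unnorm.sum()

def depth_bits(q, prior):
    """Processing depth: bits of information extracted from the input,
    measured here as KL(q || prior)."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log2(q[mask] / prior[mask])))

# Toy garden-path setting: two parses of an ambiguous prefix.
prior = np.array([0.9, 0.1])    # parse A is a priori much more likely
lik   = np.array([0.01, 0.99])  # the disambiguating word favors parse B

shallow = tempered_posterior(prior, lik, beta=0.2)  # early, cheap reading
deep    = tempered_posterior(prior, lik, beta=1.0)  # full Bayesian update

d_shallow = depth_bits(shallow, prior)
d_deep    = depth_bits(deep, prior)
print(f"shallow depth = {d_shallow:.2f} bits, deep depth = {d_deep:.2f} bits")
print(f"predicted effort (change in depth) = {d_deep - d_shallow:.2f} bits")
```

On these toy numbers the shallow reading barely moves from the prior, while fully committing to the disambiguating parse costs over two additional bits; a change in depth of this kind is the quantity the model links to reading times and ERP signals.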
Related papers
- Decomposition of surprisal: Unified computational model of ERP components in language processing [7.760815504640362]
We advance an information-theoretic model of human language processing in the brain in which incoming linguistic input is processed at first shallowly and later with more depth.
We show that the information content (surprisal) of a word in context can be decomposed into two quantities: (A) a surprise term, which signals shallow processing difficulty for a word, and (B) a discrepancy term, which reflects the gap between shallow and deep interpretations (a worked example follows this entry).
arXiv Detail & Related papers (2024-09-10T18:14:02Z)
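The decomposition above is easy to see as an algebraic identity once one posits a shallow model q and a deep model p of the next word: total surprisal under p splits into shallow surprise under q plus a term measuring how far q is from p. A minimal sketch, where q_w and p_w are made-up probabilities for one word in context:

```python
import math

# q: a shallow/heuristic next-word model; p: the deep/veridical model.
q_w = 0.05   # shallow model's probability of the word (hypothetical)
p_w = 0.002  # deep model's probability of the word (hypothetical)

surprisal   = -math.log2(p_w)          # total information content
surprise    = -math.log2(q_w)          # (A) shallow processing difficulty
discrepancy = math.log2(q_w / p_w)     # (B) shallow-vs-deep discrepancy

assert abs(surprisal - (surprise + discrepancy)) < 1e-9
print(f"surprisal={surprisal:.2f} = surprise={surprise:.2f} "
      f"+ discrepancy={discrepancy:.2f} bits")
```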
- Injecting linguistic knowledge into BERT for Dialogue State Tracking [60.42231674887294]
This paper proposes a method that extracts linguistic knowledge via an unsupervised framework.
We then utilize this knowledge to augment BERT's performance and interpretability in Dialogue State Tracking (DST) tasks.
We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
arXiv Detail & Related papers (2023-11-27T08:38:42Z)
- Meta predictive learning model of languages in neural circuits [2.5690340428649328]
We propose a mean-field learning model within the predictive coding framework.
Our model reveals that most of the connections become deterministic after learning.
Our model provides a starting point to investigate the connection among brain computation, next-token prediction and general intelligence.
arXiv Detail & Related papers (2023-09-08T03:58:05Z)
- Arithmetic with Language Models: from Memorization to Computation [3.077668143048211]
This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data.
We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing.
arXiv Detail & Related papers (2023-08-02T13:58:37Z)
- A unified information-theoretic model of EEG signatures of human language processing [7.190747604294439]
We advance an information-theoretic model of human language processing in the brain, in which incoming linguistic input is processed at two levels.
We propose that these two kinds of information processing have distinct electroencephalographic signatures, corresponding to the well-documented N400 and P600 components of language-related event-related potentials (ERPs).
Our theory is in principle compatible with traditional cognitive theories assuming a 'good-enough' interpretation stage, but with a precise information-theoretic formulation.
arXiv Detail & Related papers (2022-12-16T00:15:45Z)
- Bidirectional Representations for Low Resource Spoken Language Understanding [39.208462511430554]
We propose a representation model that encodes speech into rich bidirectional encodings.
The approach uses a masked language modelling objective, sketched after this entry, to learn the representations.
We show that the resulting encodings outperform comparable models on multiple datasets.
arXiv Detail & Related papers (2022-11-24T17:05:16Z)
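The masked language modelling objective used by the entry above can be sketched compactly: corrupt a fraction of the input tokens with a mask symbol and train the encoder to recover the originals, scoring only the masked positions. The sketch below substitutes a random "encoder" for the real model, and all sizes and token ids are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len, mask_rate = 100, 12, 0.15
MASK_ID = 0  # hypothetical reserved mask-token id

tokens = rng.integers(1, vocab_size, size=seq_len)   # original sequence
is_masked = rng.random(seq_len) < mask_rate          # choose positions
inputs = np.where(is_masked, MASK_ID, tokens)        # corrupted input

# Stand-in for an encoder run on `inputs`: random logits per position.
logits = rng.normal(size=(seq_len, vocab_size))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# The MLM loss is the mean negative log-likelihood of the *original*
# tokens, computed only at the masked positions.
if is_masked.any():
    loss = -log_probs[is_masked, tokens[is_masked]].mean()
    print(f"masked positions: {is_masked.sum()}, MLM loss: {loss:.3f} nats")
```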
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows us to reduce the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power-law decay (a toy illustration follows this entry).
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
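To unpack the entry above: an LSTM unit whose forget gate sits at a constant value f retains information with exponential timescale tau = -1/ln(f), and a population of units whose decay rates are spread out forgets as a power law in aggregate. The gamma distribution over rates below is an illustrative choice that makes the mixture identity exact, not the paper's exact derivation:

```python
import numpy as np

def unit_timescale(f):
    """An LSTM unit with constant forget-gate value f retains memory as
    f**t = exp(-t/tau), i.e. with exponential timescale tau = -1/ln(f)."""
    return -1.0 / np.log(f)

print(f"f = 0.99 -> timescale ~ {unit_timescale(0.99):.0f} steps")

rng = np.random.default_rng(1)
# Illustrative assumption: per-unit decay rates 1/tau follow a gamma
# distribution, spreading timescales from very short to very long.
shape = 0.5
rates = rng.gamma(shape, 1.0, size=10_000)

t = np.arange(1, 200)
# Population memory trace: average of the units' exponential decays.
trace = np.exp(-np.outer(t, rates)).mean(axis=1)

# A gamma mixture of exponentials integrates to (1 + t) ** -shape:
# power-law rather than exponential forgetting in the aggregate.
power_law = (1.0 + t) ** -shape
print(f"max deviation from power law: {np.abs(trace - power_law).max():.4f}")
```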
- How Context Affects Language Models' Factual Predictions [134.29166998377187]
We integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way.
We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
arXiv Detail & Related papers (2020-05-10T09:28:12Z)
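The retrieve-then-read pattern described in the entry above reduces to: embed the query, fetch the most similar passage, and prepend it to the language model's input. The sketch below uses a toy bag-of-words retriever and made-up passages; a real system would use a trained encoder and an actual LM to read the augmented context:

```python
import numpy as np

passages = [
    "Dante was born in Florence.",
    "The Amazon is the largest rainforest.",
    "LSTMs use gating to control memory.",
]

vocab = sorted({w.lower().strip(".?") for p in passages for w in p.split()})

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding; a trained dense retriever would
    replace this in a real system."""
    words = [w.lower().strip(".?") for w in text.split()]
    vec = np.array([float(words.count(w)) for w in vocab])
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

passage_vecs = np.stack([embed(p) for p in passages])

def retrieve(query: str) -> str:
    """Unsupervised retrieval: pick the passage most similar to the query."""
    sims = passage_vecs @ embed(query)
    return passages[int(np.argmax(sims))]

query = "Where was Dante born?"
# Augment the LM input by simply prepending the retrieved evidence.
print(retrieve(query) + " " + query)
```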
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
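The operationalization in the last entry can be sketched via the identity MI(R;Y) = H(Y) - H(Y|R): a probe's held-out cross-entropy upper-bounds H(Y|R), so subtracting it from the label entropy gives a lower-bound estimate of the mutual information between representations R and linguistic labels Y. The data below are synthetic stand-ins (Gaussian "representations" with a label-dependent shift), and the logistic-regression probe is an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(3)

# Toy "representations" R and binary labels Y: two Gaussian clusters
# standing in for, e.g., contextual embeddings of two POS tags.
n = 2000
y = rng.integers(0, 2, size=n)
r = rng.normal(size=(n, 8)) + 1.5 * y[:, None]  # label shifts the mean

train, test = slice(0, 1000), slice(1000, n)
probe = LogisticRegression(max_iter=1000).fit(r[train], y[train])

# H(Y): entropy of the label distribution (in nats).
p1 = y[test].mean()
h_y = -(p1 * np.log(p1) + (1 - p1) * np.log(1 - p1))

# Probe cross-entropy upper-bounds H(Y|R); the gap estimates MI(R;Y).
h_y_given_r = log_loss(y[test], probe.predict_proba(r[test]))
print(f"estimated MI(R;Y) >= {h_y - h_y_given_r:.3f} nats")
```

On this view, a more expressive probe simply tightens the bound, yielding a better estimate of the same underlying quantity.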