The Grammar-Learning Trajectories of Neural Language Models
- URL: http://arxiv.org/abs/2109.06096v1
- Date: Mon, 13 Sep 2021 16:17:23 GMT
- Title: The Grammar-Learning Trajectories of Neural Language Models
- Authors: Leshem Choshen, Guy Hacohen, Daphna Weinshall, Omri Abend
- Abstract summary: We show that neural language models acquire linguistic phenomena in a similar order, despite having different end performances over the data.
Results suggest that NLMs exhibit consistent "developmental" stages.
- Score: 42.32479280480742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The learning trajectories of linguistic phenomena provide insight into the
nature of linguistic representation, beyond what can be gleaned from inspecting
the behavior of an adult speaker. To apply a similar approach to analyze neural
language models (NLM), it is first necessary to establish that different models
are similar enough in the generalizations they make. In this paper, we show
that NLMs with different initialization, architecture, and training data
acquire linguistic phenomena in a similar order, despite having different end
performances over the data. Leveraging these findings, we compare the relative
performance on different phenomena at varying learning stages with simpler
reference models. Results suggest that NLMs exhibit consistent
"developmental" stages. Initial analysis of these stages presents phenomena
clusters (notably morphological ones), whose performance progresses in unison,
suggesting potential links between their acquired representations.
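One way to make "acquire linguistic phenomena in a similar order" concrete is to rank phenomena by accuracy for each model at a given checkpoint and correlate the rankings. The sketch below uses Spearman rank correlation for this. It is an illustration only: the abstract does not state which similarity measure the authors use, and the phenomenon names and accuracy values are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks of the values, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-phenomenon accuracies of two models at one checkpoint.
# Model B is weaker overall but orders the phenomena the same way.
model_a = {"agreement": 0.90, "irregular_forms": 0.85,
           "npi_licensing": 0.60, "islands": 0.55}
model_b = {"agreement": 0.75, "irregular_forms": 0.70,
           "npi_licensing": 0.52, "islands": 0.50}

phenomena = sorted(model_a)
rho = spearman([model_a[p] for p in phenomena],
               [model_b[p] for p in phenomena])
print(f"rank correlation between models: {rho:.2f}")
```

A correlation near 1 means the two models find the same phenomena easy or hard at that stage even when their absolute accuracies differ, which is the kind of agreement the abstract describes.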
Related papers
- Investigating the Timescales of Language Processing with EEG and Language Models [0.0]
This study explores the temporal dynamics of language processing by examining the alignment between word representations from a pre-trained language model and EEG data.
Using a Temporal Response Function (TRF) model, we investigate how neural activity corresponds to model representations across different layers.
Our analysis reveals patterns in TRFs from distinct layers, highlighting varying contributions to lexical and compositional processing.
arXiv Detail & Related papers (2024-06-28T12:49:27Z) - Interpretability of Language Models via Task Spaces [14.543168558734001]
We present an alternative approach to interpret language models (LMs).
We concentrate on the quality of LM processing, with a focus on their language abilities.
We construct 'linguistic task spaces' that shed light on the connections LMs draw between language phenomena.
arXiv Detail & Related papers (2024-06-10T16:34:30Z) - Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess language models' (LMs') linguistic competence.
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z) - Self-Supervised Models of Speech Infer Universal Articulatory Kinematics [44.27187669492598]
We show "inference of articulatory kinematics" as a fundamental property of SSL models.
We also show that this abstraction largely overlaps across the languages of the data used to train the model.
We show that with simple affine transformations, Acoustic-to-Articulatory inversion (AAI) is transferable across speakers, even across genders, languages, and dialects.
arXiv Detail & Related papers (2023-10-16T19:50:01Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z) - On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.
We evaluate four model families, OPT, BLOOM, CodeGen and Codex on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z) - Implicit Representations of Meaning in Neural Language Models [31.71898809435222]
We identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse.
Our results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state.
arXiv Detail & Related papers (2021-06-01T19:23:20Z) - SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z) - Overestimation of Syntactic Representation in Neural Language Models [16.765097098482286]
One popular method for determining a model's ability to induce syntactic structure trains a model on strings generated according to a template, then tests the model's ability to distinguish such strings from superficially similar ones with different syntax.
We illustrate a fundamental problem with this approach by reproducing positive results from a recent paper with two non-syntactic baseline language models.
arXiv Detail & Related papers (2020-04-10T15:13:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.