On the Acquisition of Shared Grammatical Representations in Bilingual Language Models
- URL: http://arxiv.org/abs/2503.03962v1
- Date: Wed, 05 Mar 2025 23:27:58 GMT
- Title: On the Acquisition of Shared Grammatical Representations in Bilingual Language Models
- Authors: Catherine Arnett, Tyler A. Chang, James A. Michaelov, Benjamin K. Bergen
- Abstract summary: We ask what happens to a monolingual language model when it begins to be trained on a second language. To find evidence of shared multilingual representations, we turn to structural priming, a method used to study grammatical representations in humans. We argue that the asymmetries we observe may shape hypotheses about human structural priming effects.
- Score: 6.266732217239363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While crosslingual transfer is crucial to contemporary language models' multilingual capabilities, how it occurs is not well understood. In this paper, we ask what happens to a monolingual language model when it begins to be trained on a second language. Specifically, we train small bilingual models for which we control the amount of data for each language and the order of language exposure. To find evidence of shared multilingual representations, we turn to structural priming, a method used to study grammatical representations in humans. We first replicate previous crosslingual structural priming results and find that after controlling for training data quantity and language exposure, there are asymmetrical effects across language pairs and directions. We argue that this asymmetry may shape hypotheses about human structural priming effects. We also find that structural priming effects are less robust for less similar language pairs, highlighting potential limitations of crosslingual transfer learning and shared representations for typologically diverse languages.
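As a rough illustration of how a structural priming effect can be measured in a causal language model, the sketch below compares the log-probability of a target sentence after a structurally congruent versus incongruent prime. This is a minimal sketch, not the authors' setup: it assumes an off-the-shelf GPT-2 checkpoint and hand-written English dative-alternation sentences, whereas the paper trains small bilingual models from scratch with controlled data quantities and language exposure; the helper name `target_logprob` is illustrative.

```python
# Minimal sketch: measuring a structural priming effect in a causal LM.
# Assumptions (not from the paper): GPT-2 as the model, hand-picked English
# dative-alternation sentences as primes and targets.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def target_logprob(prime: str, target: str) -> float:
    """Sum of log-probabilities of the target's tokens, conditioned on the prime."""
    prime_ids = tokenizer(prime, return_tensors="pt").input_ids
    target_ids = tokenizer(" " + target, return_tensors="pt").input_ids
    input_ids = torch.cat([prime_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probabilities predicted from each position for the *next* token.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = prime_ids.shape[1] - 1  # position predicting the first target token
    token_log_probs = log_probs[start:, :].gather(
        1, input_ids[0, prime_ids.shape[1]:].unsqueeze(1)
    )
    return token_log_probs.sum().item()

# Prepositional-object (PO) target after a PO prime vs. a double-object (DO) prime.
target = "The teacher gave a book to the student."
po_prime = "The chef handed a plate to the waiter."
do_prime = "The chef handed the waiter a plate."

priming_effect = target_logprob(po_prime, target) - target_logprob(do_prime, target)
print(f"congruent minus incongruent log-probability: {priming_effect:.3f}")
# A positive value means the model assigns higher probability to the target
# structure after a structurally congruent prime, i.e., a priming effect.
```

In the crosslingual case studied here, the prime and target would simply come from different languages (e.g., a Dutch prime and an English target), with the effect compared across language pairs and directions.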
Related papers
- High-Dimensional Interlingual Representations of Large Language Models [65.77317753001954]
Large language models (LLMs) trained on massive multilingual datasets hint at the formation of interlingual constructs.
We explore 31 diverse languages varying on their resource-levels, typologies, and geographical regions.
We find that multilingual LLMs exhibit inconsistent cross-lingual alignments.
arXiv Detail & Related papers (2025-03-14T10:39:27Z)
- Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages [15.203789021094982]
In large language models (LLMs), how are multiple languages learned and encoded? We train sparse autoencoders on Llama-3-8B and Aya-23-8B, and demonstrate that abstract grammatical concepts are often encoded in feature directions shared across many languages.
arXiv Detail & Related papers (2025-01-10T21:18:21Z)
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate a novel and unintuitive driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, but whether language imbalance drives cross-lingual generalisation in that setting remains inconclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
- Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models [6.845954748361076]
We find evidence for abstract monolingual and crosslingual grammatical representations in large language models.
Results demonstrate that grammatical representations in multilingual language models are not only similar across languages, but they can causally influence text produced in different languages.
arXiv Detail & Related papers (2023-11-15T18:39:56Z)
- Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models [6.845954748361076]
We use structural priming to test for abstract grammatical representations with causal effects on model outputs.
We extend the approach to a Dutch-English bilingual setting, and we evaluate a Dutch-English language model during pre-training.
We find that crosslingual structural priming effects emerge early after exposure to the second language, with less than 1M tokens of data in that language.
arXiv Detail & Related papers (2023-10-11T22:57:03Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while composition is more crucial to the success of cross-lingual transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models [96.32118305166412]
We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks.
We find that languages which are adequately represented in the multilingual model's vocabulary exhibit negligible performance decreases over their monolingual counterparts.
arXiv Detail & Related papers (2020-12-31T14:11:00Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
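The last entry above mentions singular vector canonical correlation analysis (SVCCA) for probing what information different views of language representations share. As a rough, self-contained illustration (the language vectors below are random placeholders, and the function and variable names are not from that paper), SVCCA can be sketched as: reduce each view with SVD, then compute the canonical correlations between the reduced views.

```python
# Minimal SVCCA sketch: compare two views of language representations.
# The "language vectors" here are random placeholders, purely illustrative.
import numpy as np

def svcca(x: np.ndarray, y: np.ndarray, var_kept: float = 0.99) -> float:
    """Mean canonical correlation between SVD-reduced views x and y (rows = languages)."""
    def svd_reduce(m: np.ndarray) -> np.ndarray:
        m = m - m.mean(axis=0)                        # center each feature
        u, s, _ = np.linalg.svd(m, full_matrices=False)
        k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_kept)) + 1
        return u[:, :k] * s[:k]                       # keep components covering var_kept of the variance

    def orthonormal_basis(m: np.ndarray) -> np.ndarray:
        q, _ = np.linalg.qr(m - m.mean(axis=0))       # basis of the centered column space
        return q

    qx = orthonormal_basis(svd_reduce(x))
    qy = orthonormal_basis(svd_reduce(y))
    # Canonical correlations are the singular values of the product of the two bases.
    corrs = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.mean(np.clip(corrs, 0.0, 1.0)))

# Toy example: a "typological" view and a noisy linear transform of it as a "learned" view.
rng = np.random.default_rng(0)
typology_view = rng.normal(size=(100, 12))            # 100 hypothetical languages x 12 features
learned_view = typology_view @ rng.normal(size=(12, 16)) + 0.5 * rng.normal(size=(100, 16))
print(f"mean SVCCA correlation: {svcca(typology_view, learned_view):.3f}")
```

A high mean correlation indicates the two views encode largely overlapping information about the languages; a low one suggests the views capture complementary properties.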
This list was generated automatically from the titles and abstracts of the papers indexed on this site.