Same Neurons, Different Languages: Probing Morphosyntax in Multilingual
Pre-trained Models
- URL: http://arxiv.org/abs/2205.02023v2
- Date: Thu, 5 May 2022 11:53:38 GMT
- Title: Same Neurons, Different Languages: Probing Morphosyntax in Multilingual
Pre-trained Models
- Authors: Karolina Stańczak, Edoardo Ponti, Lucas Torroba Hennigen, Ryan
Cotterell, Isabelle Augenstein
- Abstract summary: We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
- Score: 84.86942006830772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of multilingual pre-trained models is underpinned by their
ability to learn representations shared by multiple languages even in the
absence of any explicit supervision. However, it remains unclear how these models learn
to generalise across languages. In this work, we conjecture that multilingual
pre-trained models can derive language-universal abstractions about grammar. In
particular, we investigate whether morphosyntactic information is encoded in
the same subset of neurons in different languages. We conduct the first
large-scale empirical study over 43 languages and 14 morphosyntactic categories
with a state-of-the-art neuron-level probe. Our findings show that the
cross-lingual overlap between neurons is significant, but its extent may vary
across categories and depends on language proximity and pre-training data size.
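To make the probing setup concrete, here is a minimal sketch of the general idea: score each neuron's relevance to a morphosyntactic category separately per language, keep the top-k neurons, and measure how strongly the top-k sets overlap across languages. The plain logistic-regression scoring and the random placeholder data are simplifications for illustration, not the paper's actual neuron-level probe.

```python
# Simplified sketch (not the paper's exact probe): rank neurons per language by the
# magnitude of logistic-regression weights for a morphosyntactic category, then
# measure how much the top-k neuron sets of two languages overlap.
# The representations and labels below are random placeholders standing in for
# token embeddings from a multilingual encoder and their (e.g.) Number labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def top_k_neurons(features, labels, k=50):
    """Return indices of the k neurons with the largest probe weights."""
    probe = LogisticRegression(max_iter=1000).fit(features, labels)
    importance = np.abs(probe.coef_).max(axis=0)   # one score per neuron
    return set(np.argsort(importance)[-k:])

def overlap(neurons_a, neurons_b):
    """Jaccard overlap between two neuron sets (0 = disjoint, 1 = identical)."""
    return len(neurons_a & neurons_b) / len(neurons_a | neurons_b)

rng = np.random.default_rng(0)
dim, n = 768, 500                                   # hidden size, tokens per language
X = {lang: rng.normal(size=(n, dim)) for lang in ("en", "de")}
y = {lang: rng.integers(0, 2, size=n) for lang in ("en", "de")}  # e.g. singular/plural

neurons = {lang: top_k_neurons(X[lang], y[lang]) for lang in X}
print(f"cross-lingual neuron overlap: {overlap(neurons['en'], neurons['de']):.2f}")
```

In the study itself, such overlap scores would be computed per morphosyntactic category and related to language proximity and pre-training data size.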
Related papers
- Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models [29.50162863143141]
We compare encoding performance of various neural language models and psychologically plausible models.
Surprisingly, our findings revealed that psychologically plausible models outperformed neural language models across diverse contexts.
arXiv Detail & Related papers (2024-04-30T08:48:07Z)
- On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons [37.32174349956148]
We analyze the neuron-level internal behavior of multilingual decoder-based pre-trained language models (PLMs).
We show that language-specific neurons are unique, with a slight overlap (< 5%) between languages.
We tamper with less than 1% of the total neurons in each model during inference and demonstrate that tampering with a few language-specific neurons drastically changes the probability of the target language occurring in generated text.
arXiv Detail & Related papers (2024-04-03T03:37:22Z)
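The tampering step can be pictured as clamping a handful of activations at inference time. A minimal PyTorch sketch, assuming a toy model and hypothetical neuron indices and values (not the paper's actual models or selection method):

```python
# Minimal sketch of fixing a few "language-specific" neuron activations during
# inference with a PyTorch forward hook. The model, layer, neuron indices, and
# replacement values are illustrative placeholders, not the paper's actual setup.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.ff = nn.Linear(dim, dim)       # layer whose neurons we intervene on
        self.out = nn.Linear(dim, vocab)

    def forward(self, ids):
        return self.out(torch.relu(self.ff(self.embed(ids))))

model = TinyLM().eval()
target_neurons = [3, 7, 11]                 # hypothetical language-specific units
fixed_value = 2.0                           # hypothetical activation to clamp to

def clamp_neurons(module, inputs, output):
    output[..., target_neurons] = fixed_value   # overwrite selected activations
    return output

handle = model.ff.register_forward_hook(clamp_neurons)
with torch.no_grad():
    logits = model(torch.tensor([[1, 2, 3]]))   # generation would sample from these
handle.remove()
print(logits.shape)
```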
- Language Embeddings Sometimes Contain Typological Generalizations [0.0]
We train neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages.
The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features.
We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations.
arXiv Detail & Related papers (2023-01-19T15:09:59Z)
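One simple way to picture the comparison against typological databases is to test whether a single typological feature can be predicted from the learned language representations. The sketch below uses random placeholder vectors and a made-up binary feature, not the paper's data or its exact analysis.

```python
# Rough sketch: test whether a typological feature (e.g. a WALS-style binary
# feature) can be predicted from learned language embeddings. Both the embeddings
# and the feature values below are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_languages, dim = 120, 100
lang_vecs = rng.normal(size=(n_languages, dim))     # learned language representations
feature = rng.integers(0, 2, size=n_languages)      # e.g. "verb-object order" yes/no

scores = cross_val_score(LogisticRegression(max_iter=1000),
                         lang_vecs, feature, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.2f} (chance is about 0.5 for a balanced feature)")
```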
- Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models [28.036233760742125]
We causally probe multilingual language models (XGLM and multilingual BERT) across various languages.
We find significant neuron overlap across languages in autoregressive multilingual language models, but not masked language models.
arXiv Detail & Related papers (2022-10-25T20:43:36Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
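The probing-over-checkpoints recipe looks roughly like the loop below: extract representations from each saved checkpoint and fit a probe, watching when the task becomes decodable. The checkpoint steps and the feature extractor returning random data are placeholders, not the paper's actual XLM-R setup.

```python
# Sketch of the general recipe: probe representations from successive pretraining
# checkpoints and track how quickly a linguistic task becomes decodable.
# extract_features is a hypothetical stand-in that returns random data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def extract_features(checkpoint_step, sentences):
    # Placeholder: in practice, run the checkpoint's encoder over the sentences.
    return rng.normal(size=(len(sentences), 768))

sentences = [f"sentence {i}" for i in range(400)]
labels = rng.integers(0, 2, size=len(sentences))    # e.g. a binary linguistic task

for step in (10_000, 50_000, 100_000, 500_000):     # hypothetical checkpoint steps
    X = extract_features(step, sentences)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"step {step}: probe accuracy {probe.score(X_te, y_te):.2f}")
```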
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
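The mixing step can be illustrated as interpolating two next-token distributions. The logits and the mixing weight below are placeholders, and the paper's exact formulation may differ.

```python
# Toy illustration of mixing two next-token distributions: one from a standard
# self-attention LM head and one from a dependency-based component. Both
# distributions and the mixing weight lam are placeholders for illustration.
import torch
import torch.nn.functional as F

vocab = 10
logits_lm = torch.randn(vocab)          # stand-in for the self-attention LM head
logits_dep = torch.randn(vocab)         # stand-in for the dependency-modeling head
lam = 0.3                               # hypothetical mixing weight

p_lm = F.softmax(logits_lm, dim=-1)
p_dep = F.softmax(logits_dep, dim=-1)
p_next = (1 - lam) * p_lm + lam * p_dep  # mixture of the two distributions

assert torch.isclose(p_next.sum(), torch.tensor(1.0))
next_token = torch.argmax(p_next).item()
print(next_token, p_next[next_token].item())
```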
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
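A rough sketch of the grouping step, assuming language vectors have already been extracted (random placeholders here) and using plain k-means with an arbitrary number of clusters:

```python
# Sketch of grouping languages by clustering their learned representations into
# "representation sprachbunds"; the language vectors are random placeholders and
# the number of clusters is an arbitrary choice.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
languages = ["en", "de", "fr", "es", "ru", "pl", "zh", "ja", "ar", "he"]
lang_vecs = rng.normal(size=(len(languages), 64))   # stand-in language representations

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(lang_vecs)
for cluster_id in range(3):
    members = [l for l, c in zip(languages, kmeans.labels_) if c == cluster_id]
    print(f"representation sprachbund {cluster_id}: {members}")
```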
- Analyzing Individual Neurons in Pre-trained Language Models [41.07850306314594]
We find small subsets of neurons that predict linguistic tasks, with lower-level tasks localized in fewer neurons than the higher-level task of predicting syntax.
For example, we found neurons in XLNet to be more localized and disjoint when predicting properties compared to BERT and others, where they are more distributed and coupled.
arXiv Detail & Related papers (2020-10-06T13:17:38Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
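A minimal SVCCA-style computation on two placeholder views of language representations (SVD reduction of each view followed by CCA between the reduced views); the dimensionalities and data are illustrative only, not the paper's sources.

```python
# Minimal SVCCA-style sketch: reduce each view of language representations with
# SVD, then measure canonical correlations between the reduced views with CCA.
# The two "views" here are random placeholders (e.g. typology-based vs. NMT-learned).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_languages = 60
view_a = rng.normal(size=(n_languages, 40))
view_b = rng.normal(size=(n_languages, 30))

def svd_reduce(x, k=10):
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :k] * s[:k]                 # keep the top-k singular directions

a, b = svd_reduce(view_a), svd_reduce(view_b)
cca = CCA(n_components=5).fit(a, b)
a_c, b_c = cca.transform(a, b)
corrs = [np.corrcoef(a_c[:, i], b_c[:, i])[0, 1] for i in range(5)]
print("mean canonical correlation:", float(np.mean(corrs)))
```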
- Cross-lingual, Character-Level Neural Morphological Tagging [57.0020906265213]
We train character-level recurrent neural taggers to predict morphological tags for high-resource and low-resource languages jointly.
Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.
arXiv Detail & Related papers (2017-08-30T08:14:34Z)
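As a rough illustration of the last entry's setup, the sketch below defines a tiny character-level tagger whose character embeddings and encoder are shared across languages, so that training on words from related high- and low-resource languages updates the same parameters. The character inventory, tag set, and example words are toy placeholders, not the paper's data or architecture details.

```python
# Compact sketch of a character-level tagger shared across languages: words from
# any language are encoded character by character, so related languages can share
# representations. Vocabulary, tags, and data below are toy placeholders.
import torch
import torch.nn as nn

class CharTagger(nn.Module):
    def __init__(self, n_chars, n_tags, dim=32):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, dim, padding_idx=0)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * dim, n_tags)

    def forward(self, char_ids):                     # (batch, max_word_len)
        states, _ = self.encoder(self.char_embed(char_ids))
        return self.classifier(states.mean(dim=1))   # one tag distribution per word

chars = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyzáéíóú")}
tags = {"N;SG": 0, "N;PL": 1, "V;PST": 2}

def encode(word, length=12):
    ids = [chars.get(c, 0) for c in word][:length]
    return ids + [0] * (length - len(ids))

# Words from a high-resource and a (pretend) low-resource related language,
# trained together so character representations are shared.
batch = torch.tensor([encode(w) for w in ["casas", "casa", "cantó", "cases"]])
gold = torch.tensor([tags["N;PL"], tags["N;SG"], tags["V;PST"], tags["N;PL"]])

model = CharTagger(n_chars=len(chars) + 1, n_tags=len(tags))
loss = nn.CrossEntropyLoss()(model(batch), gold)
loss.backward()                                      # one joint training step
print(float(loss))
```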