Causal Analysis of Syntactic Agreement Neurons in Multilingual Language
Models
- URL: http://arxiv.org/abs/2210.14328v1
- Date: Tue, 25 Oct 2022 20:43:36 GMT
- Title: Causal Analysis of Syntactic Agreement Neurons in Multilingual Language
Models
- Authors: Aaron Mueller, Yu Xia, Tal Linzen
- Abstract summary: We causally probe multilingual language models (XGLM and multilingual BERT) across various languages.
We find significant neuron overlap across languages in autoregressive multilingual language models, but not masked language models.
- Score: 28.036233760742125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structural probing work has found evidence for latent syntactic information
in pre-trained language models. However, much of this analysis has focused on
monolingual models, and analyses of multilingual models have employed
correlational methods that are confounded by the choice of probing tasks. In
this study, we causally probe multilingual language models (XGLM and
multilingual BERT) as well as monolingual BERT-based models across various
languages; we do this by performing counterfactual perturbations on neuron
activations and observing the effect on models' subject-verb agreement
probabilities. We observe where in the model and to what extent syntactic
agreement is encoded in each language. We find significant neuron overlap
across languages in autoregressive multilingual language models, but not masked
language models. We also find two distinct layer-wise effect patterns and two
distinct sets of neurons used for syntactic agreement, depending on whether the
subject and verb are separated by other tokens. Finally, we find that
behavioral analyses of language models are likely underestimating how sensitive
masked language models are to syntactic information.
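To make the probing setup concrete, here is a minimal sketch of a single-neuron intervention on a masked language model, assuming the Hugging Face transformers API and bert-base-multilingual-cased; the layer index, neuron index, and zero-valued patch are illustrative assumptions rather than the paper's actual choices, which perturb activations toward counterfactual values and also cover XGLM.
```python
# Hedged sketch, not the authors' released code: ablate one hidden unit of a
# masked LM and measure how its subject-verb agreement preference shifts.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-multilingual-cased"  # assumption: any HF masked LM works here
LAYER, NEURON = 8, 300                  # hypothetical layer / unit to probe

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def agreement_logit_diff(sentence, correct, wrong, patch_value=None):
    """Logit difference of the correct vs. wrong verb form at the [MASK]
    position, optionally overwriting one neuron in one encoder layer."""
    inputs = tok(sentence, return_tensors="pt")
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]

    handle = None
    if patch_value is not None:
        def hook(module, inp, out):
            hidden = (out[0] if isinstance(out, tuple) else out).clone()
            # Counterfactual edit: here a zero patch; a paper-style variant
            # would patch in the activation from a minimally different input.
            hidden[0, mask_pos, NEURON] = patch_value
            return (hidden,) + out[1:] if isinstance(out, tuple) else hidden
        handle = model.bert.encoder.layer[LAYER].register_forward_hook(hook)

    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    if handle is not None:
        handle.remove()
    return (logits[tok.convert_tokens_to_ids(correct)]
            - logits[tok.convert_tokens_to_ids(wrong)]).item()

# Plural subject with an intervening ("attractor") noun phrase.
sent = "The authors of the paper [MASK] tired."
base = agreement_logit_diff(sent, "are", "is")
patched = agreement_logit_diff(sent, "are", "is", patch_value=0.0)
print(f"agreement preference: {base:.3f} -> {patched:.3f} after ablating one unit")
```
The logit difference between the grammatical and ungrammatical verb forms serves as the agreement measure; comparing it before and after the intervention gives a simple estimate of that unit's causal contribution.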
Related papers
- Modeling language contact with the Iterated Learning Model [0.0]
Iterated learning models are agent-based models of language change.
A recently introduced type of iterated learning model, the Semi-Supervised ILM, is used to simulate language contact.
arXiv Detail & Related papers (2024-06-11T01:43:23Z)
- Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve [78.3500985535601]
We find a surprising connection between multitask learning and robustness to neuron failures.
Our experiments show that bilingual language models retain higher performance under various neuron perturbations.
We provide a theoretical justification for this robustness by mathematically analyzing linear representation learning.
arXiv Detail & Related papers (2022-10-20T22:23:27Z)
- MonoByte: A Pool of Monolingual Byte-level Language Models [4.491765479948667]
We release 10 monolingual byte-level models rigorously pretrained under the same configuration.
Because they are tokenizer-free, the problem of unseen token embeddings is eliminated.
Experiments on QA and NLI tasks show that our monolingual models perform competitively with the multilingual one.
arXiv Detail & Related papers (2022-09-22T14:32:48Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models [84.86942006830772]
We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
arXiv Detail & Related papers (2022-05-04T12:22:31Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and refer to each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models [96.32118305166412]
We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks.
We find that languages which are adequately represented in the multilingual model's vocabulary exhibit negligible performance decreases relative to their monolingual counterparts.
arXiv Detail & Related papers (2020-12-31T14:11:00Z)
- Cross-Linguistic Syntactic Evaluation of Word Prediction Models [25.39896327641704]
We investigate how neural word prediction models' ability to learn syntax varies by language.
CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian.
We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT; a sketch of this style of minimal-pair evaluation appears just after this list.
arXiv Detail & Related papers (2020-05-01T02:51:20Z)
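For the CLAMS-style behavioral evaluation referenced above, here is a minimal sketch, assuming a Hugging Face masked LM; the model name and the handful of hand-written minimal pairs are illustrative placeholders rather than the actual CLAMS challenge sets.
```python
# Hedged sketch of a CLAMS-style behavioral test: for each minimal pair, check
# whether a masked LM assigns a higher logit to the grammatical verb form than
# to the ungrammatical one at the masked position.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-multilingual-cased"  # assumption: any multilingual MLM
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

# (template with a [MASK] verb slot, grammatical form, ungrammatical form)
pairs = [
    ("The pilots near the car [MASK] tall.", "are", "is"),   # en, attractor
    ("The author of the books [MASK] tall.", "is", "are"),   # en, attractor
    ("Les pilotes [MASK] grands.", "sont", "est"),           # fr
]

def prefers_grammatical(template, good, bad):
    inputs = tok(template.replace("[MASK]", tok.mask_token), return_tensors="pt")
    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    return bool(logits[tok.convert_tokens_to_ids(good)]
                > logits[tok.convert_tokens_to_ids(bad)])

accuracy = sum(prefers_grammatical(*p) for p in pairs) / len(pairs)
print(f"agreement accuracy on {len(pairs)} minimal pairs: {accuracy:.2f}")
```
Accuracy here is simply the fraction of pairs on which the grammatical verb form receives the higher logit, the usual minimal-pair scoring criterion.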
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.