Multitasking Models are Robust to Structural Failure: A Neural Model for
Bilingual Cognitive Reserve
- URL: http://arxiv.org/abs/2210.11618v1
- Date: Thu, 20 Oct 2022 22:23:27 GMT
- Title: Multitasking Models are Robust to Structural Failure: A Neural Model for
Bilingual Cognitive Reserve
- Authors: Giannis Daras, Negin Raoof, Zoi Gkalitsiou, Alexandros G. Dimakis
- Abstract summary: We find a surprising connection between multitask learning and robustness to neuron failures.
Our experiments show that bilingual language models retain higher performance under various neuron perturbations.
We provide a theoretical justification for this robustness by mathematically analyzing linear representation learning.
- Score: 78.3500985535601
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We find a surprising connection between multitask learning and robustness to
neuron failures. Our experiments show that bilingual language models retain
higher performance under various neuron perturbations, such as random
deletions, magnitude pruning, and weight noise, compared to equivalent
monolingual ones. We provide a theoretical justification for this robustness by
mathematically analyzing linear representation learning and showing that
multitasking creates more robust representations. Our analysis connects
robustness to spectral properties of the learned representation and proves that
multitasking leads to higher robustness for diverse task vectors. We
open-source our code and models:
https://github.com/giannisdaras/multilingual_robustness
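For concreteness, here is a minimal NumPy sketch of the three perturbation types named in the abstract (random deletions, magnitude pruning, weight noise), applied to an arbitrary weight matrix. Function names, shapes, and settings are illustrative assumptions; the paper's actual evaluation uses the released code and models at the repository above.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_deletion(W, p):
    """Zero out a random fraction p of individual weights."""
    mask = rng.random(W.shape) >= p
    return W * mask

def magnitude_prune(W, p):
    """Zero out (approximately) the fraction p of weights with smallest magnitude."""
    k = int(p * W.size)
    if k == 0:
        return W.copy()
    threshold = np.partition(np.abs(W), k, axis=None)[k]
    return np.where(np.abs(W) < threshold, 0.0, W)

def weight_noise(W, sigma):
    """Add i.i.d. Gaussian noise scaled by the weights' standard deviation."""
    return W + sigma * W.std() * rng.standard_normal(W.shape)

# Example: how strongly each corruption distorts a (fictitious) weight matrix.
W = rng.standard_normal((256, 128))
for corrupt in (lambda M: random_deletion(M, 0.3),
                lambda M: magnitude_prune(M, 0.3),
                lambda M: weight_noise(M, 0.5)):
    W_c = corrupt(W)
    rel_err = np.linalg.norm(W - W_c) / np.linalg.norm(W)
    print(f"relative weight distortion: {rel_err:.3f}")
```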
Related papers
- LOLA -- An Open-Source Massively Multilingual Large Language Model [1.5704590739448838]
LOLA is a massively multilingual large language model trained on more than 160 languages.
Our architectural and implementation choices address the challenge of harnessing linguistic diversity.
We show how the learned expert-routing mechanism exploits implicit phylogenetic patterns to potentially alleviate the curse of multilinguality.
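As a general illustration of expert routing (not LOLA's actual implementation, which is defined in its own codebase), a minimal top-1 sparse mixture-of-experts gate might look like the following sketch; all names and dimensions are assumptions.

```python
import numpy as np

def top1_route(hidden, gate_W):
    """Toy sparse-MoE gate: each token is sent to its highest-scoring expert.

    hidden: (tokens, d), gate_W: (d, n_experts). Purely illustrative.
    """
    logits = hidden @ gate_W                              # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    chosen = probs.argmax(axis=-1)                        # expert index per token
    return chosen, probs[np.arange(len(chosen)), chosen]  # index and gate weight

rng = np.random.default_rng(0)
experts, gate_probs = top1_route(rng.standard_normal((5, 16)),
                                 rng.standard_normal((16, 4)))
print(experts, gate_probs.round(3))
```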
arXiv Detail & Related papers (2024-09-17T15:23:08Z)
- FonMTL: Towards Multitask Learning for the Fon Language [1.9370453715137865]
We present the first exploratory approach to multitask learning for enhancing model capabilities in Natural Language Processing for the Fon language.
We leverage two language model heads as encoders to build shared representations of the inputs, and we use linear layer blocks for task-specific classification.
Our results on the NER and POS tasks for Fon show competitive (or better) performance compared to several multilingual pretrained language models fine-tuned on single tasks.
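A minimal sketch of this hard-parameter-sharing setup: one shared representation feeds one linear head per task. Dimensions, label counts, and names below are made up for illustration.

```python
import numpy as np

def multitask_forward(shared_repr, task_heads):
    """Shared representation, one linear classification head per task."""
    return {task: shared_repr @ W + b for task, (W, b) in task_heads.items()}

rng = np.random.default_rng(0)
d, n_tokens = 768, 12
shared = rng.standard_normal((n_tokens, d))            # stand-in for encoder output
heads = {"ner": (rng.standard_normal((d, 9)), np.zeros(9)),
         "pos": (rng.standard_normal((d, 17)), np.zeros(17))}
logits = multitask_forward(shared, heads)
print({task: out.shape for task, out in logits.items()})
```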
arXiv Detail & Related papers (2023-08-28T03:26:21Z)
- Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation [15.32063273544696]
We discretize the latent space of multilingual models by assigning encoder states to entries in a codebook.
We validate our approach on large-scale experiments with realistic data volumes and domains.
We also use the learned artificial language to analyze model behavior, and discover that using a similar bridge language increases knowledge-sharing among the remaining languages.
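The discretization step can be pictured as standard vector quantization: each continuous encoder state is replaced by its nearest codebook entry. The sketch below assumes Euclidean nearest-neighbour assignment and is not the paper's exact procedure.

```python
import numpy as np

def assign_to_codebook(encoder_states, codebook):
    """Discretize encoder states by nearest codebook entry (Euclidean)."""
    # (states, 1, d) - (1, codes, d) -> pairwise squared distances (states, codes)
    d2 = ((encoder_states[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d2.argmin(axis=1)           # discrete "artificial language" tokens
    return ids, codebook[ids]         # indices and quantized states

rng = np.random.default_rng(0)
ids, quantized = assign_to_codebook(rng.standard_normal((4, 8)),
                                    rng.standard_normal((16, 8)))
print(ids)
```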
arXiv Detail & Related papers (2022-11-02T17:14:42Z)
- Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models [28.036233760742125]
We causally probe multilingual language models (XGLM and multilingual BERT) across various languages.
We find significant neuron overlap across languages in autoregressive multilingual language models, but not masked language models.
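Causal probing of this kind typically intervenes on a single neuron's activation and measures how the model's preference between a grammatical and an ungrammatical continuation shifts. The toy sketch below, with a hypothetical `logits_fn` standing in for the real model, only mirrors that idea in spirit.

```python
import numpy as np

def neuron_effect(logits_fn, hidden, layer, unit, new_value, correct, wrong):
    """Overwrite one neuron's activation and measure the change in
    preference for the grammatical continuation (illustrative only)."""
    base = logits_fn(hidden)
    patched = {k: v.copy() for k, v in hidden.items()}
    patched[layer][unit] = new_value
    intervened = logits_fn(patched)
    return ((intervened[correct] - intervened[wrong])
            - (base[correct] - base[wrong]))

# Toy usage with a linear readout standing in for the model.
rng = np.random.default_rng(0)
readout = rng.standard_normal((8, 2))             # two competing verb forms
toy_hidden = {"layer3": rng.standard_normal(8)}
toy_fn = lambda h: h["layer3"] @ readout
print(neuron_effect(toy_fn, toy_hidden, "layer3", unit=2,
                    new_value=5.0, correct=0, wrong=1))
```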
arXiv Detail & Related papers (2022-10-25T20:43:36Z)
- How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? [86.48323488619629]
We analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus.
We find that while relatively better performance is often observed when languages are more equally sampled, downstream performance is more robust to language imbalance than we had expected.
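One common way to control language ratios in multilingual tokenizer (or model) training data is exponentiated sampling; the sketch below shows that general recipe under assumed corpus sizes, and is not necessarily the exact protocol of this paper.

```python
import numpy as np

def language_sampling_probs(corpus_sizes, alpha=0.7):
    """Exponentiated sampling: alpha = 1 keeps raw corpus proportions,
    smaller alpha up-samples low-resource languages toward uniform."""
    sizes = np.asarray(list(corpus_sizes.values()), dtype=float)
    p = sizes / sizes.sum()
    q = p ** alpha
    q /= q.sum()
    return dict(zip(corpus_sizes, q))

print(language_sampling_probs({"en": 1e9, "fr": 1e8, "sw": 1e6}, alpha=0.5))
```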
arXiv Detail & Related papers (2022-04-29T17:50:36Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
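A toy reading of that mixing step: combine the language model's next-token distribution with distributions induced by previously predicted dependency heads, using mixture weights that sum to one. The attention mechanism that produces those weights in the paper is omitted here; everything below is illustrative.

```python
import numpy as np

def mixed_next_token_probs(lm_probs, dep_probs_list, weights):
    """Weighted mixture of the LM distribution and dependency-induced
    distributions over the vocabulary (weights sum to 1)."""
    stacked = np.vstack([lm_probs] + list(dep_probs_list))  # (1 + heads, vocab)
    w = np.asarray(weights, dtype=float)[:, None]
    return (w * stacked).sum(axis=0)                         # (vocab,)

vocab = 5
lm = np.full(vocab, 1.0 / vocab)
dep = [np.array([0.7, 0.1, 0.1, 0.05, 0.05])]
print(mixed_next_token_probs(lm, dep, weights=[0.6, 0.4]))
```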
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
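Grouping languages by their learned representations can be sketched as ordinary clustering of per-language embedding vectors. The snippet below uses k-means purely for illustration and does not reproduce the paper's actual grouping procedure; language names and vector sizes are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def group_languages(lang_vectors, n_groups):
    """Cluster per-language embedding vectors; return language -> group id."""
    langs = list(lang_vectors)
    X = np.stack([lang_vectors[lang] for lang in langs])
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(X)
    return dict(zip(langs, labels))

rng = np.random.default_rng(0)
vecs = {lang: rng.standard_normal(32) for lang in ["en", "de", "fr", "es", "zh", "ja"]}
print(group_languages(vecs, n_groups=2))
```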
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Multi-task Learning of Negation and Speculation for Targeted Sentiment Classification [15.85111852764517]
We show that targeted sentiment models are not robust to linguistic phenomena, specifically negation and speculation.
We propose a multi-task learning method to incorporate information from syntactic and semantic auxiliary tasks, including negation and speculation scope detection.
We create two challenge datasets to evaluate model performance on negated and speculative samples.
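In its simplest form, incorporating the auxiliary scope-detection tasks amounts to optimizing a weighted sum of per-task losses over a shared encoder. The loss values and weights below are arbitrary placeholders, not the paper's settings.

```python
def multitask_loss(losses, weights=None):
    """Weighted sum of the main sentiment loss and auxiliary losses
    (negation and speculation scope detection)."""
    if weights is None:
        weights = {task: 1.0 for task in losses}
    return sum(weights[task] * losses[task] for task in losses)

print(multitask_loss({"sentiment": 0.82,
                      "negation_scope": 0.31,
                      "speculation_scope": 0.27},
                     weights={"sentiment": 1.0,
                              "negation_scope": 0.5,
                              "speculation_scope": 0.5}))
```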
arXiv Detail & Related papers (2020-10-16T11:20:03Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
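The gating argument can be made concrete: a unit whose forget gate sits near a value f decays its memory roughly like f^t, so its characteristic timescale is about -1/ln(f), and a spread of gate biases yields a spread of timescales. The computation below is a small illustration of that relationship, not the paper's derivation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unit_timescale(forget_bias):
    """Characteristic memory timescale -1/ln(f) for f = sigmoid(bias)."""
    f = sigmoid(np.asarray(forget_bias, dtype=float))
    return -1.0 / np.log(f)

print(unit_timescale([0.0, 1.0, 2.0, 4.0]))   # larger bias -> longer memory
```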
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.