On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning
- URL: http://arxiv.org/abs/2109.06935v1
- Date: Tue, 14 Sep 2021 19:28:31 GMT
- Title: On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning
- Authors: Marc Tanti and Lonneke van der Plas and Claudia Borg and Albert Gatt
- Abstract summary: The knowledge acquired by multilingual BERT (mBERT) has two components: a language-specific and a language-neutral one.
This paper analyses the relationship between them, in the context of fine-tuning on two tasks.
- Score: 7.493779672689531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown evidence that the knowledge acquired by multilingual
BERT (mBERT) has two components: a language-specific and a language-neutral
one. This paper analyses the relationship between them, in the context of
fine-tuning on two tasks -- POS tagging and natural language inference -- which
require the model to bring to bear different degrees of language-specific
knowledge. Visualisations reveal that mBERT loses the ability to cluster
representations by language after fine-tuning, a result that is supported by
evidence from language identification experiments. However, further experiments
on 'unlearning' language-specific representations using gradient reversal and
iterative adversarial learning are shown not to add further improvement to the
language-independent component over and above the effect of fine-tuning. The
results presented here suggest that the process of fine-tuning causes a
reorganisation of the model's limited representational capacity, enhancing
language-independent representations at the expense of language-specific ones.
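
For readers unfamiliar with the 'unlearning' setup mentioned in the abstract, the sketch below shows a minimal gradient-reversal adversary in PyTorch (in the Ganin & Lempitsky style). The class names, hidden size and language count are illustrative assumptions, not the authors' actual code:

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; negated, scaled gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class LanguageAdversary(nn.Module):
    """Language-ID head trained through gradient reversal: the head learns to
    identify the language, while the reversed gradient pushes the encoder to
    discard language-identifying features."""
    def __init__(self, hidden_size=768, num_languages=104, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(hidden_size, num_languages)

    def forward(self, sentence_embedding):
        return self.classifier(GradientReversal.apply(sentence_embedding, self.lambd))

# Usage (hypothetical): add F.cross_entropy(adversary(cls_vec), lang_labels) to
# the task loss; sweeping `lambd` controls the strength of the adversary.
```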
Related papers
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework [79.72910257530795]
ShifCon is a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one.
It shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters.
Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages.
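As a rough illustration of what shifting representations into the dominant language's subspace could look like, here is a minimal mean-difference shift in PyTorch. ShifCon's actual projection and contrastive objective are more involved; the function below is purely a hypothetical sketch:

```python
import torch

def shift_to_dominant(h_src: torch.Tensor,
                      src_mean: torch.Tensor,
                      dom_mean: torch.Tensor,
                      alpha: float = 1.0) -> torch.Tensor:
    """Translate non-dominant-language states by the difference of language means.

    h_src:    (batch, hidden) hidden states of a non-dominant language
    src_mean: (hidden,) mean hidden state of that language
    dom_mean: (hidden,) mean hidden state of the dominant language
    alpha:    shift strength (1.0 = move fully onto the dominant centroid)
    """
    return h_src + alpha * (dom_mean - src_mean)
```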
arXiv Detail & Related papers (2024-10-25T10:28:59Z)
- Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models [11.421452042888523]
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques.
Our results offer practical suggestions: for example, calibrating in the target language efficiently retains language modeling capability but does not necessarily benefit downstream tasks.
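Why the calibration language matters is easiest to see in activation-aware pruning, where importance scores depend on calibration activations. The sketch below uses a Wanda-style score (|weight| x input-activation norm) as an assumed stand-in for the SotA pruning techniques compared in the paper; the function name is hypothetical:

```python
import torch

def prune_linear_with_calibration(weight: torch.Tensor,
                                  calib_inputs: torch.Tensor,
                                  sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the lowest-scoring weights of one linear layer.

    weight:       (out_features, in_features)
    calib_inputs: (n_samples, in_features) activations from the calibration
                  language -- changing the language changes these norms,
                  and hence which weights survive.
    """
    act_norm = calib_inputs.norm(p=2, dim=0)          # per-input-feature norm
    scores = weight.abs() * act_norm                  # Wanda-style importance
    k = max(1, int(weight.numel() * sparsity))
    threshold = scores.flatten().kthvalue(k).values
    return torch.where(scores <= threshold, torch.zeros_like(weight), weight)
```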
arXiv Detail & Related papers (2024-08-26T16:29:13Z)
- Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment [11.148099070407431]
GroundedBERT is a grounded language learning method that enhances the BERT representation with visually grounded information.
Our proposed method significantly outperforms the baseline language models on various language tasks of the GLUE and SQuAD datasets.
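A minimal sketch of the general idea of expanding a text representation with visual features is given below (projection plus concatenation); GroundedBERT's actual grounding and partial-alignment objectives differ, and the module name and dimensions here are hypothetical:

```python
import torch
import torch.nn as nn

class VisuallyGroundedFusion(nn.Module):
    """Project an image feature and fuse it with every token representation."""
    def __init__(self, text_dim=768, visual_dim=2048, out_dim=768):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, out_dim)
        self.fuse = nn.Linear(text_dim + out_dim, out_dim)

    def forward(self, token_states, image_feature):
        # token_states: (batch, seq_len, text_dim); image_feature: (batch, visual_dim)
        v = self.visual_proj(image_feature).unsqueeze(1)      # (batch, 1, out_dim)
        v = v.expand(-1, token_states.size(1), -1)            # repeat across tokens
        return torch.tanh(self.fuse(torch.cat([token_states, v], dim=-1)))
```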
arXiv Detail & Related papers (2023-12-04T03:16:48Z)
- Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding [0.30586855806896046]
We propose the ProFiT pipeline to investigate the cross-lingual capabilities of Prompt-based Finetuning.
Our results reveal the effectiveness and versatility of prompt-based finetuning in cross-lingual language understanding.
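At the core of prompt-based finetuning is recasting classification as cloze filling over a verbalizer. The sketch below scores NLI labels with a masked LM; the template and verbalizer words are illustrative assumptions (each assumed to be a single vocabulary token), not the ProFiT pipeline's actual choices:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

premise, hypothesis = "A man is playing a guitar.", "A person makes music."
# PET-style template: "<premise> ? [MASK] , <hypothesis>" (assumed, not ProFiT's).
text = f"{premise} ? {tokenizer.mask_token} , {hypothesis}"
verbalizer = {"entailment": "Yes", "contradiction": "No", "neutral": "Maybe"}

inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]              # (vocab_size,)

# Score each label by its verbalizer token's logit at the mask position.
scores = {label: logits[tokenizer.convert_tokens_to_ids(word)].item()
          for label, word in verbalizer.items()}
print(max(scores, key=scores.get))
# Finetuning then trains the same masked-LM head on labelled examples.
```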
arXiv Detail & Related papers (2023-07-15T20:33:33Z)
- Relationship of the language distance to English ability of a country [0.0]
We introduce a novel solution to measure the semantic dissimilarity between languages.
We empirically examine the effectiveness of the proposed semantic language distance.
The experimental results show that language distance has a negative influence on a country's average English ability.
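One simple way to realise a semantic language distance, assuming word vectors already aligned in a shared cross-lingual space, is the cosine distance between language centroids. The paper's actual measure likely differs, so treat the following as a hypothetical baseline:

```python
import numpy as np

def language_distance(vecs_a: np.ndarray, vecs_b: np.ndarray) -> float:
    """Cosine distance between language centroids.

    vecs_a, vecs_b: (n_words, dim) word vectors for each language, assumed
    to live in a shared cross-lingual embedding space.
    """
    ca, cb = vecs_a.mean(axis=0), vecs_b.mean(axis=0)
    cos = ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb))
    return 1.0 - float(cos)                           # distance in [0, 2]
```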
arXiv Detail & Related papers (2022-11-15T02:40:00Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while composition is more crucial to the success of cross-lingual transfer.
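A minimal probe of the constituent-order property is to ablate word order and re-test transfer; the study's controlled manipulations are more careful than this illustrative shuffle:

```python
import random

def shuffle_word_order(sentence: str, seed: int = 0) -> str:
    """Keep the bag of words but destroy ordering information."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

# If transfer largely survives training/evaluating on shuffled input, word
# order contributed little -- in line with the paper's conclusion.
print(shuffle_word_order("the cat sat on the mat"))
```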
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
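A word-in-context judgement of this kind can be approximated by comparing contextual embeddings of the target word, as in the hypothetical sketch below; the model choice, mean pooling and similarity threshold are assumptions, and AM2iCo itself is an adversarially constructed benchmark rather than this simple heuristic:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
enc = AutoModel.from_pretrained("bert-base-multilingual-cased")

def word_vec(sentence: str, word: str) -> torch.Tensor:
    """Mean contextual embedding of `word`'s wordpieces inside `sentence`."""
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        states = enc(**inputs).last_hidden_state[0]   # (seq_len, hidden)
    pieces = tok(word, add_special_tokens=False).input_ids
    ids = inputs.input_ids[0].tolist()
    for i in range(len(ids) - len(pieces) + 1):       # locate the word's span
        if ids[i:i + len(pieces)] == pieces:
            return states[i:i + len(pieces)].mean(dim=0)
    raise ValueError(f"{word!r} not found in sentence")

v1 = word_vec("He sat on the bank of the river.", "bank")
v2 = word_vec("She deposited money at the bank.", "bank")
same_meaning = torch.cosine_similarity(v1, v2, dim=0) > 0.6  # assumed threshold
```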
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study multilingual language models to understand their capability and adaptability in the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
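The paper's meta-learning algorithm is its own contribution; as a generic illustration of weight-interpolation meta-learning across languages, here is a Reptile-style update in which all names, hyper-parameters and the `loss_fn` callable are hypothetical:

```python
import copy
import torch

def reptile_step(model, language_batches, loss_fn,
                 inner_lr=1e-3, meta_lr=0.1):
    """One meta-update: adapt a clone of the model on one language's batches,
    then move the shared weights part-way toward the adapted weights.
    `loss_fn(model, batch)` is a hypothetical per-task loss."""
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for batch in language_batches:                    # inner-loop adaptation
        inner_opt.zero_grad()
        loss_fn(adapted, batch).backward()
        inner_opt.step()
    with torch.no_grad():                             # outer (meta) update
        for p, q in zip(model.parameters(), adapted.parameters()):
            p.add_(meta_lr * (q - p))
```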
arXiv Detail & Related papers (2020-10-06T20:48:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.