On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning
- URL: http://arxiv.org/abs/2109.06935v1
- Date: Tue, 14 Sep 2021 19:28:31 GMT
- Title: On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning
- Authors: Marc Tanti and Lonneke van der Plas and Claudia Borg and Albert Gatt
- Abstract summary: The knowledge acquired by multilingual BERT (mBERT) has two components: a language-specific and a language-neutral one.
This paper analyses the relationship between them, in the context of fine-tuning on two tasks.
- Score: 7.493779672689531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown evidence that the knowledge acquired by multilingual
BERT (mBERT) has two components: a language-specific and a language-neutral
one. This paper analyses the relationship between them, in the context of
fine-tuning on two tasks -- POS tagging and natural language inference -- which
require the model to bring to bear different degrees of language-specific
knowledge. Visualisations reveal that mBERT loses the ability to cluster
representations by language after fine-tuning, a result that is supported by
evidence from language identification experiments. However, further experiments
on 'unlearning' language-specific representations using gradient reversal and
iterative adversarial learning are shown not to add further improvement to the
language-independent component over and above the effect of fine-tuning. The
results presented here suggest that the process of fine-tuning causes a
reorganisation of the model's limited representational capacity, enhancing
language-independent representations at the expense of language-specific ones.
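
For readers unfamiliar with the 'unlearning' setup mentioned in the abstract, the sketch below shows a minimal gradient-reversal adversary in PyTorch (in the Ganin & Lempitsky style). The class names, hidden size and language count are illustrative assumptions, not the authors' actual code:

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; negated, scaled gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class LanguageAdversary(nn.Module):
    """Language-ID head trained through gradient reversal: the head learns to
    identify the language, while the reversed gradient pushes the encoder to
    discard language-identifying features."""
    def __init__(self, hidden_size=768, num_languages=104, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(hidden_size, num_languages)

    def forward(self, sentence_embedding):
        return self.classifier(GradientReversal.apply(sentence_embedding, self.lambd))

# Usage (hypothetical): add F.cross_entropy(adversary(cls_vec), lang_labels) to
# the task loss; sweeping `lambd` controls the strength of the adversary.
```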
Related papers
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework [79.72910257530795]
ShifCon is a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one.
It shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters.
Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages.
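As a rough illustration of what shifting representations into the dominant language's subspace could look like, here is a minimal mean-difference shift in PyTorch. ShifCon's actual projection and contrastive objective are more involved; the function below is purely a hypothetical sketch:

```python
import torch

def shift_to_dominant(h_src: torch.Tensor,
                      src_mean: torch.Tensor,
                      dom_mean: torch.Tensor,
                      alpha: float = 1.0) -> torch.Tensor:
    """Translate non-dominant-language states by the difference of language means.

    h_src:    (batch, hidden) hidden states of a non-dominant language
    src_mean: (hidden,) mean hidden state of that language
    dom_mean: (hidden,) mean hidden state of the dominant language
    alpha:    shift strength (1.0 = move fully onto the dominant centroid)
    """
    return h_src + alpha * (dom_mean - src_mean)
```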
arXiv Detail & Related papers (2024-10-25T10:28:59Z)
- Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models [11.421452042888523]
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques.
Our results offer practical suggestions: for example, calibrating in the target language efficiently retains language modeling capability but does not necessarily benefit downstream tasks.
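Why the calibration language matters is easiest to see in activation-aware pruning, where importance scores depend on calibration activations. The sketch below uses a Wanda-style score (|weight| x input-activation norm) as an assumed stand-in for the SotA pruning techniques compared in the paper; the function name is hypothetical:

```python
import torch

def prune_linear_with_calibration(weight: torch.Tensor,
                                  calib_inputs: torch.Tensor,
                                  sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the lowest-scoring weights of one linear layer.

    weight:       (out_features, in_features)
    calib_inputs: (n_samples, in_features) activations from the calibration
                  language -- changing the language changes these norms,
                  and hence which weights survive.
    """
    act_norm = calib_inputs.norm(p=2, dim=0)          # per-input-feature norm
    scores = weight.abs() * act_norm                  # Wanda-style importance
    k = max(1, int(weight.numel() * sparsity))
    threshold = scores.flatten().kthvalue(k).values
    return torch.where(scores <= threshold, torch.zeros_like(weight), weight)
```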
arXiv Detail & Related papers (2024-08-26T16:29:13Z)
- Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment [11.148099070407431]
GroundedBERT is a grounded language learning method that enhances the BERT representation with visually grounded information.
Our proposed method significantly outperforms the baseline language models on various language tasks of the GLUE and SQuAD datasets.
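A minimal sketch of the general idea of expanding a text representation with visual features is given below (projection plus concatenation); GroundedBERT's actual grounding and partial-alignment objectives differ, and the module name and dimensions here are hypothetical:

```python
import torch
import torch.nn as nn

class VisuallyGroundedFusion(nn.Module):
    """Project an image feature and fuse it with every token representation."""
    def __init__(self, text_dim=768, visual_dim=2048, out_dim=768):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, out_dim)
        self.fuse = nn.Linear(text_dim + out_dim, out_dim)

    def forward(self, token_states, image_feature):
        # token_states: (batch, seq_len, text_dim); image_feature: (batch, visual_dim)
        v = self.visual_proj(image_feature).unsqueeze(1)      # (batch, 1, out_dim)
        v = v.expand(-1, token_states.size(1), -1)            # repeat across tokens
        return torch.tanh(self.fuse(torch.cat([token_states, v], dim=-1)))
```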
arXiv Detail & Related papers (2023-12-04T03:16:48Z)
- Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding [0.30586855806896046]
We propose the ProFiT pipeline to investigate the cross-lingual capabilities of Prompt-based Finetuning.
Our results reveal the effectiveness and versatility of prompt-based finetuning in cross-lingual language understanding.
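At the core of prompt-based finetuning is recasting classification as cloze filling over a verbalizer. The sketch below scores NLI labels with a masked LM; the template and verbalizer words are illustrative assumptions (each assumed to be a single vocabulary token), not the ProFiT pipeline's actual choices:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

premise, hypothesis = "A man is playing a guitar.", "A person makes music."
# PET-style template: "<premise> ? [MASK] , <hypothesis>" (assumed, not ProFiT's).
text = f"{premise} ? {tokenizer.mask_token} , {hypothesis}"
verbalizer = {"entailment": "Yes", "contradiction": "No", "neutral": "Maybe"}

inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]              # (vocab_size,)

# Score each label by its verbalizer token's logit at the mask position.
scores = {label: logits[tokenizer.convert_tokens_to_ids(word)].item()
          for label, word in verbalizer.items()}
print(max(scores, key=scores.get))
# Finetuning then trains the same masked-LM head on labelled examples.
```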
arXiv Detail & Related papers (2023-07-15T20:33:33Z)
- Relationship of the language distance to English ability of a country [0.0]
We introduce a novel solution to measure the semantic dissimilarity between languages.
We empirically examine the effectiveness of the proposed semantic language distance.
The experimental results show that language distance has a negative influence on a country's average English ability.
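One simple way to realise a semantic language distance, assuming word vectors already aligned in a shared cross-lingual space, is the cosine distance between language centroids. The paper's actual measure likely differs, so treat the following as a hypothetical baseline:

```python
import numpy as np

def language_distance(vecs_a: np.ndarray, vecs_b: np.ndarray) -> float:
    """Cosine distance between language centroids.

    vecs_a, vecs_b: (n_words, dim) word vectors for each language, assumed
    to live in a shared cross-lingual embedding space.
    """
    ca, cb = vecs_a.mean(axis=0), vecs_b.mean(axis=0)
    cos = ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb))
    return 1.0 - float(cos)                           # distance in [0, 2]
```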
arXiv Detail & Related papers (2022-11-15T02:40:00Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while composition is more crucial to the success of cross-lingual transfer.
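A minimal probe of the constituent-order property is to ablate word order and re-test transfer; the study's controlled manipulations are more careful than this illustrative shuffle:

```python
import random

def shuffle_word_order(sentence: str, seed: int = 0) -> str:
    """Keep the bag of words but destroy ordering information."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

# If transfer largely survives training/evaluating on shuffled input, word
# order contributed little -- in line with the paper's conclusion.
print(shuffle_word_order("the cat sat on the mat"))
```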
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
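A word-in-context judgement of this kind can be approximated by comparing contextual embeddings of the target word, as in the hypothetical sketch below; the model choice, mean pooling and similarity threshold are assumptions, and AM2iCo itself is an adversarially constructed benchmark rather than this simple heuristic:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
enc = AutoModel.from_pretrained("bert-base-multilingual-cased")

def word_vec(sentence: str, word: str) -> torch.Tensor:
    """Mean contextual embedding of `word`'s wordpieces inside `sentence`."""
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        states = enc(**inputs).last_hidden_state[0]   # (seq_len, hidden)
    pieces = tok(word, add_special_tokens=False).input_ids
    ids = inputs.input_ids[0].tolist()
    for i in range(len(ids) - len(pieces) + 1):       # locate the word's span
        if ids[i:i + len(pieces)] == pieces:
            return states[i:i + len(pieces)].mean(dim=0)
    raise ValueError(f"{word!r} not found in sentence")

v1 = word_vec("He sat on the bank of the river.", "bank")
v2 = word_vec("She deposited money at the bank.", "bank")
same_meaning = torch.cosine_similarity(v1, v2, dim=0) > 0.6  # assumed threshold
```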
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study multilingual language models to understand their capability and adaptability in the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
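The paper's meta-learning algorithm is its own contribution; as a generic illustration of weight-interpolation meta-learning across languages, here is a Reptile-style update in which all names, hyper-parameters and the `loss_fn` callable are hypothetical:

```python
import copy
import torch

def reptile_step(model, language_batches, loss_fn,
                 inner_lr=1e-3, meta_lr=0.1):
    """One meta-update: adapt a clone of the model on one language's batches,
    then move the shared weights part-way toward the adapted weights.
    `loss_fn(model, batch)` is a hypothetical per-task loss."""
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for batch in language_batches:                    # inner-loop adaptation
        inner_opt.zero_grad()
        loss_fn(adapted, batch).backward()
        inner_opt.step()
    with torch.no_grad():                             # outer (meta) update
        for p, q in zip(model.parameters(), adapted.parameters()):
            p.add_(meta_lr * (q - p))
```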
arXiv Detail & Related papers (2020-10-06T20:48:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.