Measuring Cross-Lingual Transferability of Multilingual Transformers on
Sentence Classification
- URL: http://arxiv.org/abs/2305.08800v1
- Date: Mon, 15 May 2023 17:05:45 GMT
- Title: Measuring Cross-Lingual Transferability of Multilingual Transformers on
Sentence Classification
- Authors: Zewen Chi, Heyan Huang, Xian-Ling Mao
- Abstract summary: We propose IGap, a cross-lingual transferability metric for multilingual Transformers on sentence classification tasks.
Experimental results show that IGap outperforms baseline metrics at measuring transferability and ranking transfer directions.
Our results reveal three findings about cross-lingual transfer, which help us better understand multilingual Transformers.
- Score: 49.8111760092473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have exhibited remarkable capabilities of pre-trained
multilingual Transformers, especially cross-lingual transferability. However,
current methods do not measure cross-lingual transferability well, hindering
the understanding of multilingual Transformers. In this paper, we propose IGap,
a cross-lingual transferability metric for multilingual Transformers on
sentence classification tasks. IGap takes training error into consideration,
and can also estimate transferability without end-task data. Experimental
results show that IGap outperforms baseline metrics at measuring
transferability and ranking transfer directions. Moreover, we conduct extensive
systematic experiments where we compare transferability among various
multilingual Transformers, fine-tuning algorithms, and transfer directions.
More importantly, our results reveal three findings about cross-lingual
transfer, which help us better understand multilingual Transformers.
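The abstract describes what IGap accounts for (training error, no need for end-task data) but not its formula, so the snippet below is only a hypothetical sketch of the kind of comparison such a metric makes for sentence classification: a fine-tuned multilingual encoder's training error on the source language versus its zero-shot error on a target language. The names error_rate and transfer_gap and all numbers are invented for illustration; this is not IGap.

```python
# Hypothetical sketch only: contrasts source-language training error with
# zero-shot target-language error. This is NOT the IGap formula from the
# paper; it merely illustrates the kind of comparison such a metric makes.
import numpy as np

def error_rate(predictions: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of misclassified sentences."""
    return float(np.mean(predictions != labels))

def transfer_gap(src_preds, src_labels, tgt_preds, tgt_labels) -> float:
    """Zero-shot target error minus source-language training error.

    A smaller gap suggests the fine-tuned multilingual encoder carries its
    decision boundary over to the target language with little degradation.
    """
    return error_rate(tgt_preds, tgt_labels) - error_rate(src_preds, src_labels)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=1000)
    # Simulated predictions: ~5% training error (source), ~20% zero-shot error (target).
    src_preds = np.where(rng.random(1000) < 0.95, labels, 1 - labels)
    tgt_preds = np.where(rng.random(1000) < 0.80, labels, 1 - labels)
    print(f"estimated transfer gap: {transfer_gap(src_preds, labels, tgt_preds, labels):.3f}")
```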
Related papers
- Disentangling the Roles of Target-Side Transfer and Regularization in Multilingual Machine Translation [9.838281446902268]
We conduct a large-scale study that varies the auxiliary target-side languages along two dimensions.
We show that linguistically similar target languages exhibit a strong ability to transfer positive knowledge.
As the number of similar target languages increases, positive transfer is further enhanced, benefiting the main language pairs.
Meanwhile, distant auxiliary target languages can also unexpectedly benefit main language pairs, even with minimal positive transfer ability.
arXiv Detail & Related papers (2024-02-01T10:55:03Z)
- Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens [15.283483438956264]
We introduce Representational Transfer Potential (RTP), which measures representational similarities between languages.
We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality.
We develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages.
arXiv Detail & Related papers (2023-05-19T09:36:48Z)
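As a rough illustration of the representational-similarity idea behind RTP above (not the authors' actual definition), one could compare mean-pooled encoder states of translation-equivalent sentences across languages. The tensors below are random placeholders standing in for real multilingual encoder outputs, and similarity_score is a name invented here.

```python
# Hypothetical similarity score between two languages' sentence
# representations, in the spirit of (but not identical to) the RTP metric.
import torch
import torch.nn.functional as F

def similarity_score(states_lang_a: torch.Tensor, states_lang_b: torch.Tensor) -> float:
    """Mean cosine similarity between aligned sentence representations.

    Each input: [num_sentences, seq_len, hidden] encoder outputs, where row i
    in both tensors encodes translation-equivalent sentences.
    """
    pooled_a = states_lang_a.mean(dim=1)  # mean-pool over tokens
    pooled_b = states_lang_b.mean(dim=1)
    return F.cosine_similarity(pooled_a, pooled_b, dim=-1).mean().item()

torch.manual_seed(0)
en_states = torch.randn(32, 20, 768)                    # placeholder encoder outputs
de_states = en_states + 0.1 * torch.randn(32, 20, 768)  # a representationally "close" language
print(f"similarity: {similarity_score(en_states, de_states):.3f}")
```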
- UTSGAN: Unseen Transition Suss GAN for Transition-Aware Image-to-image Translation [57.99923293611923]
We introduce a transition-aware approach to I2I translation, where the data translation mapping is explicitly parameterized with a transition variable.
We propose the use of transition consistency, defined on the transition variable, to enable regularization of consistency on unobserved translations.
Based on these insights, we present Unseen Transition Suss GAN (UTSGAN), a generative framework that constructs a manifold for the transition with a transition encoder.
arXiv Detail & Related papers (2023-04-24T09:47:34Z)
- Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data-hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z)
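The reranking entry above uses Adapters as one of its two parameter-efficient transfer mechanisms. The sketch below is a generic bottleneck adapter, not the paper's exact configuration: a small trainable module inserted into an otherwise frozen transformer layer.

```python
# Generic bottleneck-adapter sketch (not the paper's exact architecture).
# Only the small down/up projections are trained; the surrounding
# transformer weights would stay frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen model's representation
        # intact when the adapter contributes little.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

x = torch.randn(2, 16, 768)   # [batch, seq_len, hidden]
print(Adapter()(x).shape)     # torch.Size([2, 16, 768])
```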
- Scalable Transformers for Neural Machine Translation [86.4530299266897]
Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation.
We propose novel scalable Transformers, which naturally contain sub-Transformers of different scales with shared parameters.
A three-stage training scheme is proposed to tackle the difficulty of training the scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z)
- Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual Transfer [43.92142759245696]
Orthoadapters are trained to encode language- and task-specific information that is complementary to the knowledge already stored in the pretrained transformer's parameters.
Our zero-shot cross-lingual transfer experiments, involving three tasks (POS tagging, NER, NLI) and a set of 10 diverse languages, 1) point to the usefulness of orthoadapters in cross-lingual transfer, especially for the most complex NLI task, but also 2) indicate that the optimal adapter configuration depends heavily on the task and the target language.
arXiv Detail & Related papers (2020-12-11T16:32:41Z)
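The orthoadapter entry above trains adapters to encode information complementary to what the pretrained transformer already stores. One hypothetical way to encourage such complementarity, sketched below purely for illustration (it is not the paper's actual objective), is to penalize the cosine similarity between the frozen hidden states and the adapter's residual.

```python
# Hypothetical orthogonality penalty: discourage the adapter's residual from
# pointing in the same direction as the frozen hidden state. Illustrative
# only; the paper's actual training objective may differ.
import torch
import torch.nn.functional as F

def orthogonality_penalty(hidden: torch.Tensor, adapter_residual: torch.Tensor) -> torch.Tensor:
    """Mean squared cosine similarity between hidden states and adapter residuals.

    Both tensors: [batch, seq_len, hidden]. The penalty is zero when the
    residual is orthogonal to the pretrained representation at every position.
    """
    cos = F.cosine_similarity(hidden, adapter_residual, dim=-1)
    return (cos ** 2).mean()

hidden = torch.randn(2, 16, 768)
residual = torch.randn(2, 16, 768)
print(orthogonality_penalty(hidden, residual).item())  # small for random high-dimensional vectors
```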
- Deep Transformers with Latent Depth [42.33955275626127]
The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks.
We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection.
We propose a novel method to train one shared Transformer network for multilingual machine translation.
arXiv Detail & Related papers (2020-09-28T07:13:23Z)
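The latent-depth entry above learns which layers to use through posterior distributions over layer selection. The toy sketch below captures only the simpler idea of a per-layer learnable selection probability that blends each layer's output with a residual pass-through; it is a rough stand-in, not the paper's probabilistic framework.

```python
# Toy "soft" layer selection: a learnable logit per layer decides how much of
# the layer's output to use versus skipping it. A rough stand-in for learning
# layer selection, not the paper's probabilistic treatment.
import torch
import torch.nn as nn

class GatedLayer(nn.Module):
    def __init__(self, hidden_size: int = 512):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=8, batch_first=True)
        self.select_logit = nn.Parameter(torch.zeros(1))  # learned selection parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = torch.sigmoid(self.select_logit)      # probability of using this layer
        return p * self.layer(x) + (1 - p) * x    # otherwise pass the input through

x = torch.randn(2, 16, 512)   # [batch, seq_len, hidden]
print(GatedLayer()(x).shape)  # torch.Size([2, 16, 512])
```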
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become a de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
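The lexical-overlap effect noted in the last entry is easy to make concrete. The sketch below uses a plain Jaccard overlap between premise and hypothesis tokens, an illustrative measure rather than the one used in the paper, to show the quantity that independent translation of the two sides tends to reduce.

```python
# Illustrative token-level lexical overlap between an NLI premise and
# hypothesis (Jaccard over token sets). Not the paper's exact measurement.
def lexical_overlap(premise: str, hypothesis: str) -> float:
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(p | h) if p | h else 0.0

# Translating premise and hypothesis together tends to reuse wording;
# translating them independently tends to paraphrase, lowering the overlap.
print(lexical_overlap("the cat sits on the mat", "the cat is on the mat"))      # high overlap
print(lexical_overlap("the cat sits on the mat", "a feline rests upon a rug"))  # low overlap
```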
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.