DiTTO: A Feature Representation Imitation Approach for Improving
Cross-Lingual Transfer
- URL: http://arxiv.org/abs/2303.02357v1
- Date: Sat, 4 Mar 2023 08:42:50 GMT
- Title: DiTTO: A Feature Representation Imitation Approach for Improving
Cross-Lingual Transfer
- Authors: Shanu Kumar, Abbaraju Soujanya, Sandipan Dandapat, Sunayana Sitaram,
Monojit Choudhury
- Abstract summary: We envision languages as domains for improving zero-shot transfer.
We show that our approach, DiTTO, significantly outperforms the standard zero-shot fine-tuning method.
Our model enables better cross-lingual transfer than standard fine-tuning methods, even in the few-shot setting.
- Score: 15.062937537799005
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Zero-shot cross-lingual transfer is promising; however, it has been shown to be
sub-optimal, with inferior transfer performance across low-resource languages.
In this work, we envision languages as domains for improving zero-shot transfer
by jointly reducing the feature incongruity between the source and the target
language and increasing the generalization capabilities of pre-trained
multilingual transformers. We show that our approach, DiTTO, significantly
outperforms the standard zero-shot fine-tuning method on multiple datasets
across all languages using solely unlabeled instances in the target language.
Empirical results show that jointly reducing feature incongruity for multiple
target languages is vital for successful cross-lingual transfer. Moreover, our
model enables better cross-lingual transfer than standard fine-tuning methods,
even in the few-shot setting.
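The abstract does not spell out DiTTO's training objective, but the idea of treating languages as domains and jointly reducing feature incongruity can be illustrated with a generic alignment penalty added to standard fine-tuning. The sketch below is a minimal, hypothetical PyTorch example: an RBF-kernel MMD term between source-language features and unlabeled target-language features is added to the source-language task loss. The choice of MMD, the weight `lam`, and the random stand-in tensors are illustrative assumptions, not the paper's formulation.

```python
import torch

def mmd_rbf(x, y, sigma=None):
    """Squared MMD between two feature batches under an RBF kernel.
    If sigma is None, the median pairwise distance is used as bandwidth."""
    if sigma is None:
        sigma = torch.cdist(x, y).median()
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Stand-ins for encoder outputs: a labeled source-language batch and an
# unlabeled target-language batch (no target labels are needed).
src_feats = torch.randn(16, 768, requires_grad=True)
tgt_feats = torch.randn(16, 768, requires_grad=True)
task_loss = torch.tensor(0.7)        # placeholder source-language task loss
lam = 0.1                            # hypothetical trade-off weight
total_loss = task_loss + lam * mmd_rbf(src_feats, tgt_feats)
total_loss.backward()                # gradients flow into both feature batches
```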
Related papers
- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
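The summary only states that a cross-lingual divergence between latent variables is minimized with Optimal Transport, so the snippet below is a generic illustration rather than the paper's method: an entropic-regularized (Sinkhorn) OT cost between two batches of latent vectors, in plain PyTorch. The regularization strength, iteration count, and uniform weights are assumptions for the sketch.

```python
import torch

def sinkhorn_cost(x, y, eps=0.1, n_iter=100):
    """Entropic-regularized OT cost between two empirical measures (uniform weights)."""
    cost = torch.cdist(x, y).pow(2)
    cost = cost / cost.max()                  # normalize for numerical stability
    k = torch.exp(-cost / eps)                # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0))
    b = torch.full((y.size(0),), 1.0 / y.size(0))
    u = torch.ones_like(a)
    for _ in range(n_iter):                   # Sinkhorn fixed-point updates
        u = a / (k @ (b / (k.t() @ u)))
    v = b / (k.t() @ u)
    plan = torch.diag(u) @ k @ torch.diag(v)  # approximate transport plan
    return (plan * cost).sum()

# Illustrative use: latent representations for source- and target-language inputs.
src_latents = torch.randn(32, 64)
tgt_latents = torch.randn(32, 64)
divergence = sinkhorn_cost(src_latents, tgt_latents)
```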
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings [10.871587311621974]
We experimentally demonstrate that high capacity multilingual language models applied in a zero-shot setting consistently outperform data-based cross-lingual transfer approaches.
A detailed analysis of our results suggests that this might be due to important differences in language use.
Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.
arXiv Detail & Related papers (2022-10-23T05:37:35Z)
- A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning [6.329304732560936]
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries.
We propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss.
arXiv Detail & Related papers (2022-10-18T15:36:53Z)
- Zero-shot Cross-lingual Transfer is Under-specified Optimization [49.3779328255767]
We show that any linearly interpolated model between the source-language monolingual model and the source + target bilingual model has equally low source-language generalization error.
We also show that the zero-shot solution lies in a non-flat region of the target-language generalization error surface, causing high variance.
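As a reading aid for the interpolation result above, the sketch below shows one way such linearly interpolated models can be built by mixing the parameters of a source-only fine-tuned model and a source + target fine-tuned model. The toy `nn.Linear` modules and the evaluation step are placeholders, not the paper's setup.

```python
import copy
import torch.nn as nn

def interpolate(model_src, model_bi, alpha):
    """Parameters theta(alpha) = (1 - alpha) * theta_src + alpha * theta_bi."""
    mixed = copy.deepcopy(model_src)
    sd_src, sd_bi = model_src.state_dict(), model_bi.state_dict()
    mixed.load_state_dict({k: (1 - alpha) * sd_src[k] + alpha * sd_bi[k] for k in sd_src})
    return mixed

# Toy stand-ins for the source-only and source + target fine-tuned models.
src_only = nn.Linear(8, 2)
src_plus_tgt = nn.Linear(8, 2)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    probe = interpolate(src_only, src_plus_tgt, alpha)
    # evaluate `probe` on source- and target-language dev sets to trace the error surface
```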
arXiv Detail & Related papers (2022-07-12T16:49:28Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We additionally propose a KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for the translated text in the target language.
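The KL-divergence self-teaching loss is only named here, so the snippet below is a minimal sketch of such a term, assuming soft pseudo-labels for the target-language translation supervise the model's own predictions; the direction of the KL and where the pseudo-labels come from are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits: one forward pass yields soft pseudo-labels for the
# translated target-language text, another is trained to match them.
pseudo_logits = torch.randn(4, 3)                       # auto-generated, not backpropagated
student_logits = torch.randn(4, 3, requires_grad=True)  # predictions being trained

soft_labels = F.softmax(pseudo_logits, dim=-1).detach()
log_probs = F.log_softmax(student_logits, dim=-1)
kl_loss = F.kl_div(log_probs, soft_labels, reduction="batchmean")  # KL(soft_labels || preds)
kl_loss.backward()
```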
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become a de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)