Multilingual Alignment of Contextual Word Representations
- URL: http://arxiv.org/abs/2002.03518v2
- Date: Wed, 12 Feb 2020 23:28:06 GMT
- Title: Multilingual Alignment of Contextual Word Representations
- Authors: Steven Cao, Nikita Kitaev, Dan Klein
- Abstract summary: BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model.
We introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer.
These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
- Score: 49.42244463346612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose procedures for evaluating and strengthening contextual embedding
alignment and show that they are useful in analyzing and improving multilingual
BERT. In particular, after our proposed alignment procedure, BERT exhibits
significantly improved zero-shot performance on XNLI compared to the base
model, remarkably matching pseudo-fully-supervised translate-train models for
Bulgarian and Greek. Further, to measure the degree of alignment, we introduce
a contextual version of word retrieval and show that it correlates well with
downstream zero-shot transfer. Using this word retrieval task, we also analyze
BERT and find that it exhibits systematic deficiencies, e.g. worse alignment
for open-class parts-of-speech and word pairs written in different scripts,
that are corrected by the alignment procedure. These results support contextual
alignment as a useful concept for understanding large multilingual pre-trained
models.
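To make the contextual word-retrieval idea concrete, below is a minimal, hypothetical sketch: contextual vectors are taken from multilingual BERT for specific word occurrences in parallel sentences, and a source word-in-context is scored by whether its nearest neighbor among the target vectors is its aligned translation. The model name, sentences, word pairs, and token indices are illustrative assumptions, not the paper's data or code.

```python
# Minimal sketch of contextual word retrieval, assuming HuggingFace
# `transformers` and `torch`; sentences, word pairs, and token indices
# below are hypothetical and only illustrate the idea.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

def word_vectors(sentences, token_indices):
    """One contextual vector per (sentence, wordpiece index) pair, unit-normalized."""
    vecs = []
    for sent, idx in zip(sentences, token_indices):
        enc = tokenizer(sent, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
        vecs.append(hidden[idx])
    return torch.nn.functional.normalize(torch.stack(vecs), dim=-1)

# Hypothetical parallel word-in-context pairs: the i-th source occurrence of
# "bank" translates to the i-th target word. Indices are illustrative; real
# code must map each word to its wordpiece position(s).
src_sents = ["The bank approved the loan.", "He sat on the river bank."]
src_idx = [2, 6]
tgt_sents = ["Die Bank genehmigte den Kredit.", "Er sass am Flussufer."]
tgt_idx = [2, 4]

src = word_vectors(src_sents, src_idx)   # (n, dim)
tgt = word_vectors(tgt_sents, tgt_idx)   # (n, dim)

# Retrieval: each source word-in-context should retrieve its aligned target word.
sim = src @ tgt.T                        # cosine similarities (vectors are unit norm)
pred = sim.argmax(dim=1)
accuracy = (pred == torch.arange(len(pred))).float().mean()
print(f"contextual retrieval accuracy: {accuracy:.2f}")
```

Retrieval accuracy of this kind is the alignment measure that, per the abstract, correlates with downstream zero-shot transfer; the paper's alignment procedure is then aimed at making such aligned pairs retrieve each other more reliably.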
Related papers
- Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word-chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Utilizing Language-Image Pretraining for Efficient and Robust Bilingual
Word Alignment [27.405171616881322]
We develop a novel unsupervised word translation (UWT) method dubbed Word Alignment using Language-Image Pretraining (WALIP).
WALIP uses visual observations via the shared embedding space of images and texts provided by CLIP models.
Our experiments show that WALIP improves upon the state-of-the-art performance of bilingual word alignment for a few language pairs.
arXiv Detail & Related papers (2022-05-23T20:29:26Z)
- Improving Contextual Representation with Gloss Regularized Pre-training [9.589252392388758]
We propose an auxiliary gloss regularizer module for BERT pre-training (GR-BERT) to enhance word semantic similarity.
By predicting masked words and aligning contextual embeddings to corresponding glosses simultaneously, the word similarity can be explicitly modeled.
Experimental results show that the gloss regularizer benefits BERT in word-level and sentence-level semantic representation.
arXiv Detail & Related papers (2022-05-13T12:50:32Z)
- SLUA: A Super Lightweight Unsupervised Word Alignment Model via
Cross-Lingual Contrastive Learning [79.91678610678885]
We propose a super lightweight unsupervised word alignment model (SLUA).
Experimental results on several public benchmarks demonstrate that our model achieves competitive, if not better, performance.
Notably, we regard our model as a pioneering attempt to unify bilingual word embedding and word alignment.
arXiv Detail & Related papers (2021-02-08T05:54:11Z)
- Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) prove an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs but fine-tuning them on parallel text with objectives designed to improve alignment quality, and proposing methods to effectively extract alignments from these fine-tuned models (a rough sketch of this similarity-based extraction idea appears after this list).
arXiv Detail & Related papers (2021-01-20T17:54:47Z)
- Unsupervised Word Translation Pairing using Refinement based Point Set
Registration [8.568050813210823]
Cross-lingual alignment of word embeddings plays an important role in knowledge transfer across languages.
Current unsupervised approaches rely on similarities in geometric structure of word embedding spaces across languages.
This paper proposes BioSpere, a novel framework for unsupervised mapping of bi-lingual word embeddings onto a shared vector space.
arXiv Detail & Related papers (2020-11-26T09:51:29Z)
- Cross-lingual Alignment Methods for Multilingual BERT: A Comparative
Study [2.101267270902429]
We analyse how different forms of cross-lingual supervision and various alignment methods influence the transfer capability of mBERT in zero-shot setting.
We find that supervision from parallel corpus is generally superior to dictionary alignments.
arXiv Detail & Related papers (2020-09-29T20:56:57Z)
- Self-Attention with Cross-Lingual Position Representation [112.05807284056337]
Position encoding (PE) is used to preserve word order information for natural language processing tasks, generating fixed position indices for input sequences.
Due to word order divergences across languages, modeling cross-lingual positional relationships might help self-attention networks (SANs) tackle this problem.
We augment SANs with cross-lingual position representations to model the bilingually aware latent structure of the input sentence.
arXiv Detail & Related papers (2020-04-28T05:23:43Z)
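As referenced from the "Word Alignment by Fine-tuning Embeddings on Parallel Corpora" entry above, here is a rough sketch, under stated assumptions, of how word alignments can be read off contextual embeddings of a parallel sentence pair: compute all pairwise similarities and keep the pairs that are mutual nearest neighbors. The model name and sentence pair are hypothetical, and the extraction rule is a generic illustration rather than the exact objective or extractor proposed in that paper.

```python
# Rough, hypothetical sketch of extracting word alignments from contextual
# embeddings via a similarity matrix with bidirectional agreement; it
# illustrates the general idea, not any listed paper's exact method.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

def token_states(sentence):
    """Contextual states and wordpiece strings for one sentence, without [CLS]/[SEP]."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]               # (seq_len, dim)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[1:-1], tokens[1:-1]

src_h, src_toks = token_states("The agreement was signed yesterday.")
tgt_h, tgt_toks = token_states("L'accord a été signé hier.")

# Cosine similarity between every source and target wordpiece.
sim = torch.nn.functional.normalize(src_h, dim=-1) @ \
      torch.nn.functional.normalize(tgt_h, dim=-1).T             # (src_len, tgt_len)

# Keep pairs that are each other's best match in both directions.
fwd = sim.argmax(dim=1)   # best target index for each source wordpiece
bwd = sim.argmax(dim=0)   # best source index for each target wordpiece
alignments = [(i, fwd[i].item()) for i in range(len(src_toks))
              if bwd[fwd[i]].item() == i]

for i, j in alignments:
    print(src_toks[i], "<->", tgt_toks[j])
```

Fine-tuning the LM on parallel text, as that paper proposes, is intended to sharpen this similarity matrix so that mutual-best-match pairs coincide more closely with gold alignments.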