Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings
- URL: http://arxiv.org/abs/2103.06459v1
- Date: Thu, 11 Mar 2021 04:55:35 GMT
- Title: Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings
- Authors: Linlin Liu, Thien Hai Nguyen, Shafiq Joty, Lidong Bing, Luo Si
- Abstract summary: We propose a novel framework to align contextual embeddings at the sense level by leveraging cross-lingual signal from bilingual dictionaries only.
We operationalize our framework by first proposing a novel sense-aware cross entropy loss to model word senses explicitly.
We then propose a sense alignment objective on top of the sense-aware cross entropy loss for cross-lingual model pretraining, and pretrain cross-lingual models for several language pairs.
- Score: 41.148892848434585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-lingual word embeddings (CLWE) have been proven useful in many
cross-lingual tasks. However, most existing approaches to learning CLWE, including
the ones with contextual embeddings, are sense-agnostic. In this work, we
propose a novel framework to align contextual embeddings at the sense level by
leveraging cross-lingual signal from bilingual dictionaries only. We
operationalize our framework by first proposing a novel sense-aware cross
entropy loss to model word senses explicitly. The monolingual ELMo and BERT
models pretrained with our sense-aware cross entropy loss demonstrate
significant performance improvement for word sense disambiguation tasks. We
then propose a sense alignment objective on top of the sense-aware cross
entropy loss for cross-lingual model pretraining, and pretrain cross-lingual
models for several language pairs (English to German/Spanish/Japanese/Chinese).
Compared with the best baseline results, our cross-lingual models achieve
0.52%, 2.09% and 1.29% average performance improvements on zero-shot
cross-lingual NER, sentiment classification and XNLI tasks, respectively.
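The sense-aware cross entropy loss is only described at a high level in the abstract, so here is a rough sketch of how such an objective can be set up: every vocabulary word gets a fixed number of candidate sense vectors, and the prediction loss marginalizes over the target word's senses. This is a hedged illustration in PyTorch, not the authors' implementation; the class name, the fixed sense count, and the log-sum-exp marginalization are all assumptions.

```python
# Illustrative sketch only: each vocabulary word owns K sense vectors, and the
# probability of the target word marginalizes (log-sum-exp) over its senses.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SenseAwareCrossEntropy(nn.Module):
    def __init__(self, vocab_size: int, num_senses: int, hidden_dim: int):
        super().__init__()
        self.vocab_size = vocab_size
        self.num_senses = num_senses
        # One output embedding per (word, sense) pair, laid out word-major.
        self.sense_embeddings = nn.Embedding(vocab_size * num_senses, hidden_dim)

    def forward(self, context: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # context: [B, H] contextual vectors from the encoder; target: [B] word ids.
        logits = context @ self.sense_embeddings.weight.T                   # [B, V * K]
        log_probs = F.log_softmax(logits, dim=-1)
        log_probs = log_probs.view(-1, self.vocab_size, self.num_senses)    # [B, V, K]
        # Gather the target word's sense scores and marginalize over its senses.
        target_log_probs = log_probs[torch.arange(target.size(0)), target]  # [B, K]
        return -torch.logsumexp(target_log_probs, dim=-1).mean()


# Usage with random stand-ins (batch of 8 contextual vectors, hidden size 256):
loss_fn = SenseAwareCrossEntropy(vocab_size=1000, num_senses=3, hidden_dim=256)
loss = loss_fn(torch.randn(8, 256), torch.randint(0, 1000, (8,)))
```

Under the same reading, the cross-lingual sense alignment objective would additionally tie the sense distributions of dictionary translation pairs together, but the abstract does not give enough detail to sketch that part faithfully.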
Related papers
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose VECO 2.0, a cross-lingual pre-trained model based on contrastive learning with multi-granularity alignments.
Specifically, the sequence-to-sequence alignment is induced to maximize the similarity of the parallel pairs and minimize the non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, mined via a thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning [6.329304732560936]
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries.
We propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss.
arXiv Detail & Related papers (2022-10-18T15:36:53Z)
- The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer [3.300216758849348]
Large multilingual language models such as mBERT or XLM-R enable zero-shot cross-lingual transfer in various IR and NLP tasks.
We propose a data- and compute-efficient method for cross-lingual adjustment of mBERT that uses a small parallel corpus to make embeddings of related words across languages similar to each other.
We experiment with a typologically diverse set of languages (Spanish, Russian, Vietnamese, and Hindi) and extend their original implementations to new tasks.
Our study reproduced gains in NLI for four languages and showed improved NER, XSR, and cross-lingual QA performance (a generic sketch of this kind of word-level adjustment follows this entry).
arXiv Detail & Related papers (2022-04-13T15:28:43Z)
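The cross-lingual adjustment described in the entry above is specified only as making embeddings of related words across languages similar; the snippet below is a deliberately simplified, generic sketch of that idea, not the paper's method. It pulls contextual embeddings of aligned word pairs together with a cosine objective; the function name and tensor shapes are assumptions.

```python
# Generic illustration: pull contextual embeddings of word pairs that are
# aligned across a small parallel corpus toward each other.
import torch
import torch.nn.functional as F


def word_adjustment_loss(src_vecs: torch.Tensor, tgt_vecs: torch.Tensor) -> torch.Tensor:
    """src_vecs, tgt_vecs: [N, H] contextual embeddings of N aligned word pairs."""
    # 1 - cosine similarity, averaged over the aligned pairs.
    return (1.0 - F.cosine_similarity(src_vecs, tgt_vecs, dim=-1)).mean()


# Usage with random stand-ins for mBERT outputs of 32 aligned word pairs:
loss = word_adjustment_loss(torch.randn(32, 768), torch.randn(32, 768))
```

A loss like this on its own can collapse the embedding space, so a practical recipe would also keep unrelated words apart (for example with in-batch negatives) or regularize toward the original mBERT parameters; the entry does not say which of these the authors use.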
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of parallel input sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- Multi-Level Contrastive Learning for Cross-Lingual Alignment [35.33431650608965]
Cross-lingual pre-trained models such as multilingual BERT (mBERT) have achieved strong performance on various cross-lingual downstream NLP tasks.
This paper proposes a multi-level contrastive learning framework to further improve the cross-lingual ability of pre-trained models.
arXiv Detail & Related papers (2022-02-26T07:14:20Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- On Learning Universal Representations Across Languages [37.555675157198145]
We extend existing approaches to learn sentence-level representations and show their effectiveness on cross-lingual understanding and generation.
Specifically, we propose a Hierarchical Contrastive Learning (HiCTL) method to learn universal representations for parallel sentences distributed in one or multiple languages.
We conduct evaluations on two challenging cross-lingual tasks, XTREME and machine translation.
arXiv Detail & Related papers (2020-07-31T10:58:39Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models (a minimal sketch of such a contrastive alignment loss follows this entry).
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
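Several of the entries above (VECO 2.0, HiCTL, InfoXLM) describe contrastive objectives that align parallel sentence pairs while pushing non-parallel pairs apart. The sketch below is a minimal InfoNCE-style version of that idea with in-batch negatives; it is an assumption-laden illustration, not code from any of these papers, and the function name, temperature, and pooling choice are made up.

```python
# Minimal InfoNCE-style sketch of sequence-level cross-lingual alignment.
import torch
import torch.nn.functional as F


def parallel_sentence_infonce(src_repr: torch.Tensor, tgt_repr: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """src_repr, tgt_repr: [B, H] pooled representations of B parallel sentence pairs."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.T / temperature                    # [B, B] similarity matrix
    labels = torch.arange(src.size(0), device=src.device)
    # Each source sentence must retrieve its own translation among in-batch
    # negatives; the loss is symmetrized over both retrieval directions.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))


# Usage with random stand-ins for pooled encoder outputs of 16 parallel pairs:
loss = parallel_sentence_infonce(torch.randn(16, 768), torch.randn(16, 768))
```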