Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow
Interpolation
- URL: http://arxiv.org/abs/2210.03319v1
- Date: Fri, 7 Oct 2022 04:37:47 GMT
- Authors: Liping Tang, Zhen Li, Zhiquan Luo, Helen Meng
- Abstract summary: Previous adversarial approaches have shown promising results in inducing cross-lingual word embedding without parallel data.
We propose to make use of a sequence of intermediate spaces for smooth bridging.
- Score: 48.32604585839687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates an unsupervised approach towards deriving a
universal, cross-lingual word embedding space, where words with similar
semantics from different languages are close to one another. Previous
adversarial approaches have shown promising results in inducing cross-lingual
word embedding without parallel data. However, the training stage shows
instability for distant language pairs. Instead of mapping the source language
space directly to the target language space, we propose to make use of a
sequence of intermediate spaces for smooth bridging. Each intermediate space
may be conceived as a pseudo-language space and is introduced via simple linear
interpolation. This approach is modeled after domain flow in computer vision,
but with a modified objective function. Experiments on intrinsic Bilingual
Dictionary Induction tasks show that the proposed approach can improve the
robustness of adversarial models with comparable and even better precision.
Further experiments on the downstream task of Cross-Lingual Natural Language
Inference show that the proposed model achieves significant performance
improvement for distant language pairs in downstream tasks compared to
state-of-the-art adversarial and non-adversarial models.
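The abstract describes forming each pseudo-language space via simple linear interpolation between the source space and its image under the learned mapping. The sketch below illustrates that idea only; the function name, the use of a single mapping matrix, and the uniform interpolation schedule are assumptions for illustration, not the paper's actual construction or objective.

```python
import numpy as np

def intermediate_spaces(src_emb, mapping, num_steps):
    """Generate a sequence of pseudo-language spaces by linearly
    interpolating between the source embeddings and their mapped
    image (illustrative sketch; the paper's objective differs)."""
    mapped = src_emb @ mapping  # source embeddings projected toward the target space
    spaces = []
    for k in range(num_steps + 1):
        t = k / num_steps  # interpolation coefficient in [0, 1]
        spaces.append((1.0 - t) * src_emb + t * mapped)
    return spaces

# Toy example: 4 source words in a 3-dimensional space, random mapping.
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 3))
path = intermediate_spaces(src, W, num_steps=5)
print(len(path))                        # 6 spaces: t = 0.0, 0.2, ..., 1.0
print(np.allclose(path[0], src))        # first space is the source space
print(np.allclose(path[-1], src @ W))   # last space is the fully mapped space
```

Each intermediate array can then serve as the "target" for one stage of adversarial training, so the mapping is bridged gradually rather than learned in a single jump between distant spaces.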
Related papers
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model VECO2.0 based on contrastive learning with multi-granularity alignments.
Specifically, the sequence-to-sequence alignment is induced to maximize the similarity of the parallel pairs and minimize the non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, mined via a thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment [63.0407314271459]
Experiments show that the proposed Cross-Align achieves state-of-the-art (SOTA) performance on four out of five language pairs.
arXiv Detail & Related papers (2022-10-09T02:24:35Z)
- Unsupervised Alignment of Distributional Word Embeddings [0.0]
Cross-domain alignment plays a key role in tasks ranging from machine translation to transfer learning.
We show that the proposed approach achieves good performance on the bilingual lexicon induction task across several language pairs.
arXiv Detail & Related papers (2022-03-09T16:39:06Z)
- Refinement of Unsupervised Cross-Lingual Word Embeddings [2.4366811507669124]
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages.
We propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings.
arXiv Detail & Related papers (2020-02-21T10:39:53Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models fitted to the word order of the source language might fail to handle target languages with different word orders.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.