Multi-Adversarial Learning for Cross-Lingual Word Embeddings
- URL: http://arxiv.org/abs/2010.08432v2
- Date: Wed, 25 Aug 2021 22:11:48 GMT
- Title: Multi-Adversarial Learning for Cross-Lingual Word Embeddings
- Authors: Haozhou Wang, James Henderson, Paola Merlo
- Abstract summary: We propose a novel method for inducing cross-lingual word embeddings.
It induces the seed cross-lingual dictionary through multiple mappings, each induced to fit the mapping for one subspace.
Our experiments on unsupervised bilingual lexicon induction show that this method improves performance over previous single-mapping methods.
- Score: 19.407717032782863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative adversarial networks (GANs) have succeeded in inducing
cross-lingual word embeddings -- maps of matching words across languages --
without supervision. Despite these successes, GANs' performance for the
difficult case of distant languages is still not satisfactory. These
limitations have been explained by GANs' incorrect assumption that source and
target embedding spaces are related by a single linear mapping and are
approximately isomorphic. We assume instead that, especially across distant
languages, the mapping is only piece-wise linear, and propose a
multi-adversarial learning method. This novel method induces the seed
cross-lingual dictionary through multiple mappings, each induced to fit the
mapping for one subspace. Our experiments on unsupervised bilingual lexicon
induction show that this method improves performance over previous
single-mapping methods, especially for distant languages.
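The core assumption above is that the mapping between embedding spaces is only piece-wise linear: each subspace of the source space gets its own linear mapping. The sketch below illustrates that idea only, not the authors' adversarial training procedure; the centroid-based subspace assignment and the function names are assumptions for illustration.

```python
import numpy as np

def piecewise_linear_map(X, centroids, maps):
    """Map each source vector with the linear map of its nearest subspace.

    X         : (n, d) source-language embeddings
    centroids : (k, d) subspace centroids (e.g. obtained via k-means)
    maps      : (k, d, d) one linear mapping W_k per subspace

    NOTE: illustrative sketch of a piece-wise linear mapping; the paper
    induces these mappings adversarially rather than assuming them given.
    """
    # Assign each vector to the subspace of its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    assign = dists.argmin(axis=1)

    # Apply the subspace-specific mapping W_k to the vectors in subspace k.
    out = np.empty_like(X)
    for k in range(len(centroids)):
        idx = assign == k
        out[idx] = X[idx] @ maps[k].T
    return out

# Toy usage: two subspaces, one mapped by the identity, one by its negation.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
maps = np.stack([np.eye(2), -np.eye(2)])
Y = piecewise_linear_map(X, centroids, maps)
```

A single global linear map would have to apply the same transformation to both vectors; the per-subspace maps here transform them differently, which is the extra flexibility the piece-wise assumption buys for distant language pairs.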
Related papers
- Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation [48.32604585839687]
Previous adversarial approaches have shown promising results in inducing cross-lingual word embedding without parallel data.
We propose to make use of a sequence of intermediate spaces for smooth bridging.
arXiv Detail & Related papers (2022-10-07T04:37:47Z)
- Word Embedding Transformation for Robust Unsupervised Bilingual Lexicon Induction [21.782189001319935]
We propose a transformation-based method to increase the isomorphism of embeddings of two languages.
Our approach can achieve competitive or superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-05-26T02:09:58Z)
- Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring [41.77270308094212]
We propose an alternative mapping approach for word embeddings in languages other than English.
Rather than aligning two fixed embedding spaces, our method works by fixing the target language embeddings, and learning a new set of embeddings for the source language that are aligned with them.
Our approach outperforms conventional mapping methods on bilingual lexicon induction, and obtains competitive results in the downstream XNLI task.
arXiv Detail & Related papers (2020-12-31T17:10:14Z)
- Inducing Language-Agnostic Multilingual Representations [61.97381112847459]
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
We examine three approaches for this: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which yields better discriminativeness of embeddings as a by-product; and (iii) increasing input similarity across languages by removing morphological contractions and sentence reordering.
arXiv Detail & Related papers (2020-08-20T17:58:56Z)
- LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space [17.49073364781107]
We propose a novel semi-supervised method to learn cross-lingual word embeddings for bilingual lexicon induction.
Our model is independent of the isomorphic assumption and uses nonlinear mapping in the latent space of two independently trained auto-encoders.
arXiv Detail & Related papers (2020-04-28T23:28:26Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- Refinement of Unsupervised Cross-Lingual Word Embeddings [2.4366811507669124]
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages.
We propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings.
arXiv Detail & Related papers (2020-02-21T10:39:53Z)
- ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs [48.287610663358066]
We propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data.
arXiv Detail & Related papers (2020-01-29T22:44:05Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.