LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon
Induction Through Non-Linear Mapping in Latent Space
- URL: http://arxiv.org/abs/2004.13889v2
- Date: Thu, 22 Oct 2020 00:42:16 GMT
- Title: LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon
Induction Through Non-Linear Mapping in Latent Space
- Authors: Tasnim Mohiuddin, M Saiful Bari, and Shafiq Joty
- Abstract summary: We propose a novel semi-supervised method to learn cross-lingual word embeddings for bilingual lexicon induction.
Our model is independent of the isomorphic assumption and uses nonlinear mapping in the latent space of two independently trained auto-encoders.
- Score: 17.49073364781107
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Most of the successful and predominant methods for bilingual lexicon
induction (BLI) are mapping-based, where a linear mapping function is learned
with the assumption that the word embedding spaces of different languages
exhibit similar geometric structures (i.e., approximately isomorphic). However,
several recent studies have criticized this simplified assumption, showing that
it does not hold in general even for closely related languages. In this work,
we propose a novel semi-supervised method to learn cross-lingual word
embeddings for BLI. Our model is independent of the isomorphic assumption and
uses nonlinear mapping in the latent space of two independently trained
auto-encoders. Through extensive experiments on fifteen (15) different language
pairs (in both directions) comprising resource-rich and low-resource languages
from two different datasets, we demonstrate that our method outperforms
existing models by a good margin. Ablation studies show the importance of
different model components and the necessity of non-linear mapping.
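To make the departure from linear mapping concrete, here is a minimal PyTorch sketch of the general recipe the abstract describes: train one autoencoder per language on its monolingual embeddings, then learn a non-linear map between the two latent spaces from a seed dictionary. The layer sizes, activations, losses, and training schedule are illustrative assumptions, not the authors' exact architecture, and the full semi-supervised method also exploits unsupervised signals omitted here.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Maps word embeddings into a latent space and back (one per language)."""
    def __init__(self, emb_dim=300, latent_dim=350):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(emb_dim, latent_dim), nn.Tanh())
        self.decoder = nn.Linear(latent_dim, emb_dim)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

class NonLinearMapper(nn.Module):
    """Non-linear map between the two latent spaces (replaces the usual linear W)."""
    def __init__(self, latent_dim=350, hidden=400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim))

    def forward(self, z):
        return self.net(z)

# Stage 1: train each autoencoder independently on monolingual embeddings.
def train_autoencoder(ae, embs, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        recon, _ = ae(embs)
        loss = nn.functional.mse_loss(recon, embs)
        opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: learn the non-linear latent mapping from seed dictionary pairs.
def train_mapper(mapper, ae_src, ae_tgt, src_pairs, tgt_pairs, epochs=20, lr=1e-3):
    opt = torch.optim.Adam(mapper.parameters(), lr=lr)
    for _ in range(epochs):
        with torch.no_grad():  # keep the pretrained autoencoders frozen
            z_src = ae_src.encoder(src_pairs)
            z_tgt = ae_tgt.encoder(tgt_pairs)
        loss = nn.functional.mse_loss(mapper(z_src), z_tgt)
        opt.zero_grad(); loss.backward(); opt.step()

# Usage with random stand-ins for real pretrained embeddings and a seed lexicon:
src_embs, tgt_embs = torch.randn(5000, 300), torch.randn(5000, 300)
ae_src, ae_tgt, mapper = AutoEncoder(), AutoEncoder(), NonLinearMapper()
train_autoencoder(ae_src, src_embs)
train_autoencoder(ae_tgt, tgt_embs)
train_mapper(mapper, ae_src, ae_tgt, src_embs[:1000], tgt_embs[:1000])
```

At induction time, a source word is encoded, mapped through the non-linear mapper, and its translation retrieved by nearest-neighbour search in the target latent space.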
Related papers
- Concept Space Alignment in Multilingual LLMs [47.633314194898134]
We show that multilingual models suffer from two familiar weaknesses: generalization works best for typologically similar languages and for abstract concepts.
For some models, prompt-based embeddings align better than word embeddings, but the projections are less linear.
arXiv Detail & Related papers (2024-10-01T21:21:00Z) - Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z) - Isomorphic Cross-lingual Embeddings for Low-Resource Languages [1.5076964620370268]
- Isomorphic Cross-lingual Embeddings for Low-Resource Languages [1.5076964620370268]
Cross-Lingual Word Embeddings (CLWEs) are a key component to transfer linguistic information learnt from higher-resource settings into lower-resource ones.
We introduce a framework to learn CLWEs, without assuming isometry, for low-resource pairs via joint exploitation of a related higher-resource language.
We show consistent gains over current methods in both quality and degree of isomorphism, as measured by bilingual lexicon induction (BLI) and eigenvalue similarity respectively.
arXiv Detail & Related papers (2022-03-28T10:39:07Z) - Cross-lingual alignments of ELMo contextual embeddings [0.0]
- Cross-lingual alignments of ELMo contextual embeddings [0.0]
Cross-lingual embeddings map word embeddings from a low-resource language to a high-resource language.
To produce cross-lingual mappings of recent contextual embeddings, anchor points between the embedding spaces have to be words in the same context.
We propose novel cross-lingual mapping methods for ELMo embeddings.
arXiv Detail & Related papers (2021-06-30T11:26:43Z) - Word Embedding Transformation for Robust Unsupervised Bilingual Lexicon
Induction [21.782189001319935]
We propose a transformation-based method to increase the isomorphism of embeddings of two languages.
Our approach can achieve competitive or superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-05-26T02:09:58Z) - Improving the Lexical Ability of Pretrained Language Models for
Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align both the lexical- and high-level representations of the two languages; previous research suggests that it can fail for distant language pairs because the lexical representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z) - Learning Contextualised Cross-lingual Word Embeddings and Alignments for
Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z) - Multi-Adversarial Learning for Cross-Lingual Word Embeddings [19.407717032782863]
We propose a novel method for inducing cross-lingual word embeddings.
It induces the seed cross-lingual dictionary through multiple mappings, each trained to fit one subspace of the embedding space.
Our experiments on unsupervised bilingual lexicon induction show that this method improves performance over previous single-mapping methods.
arXiv Detail & Related papers (2020-10-16T14:54:28Z) - On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z) - Refinement of Unsupervised Cross-Lingual Word Embeddings [2.4366811507669124]
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages.
We propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings.
arXiv Detail & Related papers (2020-02-21T10:39:53Z) - Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)