Revisiting the Context Window for Cross-lingual Word Embeddings
- URL: http://arxiv.org/abs/2004.10813v1
- Date: Wed, 22 Apr 2020 19:29:43 GMT
- Title: Revisiting the Context Window for Cross-lingual Word Embeddings
- Authors: Ryokan Ri and Yoshimasa Tsuruoka
- Abstract summary: Existing approaches to mapping-based cross-lingual word embeddings are based on the assumption that the source and target embedding spaces are structurally similar.
This work provides a thorough evaluation, in various languages, domains, and tasks, of bilingual embeddings trained with different context windows.
The highlight of our findings is that increasing both the source and target window sizes improves the performance of bilingual lexicon induction.
- Score: 32.27333420000134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing approaches to mapping-based cross-lingual word embeddings are based
on the assumption that the source and target embedding spaces are structurally
similar. The structures of embedding spaces largely depend on the co-occurrence
statistics of each word, which the choice of context window determines. Despite
this obvious connection between the context window and mapping-based
cross-lingual embeddings, their relationship has been underexplored in prior
work. In this work, we provide a thorough evaluation, in various languages,
domains, and tasks, of bilingual embeddings trained with different context
windows. The highlight of our findings is that increasing both the source and
target window sizes improves the performance of bilingual lexicon induction,
especially on frequent nouns.
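The experimental setup the abstract describes can be illustrated with a minimal sketch: train source and target embeddings with a given context window, align them with orthogonal Procrustes, and induce a lexicon by nearest-neighbour search. This is a hedged illustration, assuming gensim's Word2Vec for monolingual training; the corpora, seed dictionary, and query words below are toy placeholders, and the paper's actual evaluation uses full corpora and retrieval criteria such as CSLS rather than plain nearest neighbour.
```python
# Minimal sketch: vary the context window on both sides, map with Procrustes,
# and induce translations. All data below are placeholders.
import numpy as np
from gensim.models import Word2Vec

def train(sentences, window):
    # A larger `window` shifts co-occurrence statistics from syntactic to topical.
    return Word2Vec(sentences, vector_size=100, window=window, min_count=1, sg=1, seed=0)

def procrustes(X, Y):
    # W* = argmin_{W orthogonal} ||XW - Y||_F, solved in closed form via SVD.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def induce(src, tgt, seed_pairs, queries):
    X = np.stack([src.wv[s] for s, _ in seed_pairs])
    Y = np.stack([tgt.wv[t] for _, t in seed_pairs])
    W = procrustes(X, Y)
    T = tgt.wv.get_normed_vectors()          # unit-length target vectors
    translate = lambda q: tgt.wv.index_to_key[int(np.argmax(T @ (src.wv[q] @ W)))]
    return {q: translate(q) for q in queries}

src_corpus = [["the", "cat", "sat"], ["a", "dog", "ran"]]          # placeholder
tgt_corpus = [["le", "chat", "assis"], ["un", "chien", "court"]]   # placeholder
for w in (2, 5, 10):   # increase both window sizes together, as the paper suggests
    src, tgt = train(src_corpus, w), train(tgt_corpus, w)
    print(w, induce(src, tgt, [("cat", "chat"), ("dog", "chien")], ["cat"]))
```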
Related papers
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Task-Specific Dependency-based Word Embedding Methods [32.75244210656976]
Two task-specific dependency-based word embedding methods are proposed for text classification.
The first one, called the dependency-based word embedding (DWE), chooses keywords and neighbor words of a target word in the dependency parse tree as contexts to build the word-context matrix (a minimal version of this context extraction is sketched after this entry).
The second method, named class-enhanced dependency-based word embedding (CEDWE), learns from word-context as well as word-class co-occurrence statistics.
arXiv Detail & Related papers (2021-10-26T03:09:41Z)
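As a rough illustration of the dependency-based contexts in the entry above: a word's contexts are its neighbours in the parse tree (head and children) rather than a linear window. This is a hedged sketch assuming spaCy with the en_core_web_sm model installed; the paper's keyword selection step and class-enhanced variant are omitted.
```python
# Sketch: build a word-context co-occurrence count where contexts come from the
# dependency parse, labelled with the relation (cf. Levy & Goldberg, 2014).
from collections import Counter, defaultdict
import spacy

nlp = spacy.load("en_core_web_sm")       # assumes the small English model is installed
corpus = ["The quick brown fox jumps over the lazy dog."]  # placeholder corpus

counts = defaultdict(Counter)            # word -> Counter over dependency contexts
for doc in nlp.pipe(corpus):
    for tok in doc:
        w = tok.text.lower()
        if tok.head is not tok:          # head context, labelled with tok's relation
            counts[w][(tok.head.text.lower(), tok.dep_)] += 1
        for child in tok.children:       # child contexts, labelled as inverse relations
            counts[w][(child.text.lower(), child.dep_ + "^-1")] += 1

print(counts["fox"])   # e.g. head 'jumps'/nsubj plus children 'quick', 'brown'
```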
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring [41.77270308094212]
We propose an alternative mapping approach for word embeddings in languages other than English.
Rather than aligning two fixed embedding spaces, our method works by fixing the target language embeddings and learning a new set of embeddings for the source language that are aligned with them (a toy version of this anchoring idea is sketched after this entry).
Our approach outperforms conventional mapping methods on bilingual lexicon induction, and obtains competitive results in the downstream XNLI task.
arXiv Detail & Related papers (2020-12-31T17:10:14Z)
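A toy rendering of the anchoring idea from the entry above: keep target-language vectors fixed and learn source-language vectors directly in that space. This is an illustrative simplification (skip-gram with negative sampling over random placeholder data in PyTorch), not a reimplementation of the paper's method.
```python
import torch
import torch.nn.functional as F

V, D = 1000, 64                         # source vocabulary size, embedding dim
emb = torch.nn.Embedding(V, D)          # source embeddings being learned
ctx = torch.nn.Embedding(V, D)          # output/context embeddings
tgt_vecs = torch.randn(50, D)           # placeholder: fixed target-language vectors
anchors = torch.arange(50)              # source words with known translations
with torch.no_grad():
    emb.weight[anchors] = tgt_vecs      # pin anchors to their translations' vectors

opt = torch.optim.SGD(list(emb.parameters()) + list(ctx.parameters()), lr=0.05)
for step in range(100):
    centers = torch.randint(0, V, (256,))       # placeholder skip-gram pairs
    contexts = torch.randint(0, V, (256,))
    negatives = torch.randint(0, V, (256, 5))
    pos = (emb(centers) * ctx(contexts)).sum(-1)
    neg = torch.einsum("bd,bkd->bk", emb(centers), ctx(negatives))
    loss = -(F.logsigmoid(pos).mean() + F.logsigmoid(-neg).mean())
    opt.zero_grad()
    loss.backward()
    emb.weight.grad[anchors] = 0        # anchors never move: they stay in the target space
    opt.step()
```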
- What makes multilingual BERT multilingual? [60.9051207862378]
In this work, we provide an in-depth experimental study to supplement the existing literature on cross-lingual ability.
We compare the cross-lingual ability of non-contextualized and contextualized representation models trained with the same data.
We find that data size and context window size are crucial factors for transferability.
arXiv Detail & Related papers (2020-10-20T05:41:56Z)
- A Survey on Contextual Embeddings [48.04732268018772]
Contextual embeddings assign each word a representation based on its context, capturing uses of words across varied contexts and encoding knowledge that transfers across languages.
We review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
arXiv Detail & Related papers (2020-03-16T15:22:22Z)
- Refinement of Unsupervised Cross-Lingual Word Embeddings [2.4366811507669124]
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages.
We propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings (a sketch of the self-learning loop such refinement builds on follows this entry).
arXiv Detail & Related papers (2020-02-21T10:39:53Z)
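For orientation, here is a hedged sketch of the classic self-learning loop (in the spirit of Artetxe et al., 2017) that refinement methods of this kind build on: induce a dictionary from mutual nearest neighbours in the mapped space, refit the orthogonal map, and repeat. X and Z below are random placeholder matrices standing in for row-normalised source and target embeddings; the paper's own refinement differs in detail.
```python
import numpy as np

def refine(X, Z, n_iter=5):
    # X, Z: (n, d) L2-normalised source/target embedding matrices.
    W = np.eye(X.shape[1])
    for _ in range(n_iter):
        sims = (X @ W) @ Z.T
        fwd = sims.argmax(axis=1)                           # source -> nearest target
        bwd = sims.argmax(axis=0)                           # target -> nearest source
        keep = np.where(bwd[fwd] == np.arange(len(X)))[0]   # mutual nearest neighbours
        U, _, Vt = np.linalg.svd(X[keep].T @ Z[fwd[keep]])
        W = U @ Vt                                          # orthogonal Procrustes refit
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)); X /= np.linalg.norm(X, axis=1, keepdims=True)
Z = rng.normal(size=(200, 50)); Z /= np.linalg.norm(Z, axis=1, keepdims=True)
W = refine(X, Z)   # on real embeddings, iterate until the induced dictionary stabilises
```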
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models fitted to the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings [10.871587311621974]
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
Existing word vectors are projected to a common semantic space using linear transformations and averaging (this recipe is sketched after this entry).
The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
arXiv Detail & Related papers (2020-01-17T15:42:29Z)
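A hedged sketch of the projection-and-averaging recipe from the entry above: fit a linear map from each embedding set to a chosen pivot space on the shared vocabulary, then average the projected vectors. The least-squares fit is one simple choice of linear transformation, and the data below are random placeholders; the paper's exact procedure may differ.
```python
import numpy as np

def project(E, P, shared):
    # Fit M minimising ||E[shared] @ M - P[shared]||_F, then map all of E.
    M, *_ = np.linalg.lstsq(E[shared], P[shared], rcond=None)
    return E @ M

rng = np.random.default_rng(0)
pivot = rng.normal(size=(500, 100))                 # pivot embedding set (vocab x dim)
others = [rng.normal(size=(500, 300)), rng.normal(size=(500, 64))]
shared = np.arange(400)                             # row indices of the shared vocabulary

projected = [pivot] + [project(E, pivot, shared) for E in others]
meta = np.mean(projected, axis=0)                   # averaged meta-embeddings, (500, 100)
```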
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations (a toy version is sketched after this entry).
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
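To make the idea in the last entry concrete, here is a toy PyTorch rendering of a bilingual CBOW objective: a word is predicted both from its own-language context and from the words of the aligned sentence in the other language, so both vocabularies land in one shared space. Negative sampling, subsampling, and real aligned corpora are omitted, and the word ids below are placeholders; this is an assumption-laden sketch, not the paper's implementation.
```python
import torch
import torch.nn.functional as F

V, D = 2000, 64                        # joint source+target vocabulary, embedding dim
emb = torch.nn.Embedding(V, D)         # shared input embeddings for both languages
out = torch.nn.Linear(D, V, bias=False)
opt = torch.optim.Adam(list(emb.parameters()) + list(out.parameters()), lr=1e-3)

def cbow_loss(context_ids, target_id):
    h = emb(context_ids).mean(dim=0)                     # bag-of-words context vector
    return F.cross_entropy(out(h).unsqueeze(0), target_id.unsqueeze(0))

# One aligned sentence pair; ids are placeholders for tokens in either language.
src_sent = torch.tensor([5, 17, 42])
tgt_sent = torch.tensor([1003, 1440])
for i in range(len(src_sent)):
    mono_ctx = torch.cat([src_sent[:i], src_sent[i + 1:]])   # own-language context
    loss = cbow_loss(mono_ctx, src_sent[i])                  # monolingual CBOW term
    loss = loss + cbow_loss(tgt_sent, src_sent[i])           # aligned sentence as context
    opt.zero_grad(); loss.backward(); opt.step()
```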