RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction
- URL: http://arxiv.org/abs/2210.09926v1
- Date: Tue, 18 Oct 2022 15:11:45 GMT
- Title: RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction
- Authors: Zhoujin Tian, Chaozhuo Li, Shuo Ren, Zhiqiang Zuo, Zengxuan Wen,
Xinyue Hu, Xiao Han, Haizhen Huang, Denvy Deng, Qi Zhang, Xing Xie
- Abstract summary: Bilingual lexicon induction induces word translations by aligning independently trained word embeddings in two languages.
We propose a novel ranking-oriented induction model, RAPO, that learns a personalized mapping function for each word.
RAPO simultaneously benefits from the unique characteristics of each word and from cross-language isomorphism.
- Score: 25.19579637815882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilingual lexicon induction induces word translations by aligning
independently trained word embeddings in two languages. Existing approaches
generally focus on minimizing the distances between words in the aligned pairs,
but suffer from low discriminative capability to distinguish the relative
orders of positive and negative candidates. In addition, the mapping function
is globally shared by all words, so its performance might be hindered by
deviations between the distributions of different languages. In this work, we
propose a novel ranking-oriented induction model, RAPO, that learns a
personalized mapping function for each word. RAPO simultaneously benefits from
the unique characteristics of each word and from cross-language isomorphism.
Extensive experimental results on public datasets covering both rich-resource
and low-resource languages demonstrate the superiority of our proposal. Our
code is publicly available at https://github.com/Jlfj345wf/RAPO.
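The abstract contrasts RAPO's personalized mappings with the globally shared mapping function used by conventional induction pipelines. As background, a minimal sketch of that conventional baseline — a single orthogonal Procrustes mapping fit on seed translation pairs, not RAPO's method itself — with toy NumPy embeddings standing in for real ones:

```python
import numpy as np

# Toy stand-ins for independently trained embeddings: 5 seed word pairs,
# dimension 4. Targets are built as an exact rotation of the sources so
# that the recovered mapping is easy to check.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))            # source-language seed embeddings
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # a hidden orthogonal rotation
Y = X @ Q                              # target-language seed embeddings

# Orthogonal Procrustes: W* = argmin_{W orthogonal} ||XW - Y||_F,
# solved in closed form from the SVD of X^T Y. This one W is shared by
# every word, which is the limitation the paper's personalized mappings
# are designed to address.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# After mapping, lexicon induction retrieves for each mapped source vector
# its nearest neighbour (or CSLS neighbour) among target-language vectors.
print(np.allclose(X @ W, Y, atol=1e-6))
```

Because the toy targets are an exact rotation, the closed-form solution recovers it; with real embeddings the spaces are only approximately isomorphic, which is the deviation the abstract refers to.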
Related papers
- Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow
Interpolation [48.32604585839687]
Previous adversarial approaches have shown promising results in inducing cross-lingual word embeddings without parallel data.
We propose to make use of a sequence of intermediate spaces for smooth bridging.
arXiv Detail & Related papers (2022-10-07T04:37:47Z)
- Word Embedding Transformation for Robust Unsupervised Bilingual Lexicon
Induction [21.782189001319935]
We propose a transformation-based method to increase the isomorphism of embeddings of two languages.
Our approach can achieve competitive or superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-05-26T02:09:58Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for
Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- A Generalized Constraint Approach to Bilingual Dictionary Induction for
Low-Resource Language Families [1.0312968200748118]
We propose constraint-based bilingual lexicon induction for closely-related languages.
We identify cognate synonyms to obtain many-to-many translation pairs.
arXiv Detail & Related papers (2020-10-05T23:41:04Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- Refinement of Unsupervised Cross-Lingual Word Embeddings [2.4366811507669124]
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages.
We propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings.
arXiv Detail & Related papers (2020-02-21T10:39:53Z)
- On the Importance of Word Order Information in Cross-lingual Sequence
Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.