How Lexical is Bilingual Lexicon Induction?
- URL: http://arxiv.org/abs/2404.04221v1
- Date: Fri, 5 Apr 2024 17:10:33 GMT
- Title: How Lexical is Bilingual Lexicon Induction?
- Authors: Harsh Kohli, Helian Feng, Nicholas Dronen, Calvin McCarter, Sina Moeini, Ali Kebarighotbi
- Abstract summary: We argue that the incorporation of additional lexical information into the recent retrieve-and-rank approach should improve lexicon induction.
We demonstrate the efficacy of our proposed approach on XLING, improving over the previous state of the art by an average of 2% across all language pairs.
- Score: 1.3610643403050855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In contemporary machine learning approaches to bilingual lexicon induction (BLI), a model learns a mapping between the embedding spaces of a language pair. Recently, a retrieve-and-rank approach to BLI has achieved state-of-the-art results on the task. However, the problem remains challenging in low-resource settings, due to the paucity of data. The task is complicated by factors such as lexical variation across languages. We argue that the incorporation of additional lexical information into the recent retrieve-and-rank approach should improve lexicon induction. We demonstrate the efficacy of our proposed approach on XLING, improving over the previous state of the art by an average of 2% across all language pairs.
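For concreteness, here is a minimal sketch of the pipeline the abstract describes, under common BLI conventions: an orthogonal Procrustes mapping between embedding spaces, cosine-based retrieval, and a reranking hook. All function names and the seed-dictionary format are illustrative assumptions, not the paper's code.

```python
import numpy as np

def fit_mapping(X, Y, seed):
    """Orthogonal Procrustes: W = argmin_W ||X_s @ W - Y_s||_F over seed pairs."""
    src, tgt = map(list, zip(*seed))
    U, _, Vt = np.linalg.svd(X[src].T @ Y[tgt])
    return U @ Vt  # orthogonal, so mapped vectors keep their norms

def retrieve(X, Y, W, k=10):
    """Stage 1 (retrieve): top-k target candidates by cosine similarity."""
    sims = (X @ W) @ Y.T                      # assumes rows are L2-normalized
    return np.argsort(-sims, axis=1)[:, :k]

def rerank(candidates, score_fn):
    """Stage 2 (rank): rescore candidates with a richer scorer; the paper's
    argument is that additional lexical information belongs in this stage."""
    return [max(cands, key=lambda j: score_fn(i, j))
            for i, cands in enumerate(candidates)]
```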
Related papers
- Semi-Supervised Learning for Bilingual Lexicon Induction [1.8130068086063336]
We consider the problem of aligning two sets of continuous word representations, corresponding to two languages, to a common space in order to infer a bilingual lexicon.
Our experiments on standard benchmarks, inferring dictionaries from English to more than 20 languages, show that our approach consistently outperforms the existing state of the art.
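Many semi-supervised aligners iterate a mapping step (as in the Procrustes sketch above) with a dictionary-expansion step. The sketch below shows one common expansion heuristic, keeping mutual nearest neighbors as new training pairs; it is an illustration, not this paper's exact method.

```python
import numpy as np

def mutual_nn_pairs(XW, Y):
    """XW: mapped source embeddings; Y: target embeddings (rows L2-normalized)."""
    sims = XW @ Y.T
    fwd = sims.argmax(axis=1)               # best target for each source word
    bwd = sims.argmax(axis=0)               # best source for each target word
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]
```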
arXiv Detail & Related papers (2024-02-10T19:27:22Z) - ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting [22.743097175747575]
We introduce ProMap, a novel approach for bilingual lexicon induction (BLI).
ProMap relies on an effective padded prompting of language models with a seed dictionary that achieves good performance when used independently.
When evaluated on both rich-resource and low-resource languages, ProMap consistently achieves state-of-the-art results.
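A hedged sketch of dictionary-seeded few-shot prompting in the spirit of ProMap; the template, language pair, and examples are illustrative assumptions, and the paper's exact padded-prompt format is not reproduced here.

```python
def build_prompt(seed_pairs, query, src_lang="English", tgt_lang="German"):
    """Build a few-shot prompt from seed-dictionary pairs."""
    shots = "\n".join(f"{src_lang}: {s} -> {tgt_lang}: {t}" for s, t in seed_pairs)
    return f"Translate single words.\n{shots}\n{src_lang}: {query} -> {tgt_lang}:"

print(build_prompt([("dog", "Hund"), ("house", "Haus")], "water"))
# The language model's completion is read off as the candidate translation
# (here, ideally "Wasser").
```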
arXiv Detail & Related papers (2023-10-28T18:33:24Z) - Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
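The summary names Optimal Transport as the alignment tool. Below is a minimal sketch of entropy-regularized OT solved with Sinkhorn iterations, the standard differentiable formulation, assuming uniform marginals; this is generic OT machinery, not the paper's exact objective.

```python
import numpy as np

def sinkhorn(C, eps=0.1, iters=200):
    """Entropy-regularized OT between uniform marginals for cost matrix C."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    K = np.exp(-C / eps)                              # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)       # scale rows to match the source marginal
        v = b / (K.T @ u)     # scale columns to match the target marginal
    P = u[:, None] * K * v[None, :]                   # transport plan
    return P, float((P * C).sum())                    # plan and OT cost
```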
arXiv Detail & Related papers (2023-07-09T04:52:31Z) - Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models, and compare the fine-tuned models against the originals.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z) - Word Embedding Transformation for Robust Unsupervised Bilingual Lexicon Induction [21.782189001319935]
We propose a transformation-based method to increase the isomorphism between the embedding spaces of two languages.
Our approach can achieve competitive or superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-05-26T02:09:58Z) - XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation [93.80733419450225]
This paper analyzes the current state of cross-lingual transfer learning.
We extend XTREME to XTREME-R, which consists of an improved set of ten natural language understanding tasks.
arXiv Detail & Related papers (2021-04-15T12:26:12Z) - Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that the resulting lexical representations are often not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
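A rough sketch of the general idea, under illustrative assumptions (a token-to-index vocabulary and a dictionary of pre-aligned type-level subword vectors): seed the masked language model's embedding matrix with cross-lingual subword embeddings before pretraining.

```python
import numpy as np

def init_embedding_matrix(vocab, aligned_vecs, dim, scale=0.02):
    """vocab: token -> row index; aligned_vecs: token -> pre-aligned vector."""
    rng = np.random.default_rng(0)
    E = rng.normal(0.0, scale, size=(len(vocab), dim))  # default random init
    for token, idx in vocab.items():
        if token in aligned_vecs:
            E[idx] = aligned_vecs[token]  # copy the cross-lingual subword vector
    return E
```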
arXiv Detail & Related papers (2021-03-18T21:17:58Z) - Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment [49.3253280592705]
We show that it is possible to produce much higher-quality lexicons with methods that combine bitext mining and unsupervised word alignment.
Our final model outperforms the state of the art on the BUCC 2020 shared task by 14 $F_1$ points averaged over 12 language pairs.
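One standard way to realize the bitext-mining step is margin-based scoring over sentence embeddings (Artetxe & Schwenk, 2019). The sketch below assumes L2-normalized embedding matrices and is offered as an illustration, not this paper's exact pipeline.

```python
import numpy as np

def margin_scores(src_emb, tgt_emb, k=4):
    """Ratio-margin score: cosine similarity normalized by the mean similarity
    to each side's k nearest neighbors. Rows must be L2-normalized."""
    sims = src_emb @ tgt_emb.T
    nn_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)  # per-source neighborhood
    nn_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0)  # per-target neighborhood
    return sims / ((nn_src[:, None] + nn_tgt[None, :]) / 2)

# Sentence pairs scoring above a threshold form the mined bitext; an
# unsupervised word aligner run over those pairs then yields the lexicon.
```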
arXiv Detail & Related papers (2021-01-01T03:12:42Z) - Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)