Related papers: Automatically Creating a Large Number of New Bilingual Dictionaries

Automatically Creating a Large Number of New Bilingual Dictionaries

URL: http://arxiv.org/abs/2208.06110v1
Date: Fri, 12 Aug 2022 04:25:23 GMT
Title: Automatically Creating a Large Number of New Bilingual Dictionaries
Authors: Khang Nhut Lam and Feras Al Tarouti and Jugal Kalita
Abstract summary: This paper proposes approaches to automatically create a large number of new bilingual dictionaries for low-resource languages. Our algorithms produce translations of words in a source language to plentiful target languages using available Wordnets and a machine translator.
Score: 2.363388546004777
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: This paper proposes approaches to automatically create a large number of new bilingual dictionaries for low-resource languages, especially resource-poor and endangered languages, from a single input bilingual dictionary. Our algorithms produce translations of words in a source language to plentiful target languages using available Wordnets and a machine translator (MT). Since our approaches rely on just one input dictionary, available Wordnets and an MT, they are applicable to any bilingual dictionary as long as one of the two languages is English or has a Wordnet linked to the Princeton Wordnet. Starting with 5 available bilingual dictionaries, we create 48 new bilingual dictionaries. Of these, 30 pairs of languages are not supported by the popular MTs: Google and Bing.

Related papers

Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages [54.832599498774464]
We propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach. We build MWEs one language at a time by starting from the resource rich source and sequentially adding each language in the chain till we reach the target. We evaluate our method on bilingual lexicon induction for 4 language families, involving 4 very low-resource (5M tokens) and 4 moderately low-resource (50M) target languages.
arXiv Detail & Related papers (2023-11-21T09:59:29Z)
Creating Lexical Resources for Endangered Languages [2.363388546004777]
Our algorithms construct bilingual dictionaries and multilingual thesauruses using public Wordnets and a machine translator (MT) Since our work relies on only one bilingual dictionary between an endangered language and an "intermediate helper" language, it is applicable to languages that lack many existing resources.
arXiv Detail & Related papers (2022-08-08T02:31:28Z)
Automatically constructing Wordnet synsets [2.363388546004777]
We propose approaches to generate Wordnet synsets for languages both resource-rich and resource-poor. Our algorithms translate synsets of existing Wordnets to a target language T, then apply a ranking method on the translation candidates to find best translations in T.
arXiv Detail & Related papers (2022-08-08T02:02:18Z)
Creating Reverse Bilingual Dictionaries [2.792030485253753]
We propose algorithms for creation of new reverse bilingual dictionaries from existing bilingual dictionaries. Our algorithms exploit the similarity between word-concept pairs using the English Wordnet to produce reverse dictionary entries.
arXiv Detail & Related papers (2022-08-08T01:41:55Z)
Dict-NMT: Bilingual Dictionary based NMT for Extremely Low Resource Languages [1.8787713898828164]
We present a detailed analysis of the effects of the quality of dictionaries, training dataset size, language family, etc., on the translation quality. Results on multiple low-resource test languages show a clear advantage of our bilingual dictionary-based method over the baselines.
arXiv Detail & Related papers (2022-06-09T12:03:29Z)
Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis. We cluster all the target languages into multiple groups and name each group as a representation sprachbund. Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
Anchor-based Bilingual Word Embeddings for Low-Resource Languages [76.48625630211943]
Good quality monolingual word embeddings (MWEs) can be built for languages which have large amounts of unlabeled text. MWEs can be aligned to bilingual spaces using only a few thousand word translation pairs. This paper proposes a new approach for building BWEs in which the vector space of the high resource source language is used as a starting point.
arXiv Detail & Related papers (2020-10-23T19:17:00Z)
Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext. We show that multilingual translation models can be created through multilingual finetuning. We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences [45.99290614777277]
We propose a new task of machine translation (MT) based on no parallel sentences but can refer to a ground-truth bilingual dictionary. Motivated by the ability of a monolingual speaker learning to translate via looking up the bilingual dictionary, we propose the task to see how much potential an MT system can attain.
arXiv Detail & Related papers (2020-07-06T12:05:27Z)
Learning to Scale Multilingual Representations for Vision-Language Tasks [51.27839182889422]
The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date. We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% with less than 1/5th the training parameters compared to other word embedding methods.
arXiv Detail & Related papers (2020-04-09T01:03:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.