Creating Reverse Bilingual Dictionaries
- URL: http://arxiv.org/abs/2208.03863v1
- Date: Mon, 8 Aug 2022 01:41:55 GMT
- Title: Creating Reverse Bilingual Dictionaries
- Authors: Khang Nhut Lam and Jugal Kalita
- Abstract summary: We propose algorithms for creating new reverse bilingual dictionaries from existing bilingual dictionaries.
Our algorithms exploit the similarity between word-concept pairs using the English Wordnet to produce reverse dictionary entries.
- Score: 2.792030485253753
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Bilingual dictionaries are expensive resources, and not many are available
when one of the languages is resource-poor. In this paper, we propose
algorithms for creating new reverse bilingual dictionaries from existing
bilingual dictionaries in which English is one of the two languages. Our
algorithms exploit the similarity between word-concept pairs, using the English
Wordnet, to produce reverse dictionary entries. Since our algorithms rely on
available bilingual dictionaries, they are applicable to any bilingual
dictionary as long as one of the two languages has a Wordnet-type lexical
ontology.
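As a rough illustration of the reversal the abstract describes, the sketch below flips an L-to-English dictionary into an English-to-L one and uses the English Wordnet (via NLTK) to admit closely related English headwords. The input format, the similarity measure (Wu-Palmer), the threshold, and the toy entries are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: build a reverse (English -> L) dictionary from an existing
# (L -> English) dictionary, expanding entries with English words whose WordNet
# concepts are very similar. Format, scoring, and threshold are assumptions.
from collections import defaultdict
from itertools import product
from nltk.corpus import wordnet as wn

def reverse_dictionary(l_to_en, sim_threshold=0.8):
    """l_to_en: dict mapping a word in language L to a list of English glosses."""
    en_to_l = defaultdict(set)
    for l_word, en_words in l_to_en.items():
        for en_word in en_words:
            # Direct reversal: every (L word, English word) pair yields a candidate entry.
            en_to_l[en_word].add(l_word)
            # Expansion: consider other lemmas of the English word's synsets and keep
            # those whose concepts are sufficiently similar to a concept of en_word.
            for syn in wn.synsets(en_word):
                for lemma in syn.lemma_names():
                    if lemma.lower() == en_word.lower():
                        continue
                    score = max(
                        (s1.wup_similarity(s2) or 0.0)
                        for s1, s2 in product(wn.synsets(en_word), wn.synsets(lemma))
                    )
                    if score >= sim_threshold:
                        en_to_l[lemma.replace("_", " ")].add(l_word)
    return {en: sorted(ls) for en, ls in en_to_l.items()}

# Toy Vietnamese-English fragment (entries are illustrative only):
print(reverse_dictionary({"con mèo": ["cat"], "ngôi nhà": ["house", "home"]}))
```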
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems are hypothesized to learn to map sentences of different languages into a common representation space.
In this work, we test this hypothesis by zero-shot translating from unseen languages.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- Automatically Creating a Large Number of New Bilingual Dictionaries [2.363388546004777]
This paper proposes approaches to automatically create a large number of new bilingual dictionaries for low-resource languages.
Our algorithms produce translations of words in a source language into many target languages using available Wordnets and a machine translator.
arXiv Detail & Related papers (2022-08-12T04:25:23Z)
- Creating Lexical Resources for Endangered Languages [2.363388546004777]
Our algorithms construct bilingual dictionaries and multilingual thesauruses using public Wordnets and a machine translator (MT).
Since our work relies on only one bilingual dictionary between an endangered language and an "intermediate helper" language, it is applicable to languages that lack many existing resources.
arXiv Detail & Related papers (2022-08-08T02:31:28Z)
- Automatically constructing Wordnet synsets [2.363388546004777]
We propose approaches to generate Wordnet synsets for languages both resource-rich and resource-poor.
Our algorithms translate synsets of existing Wordnets to a target language T, then apply a ranking method on the translation candidates to find the best translations in T.
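A rough sketch of that two-step idea: translate each lemma of an English synset into the target language, then rank the candidates by how many lemma translations agree on them. The toy `translate` lookup and the frequency-based ranking are stand-ins assumed for illustration, not the paper's actual MT system or ranking method.

```python
# Hedged sketch: translate a WordNet synset into a target language T, then rank
# candidate translations by agreement across the synset's lemmas.
from collections import Counter
from nltk.corpus import wordnet as wn

def translate(word, target_lang):
    """Placeholder for a real MT call; here a tiny toy lookup (illustrative only)."""
    toy = {("dwelling", "vi"): "nhà ở", ("home", "vi"): "nhà", ("abode", "vi"): "nơi ở"}
    return toy.get((word, target_lang))

def synset_translations(synset, target_lang, top_k=3):
    candidates = Counter()
    for lemma in synset.lemma_names():
        cand = translate(lemma.replace("_", " "), target_lang)
        if cand:
            candidates[cand.lower()] += 1
    # Candidates produced by more of the synset's lemmas rank higher.
    return [word for word, _ in candidates.most_common(top_k)]

# Usage idea: build a target-language synset for one sense of "dwelling".
print(synset_translations(wn.synset("dwelling.n.01"), target_lang="vi"))
```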
arXiv Detail & Related papers (2022-08-08T02:02:18Z)
- Dict-NMT: Bilingual Dictionary based NMT for Extremely Low Resource Languages [1.8787713898828164]
We present a detailed analysis of the effects of the quality of dictionaries, training dataset size, language family, etc., on the translation quality.
Results on multiple low-resource test languages show a clear advantage of our bilingual dictionary-based method over the baselines.
arXiv Detail & Related papers (2022-06-09T12:03:29Z)
- Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph [10.64488240379972]
Cross-lingual text classification typically requires that task-specific training data be available in high-resource source languages.
Collecting such training data can be infeasible because of the labeling cost, task characteristics, and privacy concerns.
This paper proposes an alternative solution that uses only task-independent word embeddings of high-resource languages and bilingual dictionaries.
arXiv Detail & Related papers (2021-09-09T16:40:40Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
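A minimal sketch of the grouping step described above: cluster per-language representation vectors into groups that play the role of "representation sprachbunds". The random vectors, cluster count, and language names are illustrative assumptions rather than the paper's setup.

```python
# Hedged sketch: group languages by clustering their representation vectors.
import numpy as np
from sklearn.cluster import KMeans

def cluster_languages(lang_vectors, languages, n_groups=4):
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(lang_vectors)
    groups = {}
    for lang, label in zip(languages, labels):
        groups.setdefault(label, []).append(lang)
    return groups

# Toy usage: random vectors stand in for representations from a multilingual
# pre-trained model.
rng = np.random.default_rng(0)
print(cluster_languages(rng.normal(size=(12, 8)), [f"lang{i}" for i in range(12)]))
```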
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into a language-specific semantic space, and the projected embeddings are then fed into the Transformer model.
Experiments show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets.
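A small PyTorch sketch of that projection idea: a shared embedding table followed by a per-language linear projection whose output would serve as the Transformer's input. Module names, dimensions, and the absence of a bias term are assumptions for illustration, not the paper's configuration.

```python
# Hedged sketch of a cross-lingual language projection (XLP-style) input layer:
# shared word embeddings pass through a language-specific linear projection
# before entering a standard Transformer encoder.
import torch
import torch.nn as nn

class LanguageProjectedEmbedding(nn.Module):
    def __init__(self, vocab_size, d_model, num_languages):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)  # shared across languages
        self.lang_proj = nn.ModuleList(
            [nn.Linear(d_model, d_model, bias=False) for _ in range(num_languages)]
        )

    def forward(self, token_ids, lang_id):
        x = self.word_emb(token_ids)        # (batch, seq_len, d_model)
        return self.lang_proj[lang_id](x)   # projected into that language's space

# Toy usage: the projected output would replace "word embedding + language
# embedding" as the Transformer input.
emb = LanguageProjectedEmbedding(vocab_size=32000, d_model=512, num_languages=4)
tokens = torch.randint(0, 32000, (2, 16))
hidden = emb(tokens, lang_id=1)  # shape: (2, 16, 512)
```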
arXiv Detail & Related papers (2021-02-16T18:47:10Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
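A compact sketch of singular vector canonical correlation analysis (SVCCA) as it might be applied to two views of the same set of languages, for example typological features versus learned language embeddings: each view is first reduced with an SVD, then CCA correlates the two. The data shapes, component counts, and random toy data are assumptions, not the paper's setup.

```python
# Hedged sketch of SVCCA over two "views" of the same languages:
# rows are languages, columns are features of each view.
import numpy as np
from sklearn.cross_decomposition import CCA

def svcca(view_a, view_b, svd_dims=20, cca_dims=5):
    # Step 1: denoise each view by keeping its top singular directions.
    def svd_reduce(x, k):
        x = x - x.mean(axis=0)
        u, s, _ = np.linalg.svd(x, full_matrices=False)
        return u[:, :k] * s[:k]

    a = svd_reduce(view_a, min(svd_dims, view_a.shape[1]))
    b = svd_reduce(view_b, min(svd_dims, view_b.shape[1]))

    # Step 2: CCA finds maximally correlated directions between the two views.
    cca = CCA(n_components=min(cca_dims, a.shape[1], b.shape[1]))
    a_c, b_c = cca.fit_transform(a, b)
    # Per-component correlations: higher values mean the views share more information.
    return [np.corrcoef(a_c[:, i], b_c[:, i])[0, 1] for i in range(a_c.shape[1])]

# Toy usage: 30 languages, a 40-feature typology view and a 64-dim embedding view.
rng = np.random.default_rng(0)
print(svcca(rng.normal(size=(30, 40)), rng.normal(size=(30, 64))))
```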
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity [67.36239720463657]
Multi-SimLex is a large-scale lexical resource and evaluation benchmark covering datasets for 12 diverse languages.
Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs.
Owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets.
arXiv Detail & Related papers (2020-03-10T17:17:01Z)