Automatically constructing Wordnet synsets
- URL: http://arxiv.org/abs/2208.03870v1
- Date: Mon, 8 Aug 2022 02:02:18 GMT
- Title: Automatically constructing Wordnet synsets
- Authors: Khang Nhut Lam, Feras Al Tarouti and Jugal Kalita
- Abstract summary: We propose approaches to generate Wordnet synsets for languages both resource-rich and resource-poor.
Our algorithms translate synsets of existing Wordnets to a target language T, then apply a ranking method on the translation candidates to find the best translations in T.
- Score: 2.363388546004777
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Manually constructing a Wordnet is a difficult task, needing years of
experts' time. As a first step to automatically construct full Wordnets, we
propose approaches to generate Wordnet synsets for languages both resource-rich
and resource-poor, using publicly available Wordnets, a machine translator
and/or a single bilingual dictionary. Our algorithms translate synsets of
existing Wordnets to a target language T, then apply a ranking method on the
translation candidates to find best translations in T. Our approaches are
applicable to any language which has at least one existing bilingual dictionary
translating from English to it.
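The translate-and-rank idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy bilingual dictionary and the `min_votes` threshold are hypothetical, and the ranking is simplified to counting how many distinct synset members produce each candidate translation.

```python
# Hypothetical sketch of translating a Wordnet synset into a target language
# and ranking the candidate translations by mutual support.
from collections import Counter

# Toy English -> target-language dictionary: word -> candidate translations.
BILINGUAL_DICT = {
    "car": ["coche", "carro"],
    "auto": ["coche", "auto"],
    "automobile": ["coche", "automovil"],
}

def translate_synset(synset_words, dictionary, min_votes=2):
    """Translate every member of an English synset and keep candidates
    produced by at least `min_votes` distinct synset members."""
    votes = Counter()
    for word in synset_words:
        for candidate in set(dictionary.get(word, [])):
            votes[candidate] += 1
    return [c for c, n in votes.items() if n >= min_votes]

print(translate_synset(["car", "auto", "automobile"], BILINGUAL_DICT))
# "coche" is supported by all three synset members, so it survives.
```

A candidate backed by several members of the same synset is more likely to express the shared concept than one produced by a single, possibly polysemous, word.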
Related papers
- Wav2Gloss: Generating Interlinear Glossed Text from Speech [78.64412090339044]
We propose Wav2Gloss, a task in which four linguistic annotation components are extracted automatically from speech.
We provide various baselines to lay the groundwork for future research on Interlinear Glossed Text generation from speech.
arXiv Detail & Related papers (2024-03-19T21:45:29Z)
- Media of Langue: The Interface for Exploring Word Translation Network/Space [0.0]
We discover the huge network formed by the chain of these mutual translations as Word Translation Network.
We propose Media of Langue, a novel interface for exploring this network.
arXiv Detail & Related papers (2023-08-25T03:54:20Z)
- CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
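The decompounding task summarized above can be illustrated with a simple dictionary-based baseline. This sketch is not the paper's trained-model approach: it assumes a known vocabulary and uses dynamic programming to find the split into the fewest known pieces.

```python
def decompound(word, vocab):
    """Split `word` into known vocabulary pieces via dynamic programming;
    returns the split with the fewest pieces, or None if impossible."""
    n = len(word)
    best = [None] * (n + 1)  # best[i] = fewest-piece split of word[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            piece = word[j:i]
            if piece in vocab and best[j] is not None:
                cand = best[j] + [piece]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    return best[n]

vocab = {"butter", "fly", "butterfly", "milk"}
print(decompound("buttermilk", vocab))  # ['butter', 'milk']
print(decompound("butterfly", vocab))   # ['butterfly'] (whole word wins)
```

Preferring the fewest pieces keeps lexicalized compounds like "butterfly" intact instead of over-splitting them.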
arXiv Detail & Related papers (2023-05-23T16:32:27Z)
- Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models [107.83158521848372]
We present Triangular Document-level Pre-training (TRIP), the first method in the field to accelerate the conventional monolingual and bilingual objectives into a trilingual objective with a novel method called Grafting.
TRIP achieves several strong state-of-the-art (SOTA) scores on three multilingual document-level machine translation benchmarks and one cross-lingual abstractive summarization benchmark, including consistent improvements by up to 3.11 d-BLEU points and 8.9 ROUGE-L points.
arXiv Detail & Related papers (2022-12-15T12:14:25Z)
- Automatically Creating a Large Number of New Bilingual Dictionaries [2.363388546004777]
This paper proposes approaches to automatically create a large number of new bilingual dictionaries for low-resource languages.
Our algorithms produce translations of words in a source language to plentiful target languages using available Wordnets and a machine translator.
arXiv Detail & Related papers (2022-08-12T04:25:23Z)
- Creating Lexical Resources for Endangered Languages [2.363388546004777]
Our algorithms construct bilingual dictionaries and multilingual thesauruses using public Wordnets and a machine translator (MT).
Since our work relies on only one bilingual dictionary between an endangered language and an "intermediate helper" language, it is applicable to languages that lack many existing resources.
arXiv Detail & Related papers (2022-08-08T02:31:28Z)
- Creating Reverse Bilingual Dictionaries [2.792030485253753]
We propose algorithms for creating new reverse bilingual dictionaries from existing bilingual dictionaries.
Our algorithms exploit the similarity between word-concept pairs using the English Wordnet to produce reverse dictionary entries.
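The core inversion step of reverse-dictionary creation can be sketched as below. This is a simplified illustration with a hypothetical toy dictionary: it only inverts entries, omitting the paper's word-concept similarity filtering via the English Wordnet.

```python
# Hypothetical sketch: invert a forward bilingual dictionary so each
# target-language word maps back to the English words that listed it.
from collections import defaultdict

# Toy English -> Vietnamese entries (illustrative only).
FORWARD = {
    "house": ["nhà"],
    "home": ["nhà", "quê"],
    "river": ["sông"],
}

def reverse_dictionary(forward):
    """Build target-word -> sorted list of source words."""
    reverse = defaultdict(set)
    for eng, translations in forward.items():
        for t in translations:
            reverse[t].add(eng)
    return {t: sorted(words) for t, words in reverse.items()}

print(reverse_dictionary(FORWARD))
# {'nhà': ['home', 'house'], 'quê': ['home'], 'sông': ['river']}
```

In the full method, candidate reverse entries would additionally be validated by checking that the grouped English words share a concept, e.g. via Wordnet similarity.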
arXiv Detail & Related papers (2022-08-08T01:41:55Z)
- Towards Automatic Construction of Filipino WordNet: Word Sense Induction and Synset Induction Using Sentence Embeddings [0.7214142393172727]
This study proposes a method for word sense induction and synset induction using only two linguistic resources.
The resulting sense inventory and synonym sets can be used in automatically creating a wordnet.
This study empirically shows that 30% of the induced word senses are valid and 40% of the induced synsets are valid, of which 20% are novel synsets.
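Synset induction from embeddings can be illustrated with a greedy similarity clustering. This is a toy sketch, not the study's method: the 2-d "embeddings" and the 0.9 cosine threshold are invented for illustration.

```python
# Hypothetical sketch: group word senses into synset-like clusters by
# cosine similarity of their embedding vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def induce_synsets(sense_vectors, threshold=0.9):
    """Greedy clustering: a sense joins the first cluster whose founding
    vector is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (representative_vector, member_words)
    for word, vec in sense_vectors.items():
        for rep, members in clusters:
            if cosine(vec, rep) >= threshold:
                members.append(word)
                break
        else:
            clusters.append((vec, [word]))
    return [members for _, members in clusters]

# Toy 2-d "embeddings": two senses close together, one far away.
vectors = {"car": (1.0, 0.1), "auto": (0.9, 0.12), "river": (0.0, 1.0)}
print(induce_synsets(vectors))  # [['car', 'auto'], ['river']]
```

Real systems would use sentence-embedding vectors of usage contexts and a tuned clustering algorithm rather than this first-fit scheme.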
arXiv Detail & Related papers (2022-04-07T06:50:37Z)
- Data Augmentation for Sign Language Gloss Translation [115.13684506803529]
Sign language translation (SLT) is often decomposed into video-to-gloss recognition and gloss-to-text translation.
We focus here on gloss-to-text translation, which we treat as a low-resource neural machine translation (NMT) problem.
By pre-training on the thus obtained synthetic data, we improve translation from American Sign Language (ASL) to English and German Sign Language (DGS) to German by up to 3.14 and 2.20 BLEU, respectively.
arXiv Detail & Related papers (2021-05-16T16:37:36Z)
- Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that performance suffers because the representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all information) and is not responsible for any consequences of its use.