Semi-automatic WordNet Linking using Word Embeddings
- URL: http://arxiv.org/abs/2201.01747v1
- Date: Wed, 5 Jan 2022 18:15:55 GMT
- Title: Semi-automatic WordNet Linking using Word Embeddings
- Authors: Kevin Patel, Diptesh Kanojia, Pushpak Bhattacharyya
- Abstract summary: Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of different languages.
We propose an approach to link wordnets. Given a synset of the source language, the approach returns a ranked list of potential candidate synsets.
Our technique is able to retrieve a winner synset in the top 10 ranked list for 60% of all synsets and 70% of noun synsets.
- Score: 33.15250956247636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Wordnets are rich lexico-semantic resources. Linked wordnets are extensions
of wordnets, which link similar concepts in wordnets of different languages.
Such resources are extremely useful in many Natural Language Processing (NLP)
applications, primarily those that rely on knowledge-based approaches. In such
approaches, these resources are treated as a gold standard or oracle. Thus, it is
crucial that these resources hold correct information. Therefore, they are
created by human experts. However, manual maintenance of such resources is a
tedious and costly affair. Thus, techniques that can aid the experts are
desirable. In this paper, we propose an approach to link wordnets. Given a
synset of the source language, the approach returns a ranked list of potential
candidate synsets in the target language from which the human expert can choose
the correct one(s). Our technique is able to retrieve a winner synset in the
top 10 ranked list for 60% of all synsets and 70% of noun synsets.
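The abstract does not spell out how candidates are scored, so the following is only a minimal sketch of the general idea (rank target synsets for a source synset, then check whether the correct one lands in the top k), not the authors' method. It assumes that both languages' word embeddings already live in one shared vector space, that a synset is represented by the mean of its member words' vectors, and that similarity is cosine. The function names, the toy vectors, and the synset identifiers are all hypothetical.

```python
import math


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding; None if none do."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_words, target_synsets, src_emb, tgt_emb, k=10):
    """Return the ids of the k target synsets most similar to the source synset."""
    query = mean_vector(source_words, src_emb)
    if query is None:
        return []
    scored = []
    for synset_id, words in target_synsets.items():
        vec = mean_vector(words, tgt_emb)
        if vec is not None:
            scored.append((cosine(query, vec), synset_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [synset_id for _, synset_id in scored[:k]]


def hit_at_k(ranked_lists, gold_links, k=10):
    """Fraction of source synsets whose gold target appears in the top k."""
    hits = sum(
        1 for src, ranked in ranked_lists.items() if gold_links[src] in ranked[:k]
    )
    return hits / len(ranked_lists)


# Toy data (hypothetical): 2-d vectors assumed to be in a shared space.
src_emb = {"paani": [0.9, 0.1], "jal": [0.8, 0.2]}
tgt_emb = {"water": [1.0, 0.0], "fire": [0.0, 1.0], "blaze": [0.1, 0.9]}
target_synsets = {"water.n.01": ["water"], "fire.n.01": ["fire", "blaze"]}

ranked = rank_candidates(["paani", "jal"], target_synsets, src_emb, tgt_emb)
score = hit_at_k({"src-1": ranked}, {"src-1": "water.n.01"}, k=1)
```

In a workflow like the one described, the ranked list would be shown to a human expert, who picks the correct synset(s). The reported 60% and 70% figures correspond to a measure of the `hit_at_k` kind with k = 10.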
Related papers
- Automatically constructing Wordnet synsets [2.363388546004777]
We propose approaches to generate Wordnet synsets for languages both resource-rich and resource-poor.
Our algorithms translate synsets of existing Wordnets to a target language T, then apply a ranking method on the translation candidates to find best translations in T.
arXiv Detail & Related papers (2022-08-08T02:02:18Z)
- Towards Automatic Construction of Filipino WordNet: Word Sense Induction and Synset Induction Using Sentence Embeddings [0.7214142393172727]
This study proposes a method for word sense induction and synset induction using only two linguistic resources.
The resulting sense inventory and synonym sets can be used in automatically creating a wordnet.
This study empirically shows that 30% of the induced word senses are valid and 40% of the induced synsets are valid, of which 20% are novel synsets.
arXiv Detail & Related papers (2022-04-07T06:50:37Z)
- Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation [133.7313847857935]
Our study highlights how NLP methods can be adapted to thousands more languages that are under-served by current technology.
For 19 under-represented languages across 3 tasks, our methods lead to consistent improvements of up to 5 and 15 points with and without extra monolingual text respectively.
arXiv Detail & Related papers (2022-03-17T16:48:22Z)
- Indian Language Wordnets and their Linkages with Princeton WordNet [38.50911435531732]
We release mappings of 18 Indian language wordnets linked with Princeton WordNet.
We believe that availability of such resources will have a direct impact on the progress in NLP for these languages.
arXiv Detail & Related papers (2022-01-09T10:12:31Z)
- Multilingual Irony Detection with Dependency Syntax and Neural Models [61.32653485523036]
It focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme.
The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.
arXiv Detail & Related papers (2020-11-11T11:22:05Z)
- Computational linguistic assessment of textbook and online learning media by means of threshold concepts in business education [59.003956312175795]
From a linguistic perspective, threshold concepts are instances of specialized vocabularies, exhibiting particular linguistic features.
The profiles of 63 threshold concepts from business education have been investigated in textbooks, newspapers, and Wikipedia.
The three kinds of resources can indeed be distinguished in terms of their threshold concepts' profiles.
arXiv Detail & Related papers (2020-08-05T12:56:16Z)
- An Algorithm for Fuzzification of WordNets, Supported by a Mathematical Proof [3.684688928766659]
We present an algorithm for constructing fuzzy versions of WLDs of any language.
We publish online the fuzzified version of English WordNet (FWN).
arXiv Detail & Related papers (2020-06-07T04:47:40Z)
- Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
arXiv Detail & Related papers (2020-03-14T14:50:04Z)
- Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System [24.42822218256954]
We present the first approach to automatically building resources for academic writing.
The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing.
arXiv Detail & Related papers (2020-03-05T22:55:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.