An Algorithm for Fuzzification of WordNets, Supported by a Mathematical
Proof
- URL: http://arxiv.org/abs/2006.04042v1
- Date: Sun, 7 Jun 2020 04:47:40 GMT
- Title: An Algorithm for Fuzzification of WordNets, Supported by a Mathematical
Proof
- Authors: Sayyed-Ali Hossayni, Mohammad-R Akbarzadeh-T, Diego Reforgiato
Recupero, Aldo Gangemi, Esteve Del Acebo, Josep Lluís de la Rosa i Esteva
- Abstract summary: We present an algorithm for constructing fuzzy versions of WLDs of any language.
We publish online the fuzzified version of the English WordNet (FWN).
- Score: 3.684688928766659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: WordNet-like Lexical Databases (WLDs) group English words into sets of
synonyms called "synsets." Although the standard WLDs are used in many
successful Text-Mining applications, they have the limitation that all
word-senses are taken to represent the meaning of their corresponding
synsets to the same degree, which is not generally true. To overcome this
limitation, several fuzzy versions of synsets have been proposed. A common
trait of these studies is that, to the best of our knowledge, they do not aim
to produce fuzzified versions of the existing WLDs, but instead build new WLDs
from scratch, which has limited the attention they have received from the
Text-Mining community, many of whose resources and applications are based on
the existing WLDs. In this study, we present an algorithm for constructing
fuzzy versions of WLDs of any language, given a corpus of documents and a
word-sense disambiguation (WSD) system for that language. Then, using the
Open American National Corpus and UKB WSD as algorithm inputs, we construct
and publish online the fuzzified version of the English WordNet (FWN). We also
provide a theoretical (mathematical) proof of the validity of its results.
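As a rough illustration of the pipeline described above, the sketch below estimates fuzzy membership degrees from sense frequencies in a WSD-tagged corpus. The frequency-based estimator, the input format, and the synset identifiers are all simplifying assumptions; the paper's exact construction and its proof are in the full text.

```python
from collections import Counter, defaultdict

def fuzzify_synsets(disambiguated_corpus):
    """Estimate fuzzy membership degrees of word-senses in synsets.

    `disambiguated_corpus` is an iterable of (word, synset_id) pairs,
    i.e. the output of running a WSD system (e.g. UKB) over a corpus.
    This is a simplified frequency-based estimator, not necessarily
    the exact formulation proved correct in the paper.
    """
    counts = defaultdict(Counter)          # word -> Counter of synset ids
    for word, synset_id in disambiguated_corpus:
        counts[word][synset_id] += 1

    memberships = {}
    for word, synset_counts in counts.items():
        total = sum(synset_counts.values())
        # Normalised relative frequency serves as the membership degree
        # of `word` in each synset it was disambiguated to.
        memberships[word] = {s: c / total for s, c in synset_counts.items()}
    return memberships

# Toy usage: "bank" disambiguated three times as the institution sense,
# once as the river-side sense.
tagged = [("bank", "bank.n.01")] * 3 + [("bank", "bank.n.09")]
print(fuzzify_synsets(tagged))  # {'bank': {'bank.n.01': 0.75, 'bank.n.09': 0.25}}
```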
Related papers
- Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models [52.00446751692225]
We present a novel and simple yet effective method called Dictionary Insertion Prompting (DIP).
When given a non-English prompt, DIP looks up a word dictionary and inserts the words' English counterparts into the prompt for LLMs.
This enables better translation into English and better English reasoning steps, which leads to clearly better results.
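A minimal sketch of the dictionary-insertion idea, assuming a toy bilingual dictionary and naive whitespace tokenisation (the real DIP lookup and prompt formatting may differ):

```python
# Hypothetical bilingual dictionary; real DIP uses large lexical resources.
TOY_DICT = {"gato": "cat", "negro": "black", "duerme": "sleeps"}

def insert_dictionary(prompt: str, dictionary: dict) -> str:
    """Append the English counterpart after each word found in the dictionary."""
    out = []
    for token in prompt.split():
        bare = token.strip(".,!?").lower()
        if bare in dictionary:
            out.append(f"{token} ({dictionary[bare]})")
        else:
            out.append(token)
    return " ".join(out)

print(insert_dictionary("El gato negro duerme.", TOY_DICT))
# El gato (cat) negro (black) duerme. (sleeps)
```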
arXiv Detail & Related papers (2024-11-02T05:10:50Z)
- The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT), which restricts embedding entries to the language of interest to bolster time and memory efficiency.
We apply two trimming heuristics - Unicode-based script filtering and corpus-based selection - to different language families and sizes.
VT is found to reduce the memory usage of small models by nearly 50% and to yield up to a 25% improvement in generation speed.
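The Unicode-based script filter can be illustrated with a short sketch; the rule below (keep tokens whose alphabetic characters are Latin-script) is an assumption, not necessarily the paper's exact heuristic:

```python
import unicodedata

def is_latin(token: str) -> bool:
    """Keep a token if every alphabetic character is Latin-script."""
    return all(
        "LATIN" in unicodedata.name(ch, "")
        for ch in token
        if ch.isalpha()
    )

vocab = ["hello", "##ing", "привет", "世界", "café"]
trimmed = [t for t in vocab if is_latin(t)]
print(trimmed)  # ['hello', '##ing', 'café']
```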
arXiv Detail & Related papers (2023-11-16T09:35:50Z)
- DICTDIS: Dictionary Constrained Disambiguation for Improved NMT [50.888881348723295]
We present DictDis, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries.
We demonstrate the utility of DictDis via extensive experiments on English-Hindi and English-German sentences in a variety of domains, including regulatory, finance, and engineering.
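A toy sketch of the constrained-disambiguation idea: rerank candidate translations by how well they cover dictionary candidates for the source terms. The scoring scheme is an illustrative assumption, not the DictDis decoding algorithm itself:

```python
def rerank(hypotheses, constraints):
    """`hypotheses`: list of (translation, model_score) pairs;
    `constraints`: list of sets of acceptable target words per source term."""
    def coverage(text):
        words = set(text.lower().split())
        return sum(1 for cands in constraints if words & cands)
    # Prefer higher constraint coverage, break ties by model score.
    return max(hypotheses, key=lambda h: (coverage(h[0]), h[1]))

hyps = [("the court rejected the appeal", -1.2),
        ("the yard rejected the appeal", -1.0)]
constraints = [{"court"}]          # dictionary candidates for the source term
print(rerank(hyps, constraints))   # ('the court rejected the appeal', -1.2)
```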
arXiv Detail & Related papers (2022-10-13T13:04:16Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods, employing both older and the most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be substantially improved if information about the target word is injected properly.
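One way to picture target-word injection is interpolating a context-only substitute distribution (as from a masked LM) with a target-similarity score. The score tables below are made-up stand-ins for real model outputs:

```python
# P(substitute | context), e.g. from a fill-mask model (made-up numbers):
context_probs = {"bright": 0.40, "smart": 0.35, "sunny": 0.25}
# Similarity of each substitute to the target word "intelligent" (made up):
target_sim = {"bright": 0.30, "smart": 0.90, "sunny": 0.10}

alpha = 0.5  # interpolation weight between context fit and target similarity
combined = {w: context_probs[w] ** (1 - alpha) * target_sim[w] ** alpha
            for w in context_probs}
best = max(combined, key=combined.get)
print(best, round(combined[best], 3))  # smart 0.561
```

Without the target term, the context-only distribution prefers "bright"; injecting target information flips the ranking to "smart", which is the effect the study measures.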
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Interval Probabilistic Fuzzy WordNet [8.396691008449704]
We present an algorithm for constructing the Interval Probabilistic Fuzzy (IPF) synsets in any language.
We constructed and published the IPF synsets of WordNet for the English language.
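A sketch of interval-valued membership: instead of a single degree, estimate a [lower, upper] interval around the observed sense frequency. Using a Wilson score interval here is an assumption; the paper defines its own interval construction.

```python
from math import sqrt

def interval_membership(sense_count: int, total: int, z: float = 1.96):
    """Wilson score interval around the relative sense frequency."""
    p = sense_count / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    margin = (z / denom) * sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return max(0.0, centre - margin), min(1.0, centre + margin)

# 'bank' tagged with its institution sense 75 times out of 100 occurrences.
print(interval_membership(75, 100))  # roughly (0.657, 0.824)
```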
arXiv Detail & Related papers (2021-04-04T17:28:37Z)
- Deconstructing word embedding algorithms [17.797952730495453]
We propose a retrospective on some of the most well-known word embedding algorithms.
In this work, we deconstruct Word2vec, GloVe, and others, into a common form, unveiling some of the common conditions that seem to be required for making performant word embeddings.
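The "common form" view can be illustrated by the classic observation (Levy & Goldberg) that such algorithms implicitly factorise a PMI-like word-context matrix; the PPMI-plus-SVD sketch below is illustrative, not the paper's exact derivation:

```python
import numpy as np

def ppmi_embeddings(cooc: np.ndarray, dim: int = 2):
    """Factorise a positive-PMI word-context matrix into word vectors."""
    total = cooc.sum()
    pw = cooc.sum(axis=1, keepdims=True) / total   # word marginals
    pc = cooc.sum(axis=0, keepdims=True) / total   # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((cooc / total) / (pw * pc))
    ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)
    u, s, _ = np.linalg.svd(ppmi)
    return u[:, :dim] * np.sqrt(s[:dim])           # truncated word vectors

# Toy 3-word x 3-context co-occurrence counts.
cooc = np.array([[10., 2., 0.], [3., 8., 1.], [0., 1., 9.]])
print(ppmi_embeddings(cooc))
```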
arXiv Detail & Related papers (2020-11-12T14:23:35Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
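A compact PyTorch sketch of the dual-objective architecture: one LSTM encoder whose final state seeds two decoders, one reconstructing the source and one translating it. Sizes are placeholders, and the translation decoder consumes source embeddings purely to keep the sketch self-contained; the paper's training setup differs in detail.

```python
import torch
import torch.nn as nn

class TranslateReconstruct(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.rec_dec = nn.LSTM(dim, dim, batch_first=True)  # reconstruction
        self.trn_dec = nn.LSTM(dim, dim, batch_first=True)  # translation
        self.rec_out = nn.Linear(dim, src_vocab)
        self.trn_out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids):
        x = self.src_emb(src_ids)
        enc, state = self.encoder(x)     # contextualised word embeddings
        rec, _ = self.rec_dec(x, state)  # teacher-forced on the source
        trn, _ = self.trn_dec(x, state)  # simplified decoder input (see above)
        return self.rec_out(rec), self.trn_out(trn), enc

model = TranslateReconstruct(src_vocab=100, tgt_vocab=120)
rec_logits, trn_logits, ctx_emb = model(torch.randint(0, 100, (2, 5)))
print(ctx_emb.shape)  # torch.Size([2, 5, 64]): the embeddings of interest
```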
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- A Comparative Study of Lexical Substitution Approaches based on Neural Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)
- Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction [12.376752724719005]
Language-independent tokenisation (LIT) methods do not require labelled language resources or lexicons.
Language-specific tokenisation (LST) methods have a long and established history, and are developed using carefully created lexicons and training resources.
We empirically compare the two approaches using semantic similarity measurement as an evaluation task across a diverse set of languages.
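A toy contrast between the two tokenisation styles: language-specific tokenisation relying on a hand-made lexicon versus a language-independent character n-gram tokeniser. Both tokenisers below are illustrative stand-ins, not the systems evaluated in the paper.

```python
def lst_tokenise(text, lexicon):
    """Language-specific: whitespace split filtered by a curated lexicon."""
    return [w for w in text.lower().split() if w in lexicon]

def lit_tokenise(text, n=3):
    """Language-independent: overlapping character n-grams, no resources."""
    s = text.lower().replace(" ", "_")
    return [s[i:i + n] for i in range(len(s) - n + 1)]

sentence = "cats sleep"
print(lst_tokenise(sentence, {"cats", "sleep"}))  # ['cats', 'sleep']
print(lit_tokenise(sentence))                     # ['cat', 'ats', 'ts_', ...]
```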
arXiv Detail & Related papers (2020-02-25T16:24:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.