Towards Automatic Construction of Filipino WordNet: Word Sense Induction
and Synset Induction Using Sentence Embeddings
- URL: http://arxiv.org/abs/2204.03251v3
- Date: Thu, 19 Oct 2023 06:42:39 GMT
- Title: Towards Automatic Construction of Filipino WordNet: Word Sense Induction
and Synset Induction Using Sentence Embeddings
- Authors: Dan John Velasco, Axel Alba, Trisha Gail Pelagio, Bryce Anthony
Ramirez, Unisse Chua, Briane Paul Samson, Jan Christian Blaise Cruz and
Charibeth Cheng
- Abstract summary: This study proposes a method for word sense induction and synset induction using only two linguistic resources.
The resulting sense inventory and synonym sets can be used in automatically creating a wordnet.
This study empirically shows that 30% of the induced word senses are valid and 40% of the induced synsets are valid, of which 20% are novel synsets.
- Score: 0.7214142393172727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Wordnets are indispensable tools for various natural language processing
applications. Unfortunately, wordnets get outdated, and producing or updating
wordnets can be slow and costly in terms of time and resources. This problem
intensifies for low-resource languages. This study proposes a method for word
sense induction and synset induction using only two linguistic resources,
namely, an unlabeled corpus and a sentence embeddings-based language model. The
resulting sense inventory and synonym sets can be used in automatically
creating a wordnet. We applied this method to a corpus of Filipino text. The
sense inventory and synsets were evaluated by matching them with the sense
inventory of the machine translated Princeton WordNet, as well as comparing the
synsets to the Filipino WordNet. This study empirically shows that 30% of
the induced word senses are valid and 40% of the induced synsets are valid, of
which 20% are novel synsets.
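
The two-step idea in the abstract (group a word's contexts into senses, then group similar senses of different words into synsets) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a clustering algorithm, so the greedy cosine-threshold clustering, the function names `induce_senses` and `induce_synsets`, the threshold of 0.8, and the two-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a sentence embedding model applied to corpus sentences.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def induce_senses(context_embeddings, threshold=0.8):
    """Hypothetical sense induction: each context embedding joins the most
    similar existing cluster if the similarity reaches the threshold,
    otherwise it starts a new cluster. Returns one centroid per sense."""
    clusters = []
    for vec in context_embeddings:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, centroid(cluster))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([vec])
        else:
            best.append(vec)
    return [centroid(cluster) for cluster in clusters]


def induce_synsets(senses_by_word, threshold=0.8):
    """Hypothetical synset induction: group (word, sense index) pairs whose
    sense centroids are similar to a group's first member. Only groups that
    contain at least two different words are returned."""
    groups = []  # each entry: (anchor vector, list of (word, sense index))
    for word, senses in senses_by_word.items():
        for index, vec in enumerate(senses):
            for anchor, members in groups:
                if cosine(vec, anchor) >= threshold:
                    members.append((word, index))
                    break
            else:
                groups.append((vec, [(word, index)]))
    return [members for _, members in groups
            if len({word for word, _ in members}) > 1]


# Invented 2-D stand-ins for sentence embeddings of contexts of two words.
contexts = {
    "bata": [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]],
    "paslit": [[0.95, 0.15]],
}
senses = {word: induce_senses(vecs) for word, vecs in contexts.items()}
synsets = induce_synsets(senses)
print(synsets)
```

With these toy vectors, the first two contexts of "bata" fall into one sense and the third into another, and the first sense is grouped with the single sense of "paslit". A real pipeline would also need the evaluation step the abstract describes, which this sketch omits.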
Related papers
- Coarse-Grained Sense Inventories Based on Semantic Matching between English Dictionaries [0.0]
We semantically match sense definitions from Cambridge dictionaries and WordNet and develop new coarse-grained sense inventories.
The advantages of the proposed inventories include their low dependency on large-scale resources, better aggregation of closely related senses, CEFR-level assignments, and ease of expansion and improvement.
arXiv Detail & Related papers (2024-09-10T10:08:58Z)
- Automatically constructing Wordnet synsets [2.363388546004777]
We propose approaches to generate Wordnet synsets for languages both resource-rich and resource-poor.
Our algorithms translate synsets of existing Wordnets to a target language T, then apply a ranking method on the translation candidates to find best translations in T.
arXiv Detail & Related papers (2022-08-08T02:02:18Z)
- Semi-automatic WordNet Linking using Word Embeddings [33.15250956247636]
Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of different languages.
We propose an approach to link wordnets. Given a synset of the source language, the approach returns a ranked list of potential candidate synsets.
Our technique is able to retrieve a winner synset in the top 10 ranked list for 60% of all synsets and 70% of noun synsets.
arXiv Detail & Related papers (2022-01-05T18:15:55Z)
- Utilizing Wordnets for Cognate Detection among Indian Languages [50.83320088758705]
We detect cognate word pairs among ten Indian languages with Hindi.
We use deep learning methodologies to predict whether a word pair is cognate or not.
We report improved performance of up to 26%.
arXiv Detail & Related papers (2021-12-30T16:46:28Z)
- Multilingual Irony Detection with Dependency Syntax and Neural Models [61.32653485523036]
This work focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme.
The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.
arXiv Detail & Related papers (2020-11-11T11:22:05Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- Moving Down the Long Tail of Word Sense Disambiguation with Gloss-Informed Biencoders [79.38278330678965]
A major obstacle in Word Sense Disambiguation (WSD) is that word senses are not uniformly distributed.
We propose a bi-encoder model that independently embeds (1) the target word with its surrounding context and (2) the dictionary definition, or gloss, of each sense.
arXiv Detail & Related papers (2020-05-06T04:21:45Z)
- Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
arXiv Detail & Related papers (2020-03-14T14:50:04Z)
- Multi-Fusion Chinese WordNet (MCW): Compound of Machine Learning and Manual Correction [7.471172518764192]
Five Chinese wordnets have been developed to solve the problems of syntax and semantics.
They include Northeastern University Chinese WordNet (NEW), Sinica Bilingual Ontological WordNet (BOW), Southeast University Chinese WordNet (SEW), Taiwan University Chinese WordNet (CWN), and Chinese Open WordNet (COW).
We decided to build a new Chinese wordnet, called Multi-Fusion Chinese Wordnet (MCW), to make up for those shortcomings.
arXiv Detail & Related papers (2020-02-05T12:44:01Z)