Don't Neglect the Obvious: On the Role of Unambiguous Words in Word
Sense Disambiguation
- URL: http://arxiv.org/abs/2004.14325v3
- Date: Fri, 23 Oct 2020 09:20:11 GMT
- Title: Don't Neglect the Obvious: On the Role of Unambiguous Words in Word
Sense Disambiguation
- Authors: Daniel Loureiro and Jose Camacho-Collados
- Abstract summary: We introduce the UWA (Unambiguous Word Annotations) dataset and show how a state-of-the-art propagation-based model can use it to extend the coverage and quality of its word sense embeddings.
- Score: 5.8523859781812435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art methods for Word Sense Disambiguation (WSD) combine two
different components: the power of pre-trained language models and a propagation
method to extend the coverage of such models. This propagation is needed as
current sense-annotated corpora lack coverage of many instances in the
underlying sense inventory (usually WordNet). At the same time, unambiguous
words make up a large portion of all words in WordNet, yet they are poorly
covered in existing sense-annotated corpora. In this paper, we propose a simple
method to provide annotations for most unambiguous words in a large corpus. We
introduce the UWA (Unambiguous Word Annotations) dataset and show how a
state-of-the-art propagation-based model can use it to extend the coverage and
quality of its word sense embeddings by a significant margin, improving on its
original results on WSD.
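The core of the proposed method is simple to illustrate: a word is unambiguous when it maps to exactly one sense in WordNet, so its occurrences in any corpus can be sense-labeled automatically. Below is a minimal sketch of that idea using NLTK's WordNet interface; the paper's actual pipeline (corpus choice, lemmatization, and POS handling) is not reproduced here.

```python
# Minimal sketch: tag occurrences of unambiguous (monosemous) words with
# their single WordNet sense. Illustrates the general idea behind UWA;
# details of the paper's pipeline may differ.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def annotate_unambiguous(tokens):
    """Attach the single WordNet sense to tokens with exactly one synset."""
    annotated = []
    for tok in tokens:
        synsets = wn.synsets(tok)  # applies WordNet's morphological lookup
        if len(synsets) == 1:  # unambiguous: only one sense exists
            annotated.append((tok, synsets[0].name()))
        else:
            annotated.append((tok, None))  # ambiguous or not in WordNet
    return annotated

print(annotate_unambiguous(["the", "oboe", "played", "a", "melody"]))
# "oboe" has a single synset, so it receives a sense label; the rest do not.
```

Annotations obtained this way are what allow the propagation model to cover regions of the sense inventory that hand-annotated corpora never reach.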
Related papers
- Multilingual Word Sense Disambiguation with Unified Sense Representation [55.3061179361177]
We propose building knowledge-based and supervision-based Multilingual Word Sense Disambiguation (MWSD) systems.
We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from resource-rich languages to resource-poor ones.
Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
arXiv Detail & Related papers (2022-10-14T01:24:03Z)
- Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings [11.475144702935568]
We quantify how much the contextualized embeddings of each word sense vary across contexts in typical pre-trained models.
We find that word representations are position-biased: words appearing early in different contexts tend to receive more similar embeddings.
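As a toy illustration of this kind of measurement, the sketch below embeds one word in several contexts with a pre-trained model and reports the variance of its contextualized vectors; the model and contexts are illustrative assumptions, not the paper's experimental setup.

```python
# Toy measurement of sense-wise variance: embed the same word in several
# contexts and compute the variance of its contextualized vectors.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

contexts = [  # all uses of "bank" here share the financial sense
    "The bank approved the loan.",
    "A bank holds customer deposits.",
    "The bank raised its interest rates.",
]

vectors = []
for sent in contexts:
    enc = tok(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    # locate the subword position of "bank" and keep its hidden state
    pos = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids("bank"))
    vectors.append(hidden[pos])

print("mean per-dimension variance:",
      torch.stack(vectors).var(dim=0).mean().item())
```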
arXiv Detail & Related papers (2022-08-20T12:27:25Z)
- Chinese Word Sense Embedding with SememeWSD and Synonym Set [17.37973450772783]
We propose SememeWSD Synonym (SWSDS) model to assign a different vector to every sense of polysemous words.
We obtain top 10 synonyms of the word sense from OpenHowNet and calculate the average vector of synonyms as the vector of the word sense.
In experiments, we evaluate the SWSDS model on semantic similarity calculation with Gensim's wmdistance method.
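A minimal sketch of the synonym-averaging step follows, assuming a gensim word-vector lookup and an already-retrieved synonym list; in the paper the synonyms come from OpenHowNet, so the file path and synonym list below are hypothetical placeholders.

```python
# Build a sense vector as the average of its synonyms' word vectors,
# following the SWSDS summary above. Path and synonyms are placeholders.
import numpy as np
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

def sense_vector(synonyms, top_k=10):
    """Average the vectors of up to top_k in-vocabulary synonyms."""
    vecs = [wv[w] for w in synonyms[:top_k] if w in wv]
    if not vecs:
        raise ValueError("no synonym is in the embedding vocabulary")
    return np.mean(vecs, axis=0)

# hypothetical synonym list for one sense of a polysemous word
river_bank = sense_vector(["riverbank", "shore", "waterside"])
```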
arXiv Detail & Related papers (2022-06-29T03:42:03Z)
- Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories [47.03271152494389]
Word Sense Disambiguation aims to automatically identify the exact meaning of a word according to its context.
Existing supervised models struggle to make correct predictions on rare word senses due to limited training data.
We propose a gloss alignment algorithm that can align definition sentences with the same meaning from different sense inventories to collect rich lexical knowledge.
arXiv Detail & Related papers (2021-10-27T00:04:33Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper addresses the issue with a mask-and-predict strategy.
We take the words in the longest common subsequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) metric to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
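A minimal sketch of the mask-and-predict idea, assuming a BERT masked language model: mask the same shared word in two overlapping sentences, read off the predicted distribution at the masked position in each, and compare the two with KL divergence. NDD aggregates such comparisons over all shared words; this sketch handles a single word for clarity.

```python
# Mask a shared word in two overlapping sentences, predict the token
# distribution at the masked position with an MLM, and compare.
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def masked_distribution(sentence, target):
    """Distribution the MLM predicts where `target` has been masked out."""
    enc = tok(sentence.replace(target, tok.mask_token, 1), return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**enc).logits[0]
    pos = enc.input_ids[0].tolist().index(tok.mask_token_id)
    return F.softmax(logits[pos], dim=-1)

p = masked_distribution("He deposited cash at the bank.", "bank")
q = masked_distribution("He deposited a check at the bank.", "bank")
print(F.kl_div(q.log(), p, reduction="sum").item())  # KL(p || q)
```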
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show that our model advances the state of the art in few-shot WSD and supports effective learning in extremely data-scarce scenarios.
arXiv Detail & Related papers (2021-06-05T20:40:01Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for multi-sense embeddings (M-SE) from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- Moving Down the Long Tail of Word Sense Disambiguation with Gloss-Informed Biencoders [79.38278330678965]
A major obstacle in Word Sense Disambiguation (WSD) is that word senses are not uniformly distributed.
We propose a bi-encoder model that independently embeds (1) the target word with its surrounding context and (2) the dictionary definition, or gloss, of each sense.
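A minimal sketch of the bi-encoder scoring scheme: one encoder embeds the target word in context, the other embeds each candidate gloss, and senses are ranked by dot product. It uses a single frozen pre-trained model and hypothetical glosses for illustration; the paper's accuracy comes from fine-tuning the two encoders jointly, which is not shown.

```python
# Rank word senses by the dot product between the target word's contextual
# embedding and each gloss embedding. Frozen model, illustrative glosses.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed_target(sentence, target):
    """Hidden state of the target word's subword position in context."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]
    pos = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(target))
    return hidden[pos]

def embed_gloss(gloss):
    """[CLS] representation of a sense definition."""
    enc = tok(gloss, return_tensors="pt")
    with torch.no_grad():
        return encoder(**enc).last_hidden_state[0, 0]

context_vec = embed_target("She sat on the bank of the river.", "bank")
glosses = {  # hypothetical, abridged WordNet-style glosses
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits",
}
scores = {s: torch.dot(context_vec, embed_gloss(g)).item()
          for s, g in glosses.items()}
print(max(scores, key=scores.get))
```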
arXiv Detail & Related papers (2020-05-06T04:21:45Z)
- Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
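As a toy illustration of inducing senses from a static embedding alone, the sketch below groups a word's nearest neighbors by mutual similarity into proto-sense clusters, assuming gensim-loadable fastText vectors at an illustrative path. The actual method performs ego-network graph clustering; this greedy grouping only conveys the general idea.

```python
# Group a word's nearest neighbors by mutual similarity to approximate
# sense clusters. A crude stand-in for ego-network graph clustering.
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("cc.en.300.vec")  # illustrative path

def induce_senses(word, top_k=20, threshold=0.5):
    """Greedily cluster the word's neighbors into proto-senses."""
    clusters = []
    for neighbor, _ in wv.most_similar(word, topn=top_k):
        for cluster in clusters:
            # join a cluster if similar enough to its seed member
            if wv.similarity(neighbor, cluster[0]) >= threshold:
                cluster.append(neighbor)
                break
        else:
            clusters.append([neighbor])  # start a new proto-sense
    return clusters

for proto_sense in induce_senses("bank"):
    print(proto_sense)
```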
arXiv Detail & Related papers (2020-03-14T14:50:04Z)