Word Sense Disambiguation for 158 Languages using Word Embeddings Only
- URL: http://arxiv.org/abs/2003.06651v1
- Date: Sat, 14 Mar 2020 14:50:04 GMT
- Title: Word Sense Disambiguation for 158 Languages using Word Embeddings Only
- Authors: Varvara Logacheva and Denis Teslenko and Artem Shelmanov and Steffen
Remus and Dmitry Ustalov and Andrey Kutuzov and Ekaterina Artemova and Chris
Biemann and Simone Paolo Ponzetto and Alexander Panchenko
- Abstract summary: Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
- Score: 80.79437083582643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disambiguation of word senses in context is easy for humans, but is a major
challenge for automatic approaches. Sophisticated supervised and
knowledge-based models were developed to solve this task. However, (i) the
inherent Zipfian distribution of supervised training instances for a given word
and/or (ii) the quality of linguistic knowledge representations motivate the
development of completely unsupervised and knowledge-free approaches to word
sense disambiguation (WSD). They are particularly useful for under-resourced
languages, which lack the resources needed to build either supervised or
knowledge-based models. In this paper, we present a method that takes as input
a standard pre-trained word embedding model and induces a fully-fledged word
sense inventory, which can be used for disambiguation in context. We use this
method to induce a collection of sense inventories for 158 languages on the
basis of the original pre-trained fastText word embeddings by Grave et al.
(2018), enabling WSD in these languages. The models and the system are
available online.
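The pipeline lends itself to a compact illustration. Below is a minimal sketch of this style of knowledge-free sense induction, assuming pre-trained fastText vectors loadable with gensim; the neighbourhood size, similarity threshold, and file path are assumptions, and the authors' actual egvi algorithm differs in its graph-clustering details.

```python
# Minimal sketch of knowledge-free word sense induction from pre-trained
# embeddings via ego-network clustering. Illustrative only: the file path,
# topn, and edge_threshold are assumptions, not the paper's exact settings.
import numpy as np
import networkx as nx
from gensim.models import KeyedVectors

# Hypothetical path to fastText vectors in word2vec text format.
vectors = KeyedVectors.load_word2vec_format("cc.en.300.vec", binary=False)

def induce_senses(word, topn=50, edge_threshold=0.5):
    """Cluster the nearest neighbours of `word` into sense clusters."""
    neighbours = [w for w, _ in vectors.most_similar(word, topn=topn)]
    graph = nx.Graph()
    graph.add_nodes_from(neighbours)
    # Connect neighbours that are themselves similar; the ego word is
    # deliberately excluded so that its senses fall apart into components.
    for i, u in enumerate(neighbours):
        for v in neighbours[i + 1:]:
            if vectors.similarity(u, v) >= edge_threshold:
                graph.add_edge(u, v)
    clusters = [sorted(c) for c in nx.connected_components(graph)]
    # Represent each sense by the mean vector of its cluster members.
    sense_vectors = [np.mean([vectors[w] for w in c], axis=0) for c in clusters]
    return clusters, sense_vectors

def disambiguate(word, context_words):
    """Pick the sense cluster closest to the average context vector."""
    clusters, sense_vectors = induce_senses(word)
    ctx = np.mean([vectors[w] for w in context_words if w in vectors], axis=0)
    sims = [np.dot(ctx, s) / (np.linalg.norm(ctx) * np.linalg.norm(s))
            for s in sense_vectors]
    return clusters[int(np.argmax(sims))]

print(disambiguate("bank", ["river", "water", "shore"]))
```

Removing the ego word before clustering is the key design choice: without it, all neighbours remain connected through the ambiguous word itself, and the senses never separate into distinct components.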
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbations such as typos and word order shuffling, which resonate with human cognitive patterns and allow perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
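As a rough illustration of such perturbations, the sketch below applies keyboard-adjacent typos and word shuffling; the function names, demo keyboard map, and rates are invented, not the authors' code.

```python
# Toy sketch of human-plausible text perturbations (typos, word shuffling).
import random

KEYBOARD_NEIGHBOURS = {"a": "qs", "e": "wr", "o": "ip", "t": "ry"}  # tiny demo map

def add_typos(text, rate=0.1, rng=random.Random(0)):
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch in KEYBOARD_NEIGHBOURS and rng.random() < rate:
            chars[i] = rng.choice(KEYBOARD_NEIGHBOURS[ch])  # adjacent-key slip
    return "".join(chars)

def shuffle_words(text, rng=random.Random(0)):
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)

print(add_typos("the quick brown fox"))
print(shuffle_words("the quick brown fox"))
```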
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Word Sense Induction with Knowledge Distillation from BERT [6.88247391730482]
This paper proposes a method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context.
Experiments on the contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings.
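For a simplified picture of extracting sense structure from BERT, the sketch below clusters contextual embeddings of a word's occurrences. This is a common sense-induction baseline rather than the paper's attention-based distillation; the model name, sentences, and cluster count are assumptions.

```python
# Simplified illustration: induce senses by clustering BERT's contextual
# embeddings of a word's occurrences (not the paper's distillation method).
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # fast tokenizer
model = AutoModel.from_pretrained("bert-base-uncased")

def occurrence_vector(sentence, word):
    """Mean of the hidden states of the subword tokens covering `word`."""
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    start = sentence.lower().find(word)
    span = (start, start + len(word))
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    idx = [i for i, (s, e) in enumerate(offsets)
           if s < span[1] and e > span[0] and e > s]  # e > s skips special tokens
    return hidden[idx].mean(dim=0).numpy()

sentences = [
    "She sat on the bank of the river.",
    "The bank raised its interest rates.",
    "Fish swam near the muddy bank.",
    "He deposited cash at the bank.",
]
X = np.stack([occurrence_vector(s, "bank") for s in sentences])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # occurrences grouped into two induced senses
```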
arXiv Detail & Related papers (2023-04-20T21:05:35Z)
- Multilingual Word Sense Disambiguation with Unified Sense Representation [55.3061179361177]
We propose building knowledge-based and supervision-based Multilingual Word Sense Disambiguation (MWSD) systems.
We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from resource-rich languages to resource-poor ones.
Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
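A hedged sketch of the annotation-transfer idea: in a shared multilingual embedding space, a target-language word can inherit the sense label of its nearest annotated source-language word. The toy vectors and labels below are invented; the paper's unified sense representations are richer.

```python
# Toy sketch of cross-lingual annotation transfer in an aligned space.
import numpy as np

# Aligned multilingual embeddings (toy 3-d vectors, purely illustrative).
emb = {
    "bank_en":  np.array([0.9, 0.1, 0.0]),
    "shore_en": np.array([0.1, 0.9, 0.0]),
    "banco_es": np.array([0.85, 0.15, 0.05]),
}
annotated = {"bank_en": "bank%finance", "shore_en": "shore%geo"}  # source labels

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def transfer_sense(target_word):
    """Label `target_word` with the sense of its nearest annotated word."""
    nearest = max(annotated, key=lambda w: cos(emb[w], emb[target_word]))
    return annotated[nearest]

print(transfer_sense("banco_es"))  # -> bank%finance
```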
arXiv Detail & Related papers (2022-10-14T01:24:03Z)
- Augmenting semantic lexicons using word embeddings and transfer learning [1.101002667958165]
We propose two models that use word embeddings and transfer learning to predict sentiment scores, augmenting semantic lexicons at relatively low cost.
Our evaluation shows that both models can score new words with accuracy comparable to that of reviewers from Amazon Mechanical Turk, but at a fraction of the cost.
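The underlying mechanics can be sketched as regression from embeddings to lexicon scores; the ridge regressor and the seed data below are assumptions, not the paper's two models.

```python
# Sketch: fit a regressor on (embedding, score) pairs from a seed lexicon,
# then score unseen words. Random vectors stand in for real embeddings.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
vocab_vectors = {w: rng.normal(size=50) for w in
                 ["good", "great", "bad", "awful", "fine", "horrid"]}
seed_lexicon = {"good": 0.8, "great": 0.9, "bad": -0.7, "awful": -0.9}

X = np.stack([vocab_vectors[w] for w in seed_lexicon])
y = np.array(list(seed_lexicon.values()))
reg = Ridge(alpha=1.0).fit(X, y)

for w in ["fine", "horrid"]:  # words missing from the seed lexicon
    print(w, float(reg.predict(vocab_vectors[w][None, :])[0]))
```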
arXiv Detail & Related papers (2021-09-18T20:59:52Z)
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
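A rough sketch of the silver-labeling idea: word sequences that recur within the same document become quality-phrase candidates. The tokenization and frequency threshold below are simplifications of the paper's procedure.

```python
# Mine n-grams that consistently co-occur within one document as silver labels.
from collections import Counter

def silver_phrases(document_tokens, max_len=3, min_count=2):
    counts = Counter()
    n = len(document_tokens)
    for size in range(2, max_len + 1):
        for i in range(n - size + 1):
            counts[tuple(document_tokens[i:i + size])] += 1
    return {ngram for ngram, c in counts.items() if c >= min_count}

doc = ("deep learning models need data ; deep learning models "
       "also need compute").split()
print(silver_phrases(doc))  # e.g. {('deep', 'learning'), ('learning', 'models'), ...}
```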
arXiv Detail & Related papers (2021-05-28T19:44:24Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
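To make the graph-encoder idea concrete, here is a single numpy graph-convolution step that mixes token states along semantic-dependency edges; the toy graph and dimensions are assumptions, and the paper's encoders are trained end-to-end on top of a pretrained model.

```python
# One graph-convolution step over semantic-dependency edges.
import numpy as np

def gcn_layer(H, A, W):
    """H: (n, d) node states; A: (n, n) adjacency; W: (d, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    H_new = (A_hat / deg) @ H @ W               # normalised neighbourhood mixing
    return np.maximum(H_new, 0.0)               # ReLU

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))    # 4 tokens with 8-d contextual states
A = np.array([[0, 1, 0, 0],    # toy semantic-dependency edges
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
W = rng.normal(size=(8, 8))
print(gcn_layer(H, A, W).shape)  # (4, 8)
```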
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterwards, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
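One way to make the output layer independent of the vocabulary, sketched below, is to compose a word's output vector from hashed character n-gram embeddings, fastText-style; the bucket count and n-gram range are assumptions, and the paper's layer is learned end-to-end rather than fixed.

```python
# Compositional output embedding: any string maps to a vector via hashed
# character n-grams, so the layer's size does not depend on the vocabulary.
import numpy as np

N_BUCKETS, DIM = 10_000, 64
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(N_BUCKETS, DIM))  # shared parameters

def word_output_embedding(word, n_min=3, n_max=5):
    padded = f"<{word}>"
    # Python's hash() is per-process salted; real systems use a stable hash.
    vecs = [ngram_table[hash(padded[i:i + n]) % N_BUCKETS]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]
    return np.sum(vecs, axis=0) if vecs else np.zeros(DIM)

# Works for any string, including words never seen in training.
print(word_output_embedding("unvocabularied").shape)  # (64,)
```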
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- MICE: Mining Idioms with Contextual Embeddings [0.0]
Idiomatic expressions can be problematic for natural language processing applications.
We present an approach that uses contextual embeddings to detect them.
We show that deep neural networks using both embeddings perform much better than existing approaches.
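As a toy illustration of classifying expressions from combined embeddings, the sketch below concatenates two placeholder feature sets and fits a linear classifier; real systems would extract contextual vectors for the expression in context, and the classifier choice here is an assumption.

```python
# Toy idiom-vs-literal classification over combined embedding features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 40, 32
X_a = rng.normal(size=(n, d))           # placeholder for one embedding type
X_b = rng.normal(size=(n, d))           # placeholder for a second embedding type
X = np.concatenate([X_a, X_b], axis=1)  # combine both embeddings per example
y = rng.integers(0, 2, size=n)          # 1 = idiomatic, 0 = literal (toy labels)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:5]))
```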
arXiv Detail & Related papers (2020-08-13T08:56:40Z)
- Don't Neglect the Obvious: On the Role of Unambiguous Words in Word Sense Disambiguation [5.8523859781812435]
We introduce the UWA (Unambiguous Words) dataset and show how a state-of-the-art propagation-based model can use it to extend the coverage and quality of its word sense embeddings.
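The role of unambiguous words can be sketched directly: a monosemous word's only sense can adopt the word's embedding as a trusted anchor, and ambiguous senses then average over their monosemous neighbours. The tiny graph and vectors below are invented for illustration.

```python
# Monosemous words seed sense embeddings for propagation-based models.
import numpy as np

word_vec = {
    "riverbank": np.array([0.1, 0.9]),   # monosemous -> trusted anchor
    "teller":    np.array([0.9, 0.1]),   # monosemous -> trusted anchor
}
# Senses of the ambiguous word "bank", each linked to monosemous neighbours
# in a lexical resource (toy structure).
sense_neighbours = {
    "bank%geo":     ["riverbank"],
    "bank%finance": ["teller"],
}

sense_vec = {sense: np.mean([word_vec[w] for w in anchors], axis=0)
             for sense, anchors in sense_neighbours.items()}
print(sense_vec["bank%geo"])  # inherits the riverbank anchor: [0.1 0.9]
```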
arXiv Detail & Related papers (2020-04-29T16:51:21Z)
- Semantic Relatedness for Keyword Disambiguation: Exploiting Different Embeddings [0.0]
We propose an approach to keyword disambiguation that is grounded in the semantic relatedness between words and senses provided by an external inventory (ontology) not known at training time.
Experimental results show that this approach achieves results comparable to the state of the art when applied to Word Sense Disambiguation (WSD) without training for a particular domain.
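A minimal sketch of inventory-grounded disambiguation: score each candidate sense by the similarity between a context vector and the vector of the sense's gloss, with no training on the inventory. The vectors and glosses below are toy values, not the paper's setup.

```python
# Pick the sense whose gloss vector is closest to the context vector.
import numpy as np

word_vec = {
    "river": np.array([0.1, 0.9]), "water": np.array([0.2, 0.8]),
    "money": np.array([0.9, 0.1]), "loan":  np.array([0.8, 0.2]),
}

def text_vector(words):
    return np.mean([word_vec[w] for w in words if w in word_vec], axis=0)

glosses = {  # sense inventory supplied at run time, unseen during training
    "bank%geo":     ["river", "water"],
    "bank%finance": ["money", "loan"],
}

def disambiguate(context_words):
    ctx = text_vector(context_words)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(glosses, key=lambda s: cos(text_vector(glosses[s]), ctx))

print(disambiguate(["water", "river"]))  # -> "bank%geo"
```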
arXiv Detail & Related papers (2020-02-25T16:44:50Z)