BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology
- URL: http://arxiv.org/abs/2109.09780v1
- Date: Mon, 20 Sep 2021 18:15:26 GMT
- Title: BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology
- Authors: Luke Gessler, Nathan Schneider
- Abstract summary: We investigate how well contextualized word embedding models can represent different word senses.
We find that several popular CWE models all outperform a random baseline even for proportionally rare senses, without explicit sense supervision.
- Score: 11.650381752104298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An important question concerning contextualized word embedding (CWE) models
like BERT is how well they can represent different word senses, especially
those in the long tail of uncommon senses. Rather than build a WSD system as in
previous work, we investigate contextualized embedding neighborhoods directly,
formulating a query-by-example nearest neighbor retrieval task and examining
ranking performance for words and senses in different frequency bands. In an
evaluation on two English sense-annotated corpora, we find that several popular
CWE models all outperform a random baseline even for proportionally rare
senses, without explicit sense supervision. However, performance varies
considerably even among models with similar architectures and pretraining
regimes, with especially large differences for rare word senses, revealing that
CWE models are not all created equal when it comes to approximating word senses
in their native representations.
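The query-by-example setup can be pictured as follows; this is a minimal sketch, assuming HuggingFace transformers, with a hypothetical toy corpus and sense labels standing in for the sense-annotated data used in the paper.

```python
# Minimal sketch: rank sense-annotated occurrences by cosine similarity of
# their contextualized token embeddings to a query occurrence.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

def embed_occurrence(sentence, word):
    """Mean-pool the final-layer vectors of the word-piece(s) of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    # Naively locate the target word's word-piece span in the input ids.
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(word_ids) + 1):
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in sentence")

# Hypothetical sense-annotated occurrences of "bank".
corpus = [
    ("She sat on the bank of the river.", "bank", "bank.n.riverside"),
    ("He deposited cash at the bank.",    "bank", "bank.n.institution"),
    ("The bank approved the loan.",       "bank", "bank.n.institution"),
]
vecs = torch.stack([embed_occurrence(s, w) for s, w, _ in corpus])

query = embed_occurrence("Fish gathered near the muddy bank.", "bank")
sims = torch.nn.functional.cosine_similarity(query[None, :], vecs)
ranked = sims.argsort(descending=True)
print([corpus[int(i)][2] for i in ranked])  # senses ranked by similarity
```

Ranking quality is then read off from how highly same-sense occurrences are retrieved, which is what lets performance be broken down by frequency band.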
Related papers
- Word sense extension [8.939269057094661]
We present a paradigm of word sense extension (WSE) that enables words to spawn new senses in novel contexts.
We develop a framework that simulates novel word sense extension by partitioning a polysemous word type into two pseudo-tokens that mark its different senses.
Our framework combines cognitive models of chaining with a learning scheme that transforms a language model embedding space to support various types of word sense extension.
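The partitioning step might look like the following; a minimal sketch, assuming the embedding row of a polysemous type is simply cloned into two pseudo-token rows (the chaining-based transformation of the embedding space is omitted, and the pseudo-token names are illustrative).

```python
# Minimal sketch: split a polysemous word type into two pseudo-tokens that
# mark its different senses, initialized from the original embedding.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

word = "bank"
word_id = tokenizer.convert_tokens_to_ids(word)

# Register two pseudo-tokens marking the word's two sense clusters.
tokenizer.add_tokens(["bank_sense1", "bank_sense2"])
model.resize_token_embeddings(len(tokenizer))

# Initialize both pseudo-token rows from the original word's embedding.
emb = model.get_input_embeddings().weight
with torch.no_grad():
    for tok in ("bank_sense1", "bank_sense2"):
        emb[tokenizer.convert_tokens_to_ids(tok)] = emb[word_id]
```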
arXiv Detail & Related papers (2023-06-09T00:54:21Z)
- What do Language Models know about word senses? Zero-Shot WSD with Language Models and Domain Inventories [23.623074512572593]
We explore to what extent language models can discern among senses at inference time.
We leverage the relation between word senses and domains, and cast Word Sense Disambiguation (WSD) as a textual entailment problem.
Our results show that this approach is effective, approaching the performance of supervised systems.
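A minimal sketch of casting WSD as entailment, assuming an off-the-shelf NLI model; the two-domain inventory and hypothesis template below are hypothetical stand-ins.

```python
# Minimal sketch: predict a word's sense by entailment against domain labels.
from transformers import pipeline

nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sentence = "She cashed the check at the bank."
domains = ["finance", "geography"]  # stand-in domain inventory
result = nli(sentence, candidate_labels=domains,
             hypothesis_template="The word 'bank' here relates to {}.")
print(result["labels"][0])  # most entailed domain -> predicted sense
```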
arXiv Detail & Related papers (2023-02-07T09:55:07Z)
- Subject Verb Agreement Error Patterns in Meaningless Sentences: Humans vs. BERT [64.40111510974957]
We test whether meaning interferes with subject-verb number agreement in English.
We generate semantically well-formed and nonsensical items.
We find that BERT and humans are both sensitive to our semantic manipulation.
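The BERT side of such a probe can be sketched as follows, assuming agreement preference is read off as the masked-LM probability of the singular versus plural verb form; the items below are illustrative, not the paper's stimuli.

```python
# Minimal sketch: compare masked-LM probabilities of "is" vs. "are" in
# well-formed and nonsensical sentences sharing the same syntax.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def verb_preference(sentence_with_mask, singular, plural):
    enc = tokenizer(sentence_with_mask, return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    with torch.no_grad():
        logits = model(**enc).logits[0, mask_pos.item()]
    probs = logits.softmax(dim=-1)
    sg = probs[tokenizer.convert_tokens_to_ids(singular)].item()
    pl = probs[tokenizer.convert_tokens_to_ids(plural)].item()
    return sg, pl

# Semantically well-formed vs. nonsensical item with identical structure.
for s in ["The key to the cabinets [MASK] on the table.",
          "The idea to the cabinets [MASK] on the table."]:
    print(s, verb_preference(s, "is", "are"))
```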
arXiv Detail & Related papers (2022-09-21T17:57:23Z)
- Large Scale Substitution-based Word Sense Induction [48.49573297876054]
We present a word-sense induction method based on pre-trained masked language models (MLMs), which can cheaply scale to large vocabularies and large corpora.
The result is a corpus which is sense-tagged according to a corpus-derived sense inventory and where each sense is associated with indicative words.
Evaluation on English Wikipedia sense-tagged with our method shows that both the induced senses and the per-instance sense assignments are of high quality, even compared to WSD methods such as Babelfy.
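The core loop of such a method can be sketched as follows; a minimal illustration, assuming occurrences are grouped by clustering their top-k masked-LM substitutes, with illustrative sentences and an arbitrary cluster count.

```python
# Minimal sketch: induce senses by clustering MLM substitutes per occurrence.
import torch
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import CountVectorizer
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def substitutes(sentence, k=10):
    """Top-k masked-LM predictions for the masked target position."""
    enc = tokenizer(sentence, return_tensors="pt")
    pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**enc).logits[0, pos]
    top = logits.topk(k).indices
    return " ".join(tokenizer.convert_ids_to_tokens(top))

occurrences = ["He rowed toward the [MASK] of the river.",
               "The [MASK] raised its interest rates.",
               "They strolled along the grassy [MASK]."]
bags = [substitutes(s) for s in occurrences]

# Cluster occurrences by substitute overlap; cluster ids act as induced
# senses, and frequent substitutes per cluster are its indicative words.
X = CountVectorizer().fit_transform(bags).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)
```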
arXiv Detail & Related papers (2021-10-14T19:40:37Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.
This paper addresses the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
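The mask-and-predict idea can be sketched as follows; a minimal illustration, assuming a KL divergence between masked-LM distributions at a manually chosen shared-word position stands in for the paper's exact NDD formulation.

```python
# Minimal sketch: divergence of MLM distributions at a shared word position
# as a distance between two highly overlapped sentences.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def dist_at(sentence, word):
    """Masked-LM log-distribution at the (masked) position of `word`."""
    masked = sentence.replace(word, tokenizer.mask_token, 1)
    enc = tokenizer(masked, return_tensors="pt")
    pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**enc).logits[0, pos]
    return F.log_softmax(logits, dim=-1)

a = "The movie was surprisingly good."
b = "The movie was shockingly good."
# Compare predicted distributions at a shared ("neighboring") word position.
p, q = dist_at(a, "good"), dist_at(b, "good")
ndd = F.kl_div(q, p, log_target=True, reduction="sum")  # KL(p || q)
print(float(ndd))
```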
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Sense representations for Portuguese: experiments with sense embeddings and deep neural language models [0.0]
Unsupervised sense representations can induce different senses of a word by analyzing its contextual semantics in a text.
We present the first experiments carried out for generating sense embeddings for Portuguese.
arXiv Detail & Related papers (2021-08-31T18:07:01Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings (M-SE).
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
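A graph walk over word senses might look like the following; a minimal sketch of the general idea only, assuming NLTK's WordNet interface (with the WordNet corpus downloaded) and a simple random walk over hypernym/hyponym edges.

```python
# Minimal sketch: collect related senses by walking WordNet's sense graph.
import random
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def sense_walk(synset, steps=5):
    """Random walk over hypernym/hyponym edges, collecting visited senses."""
    path, current = [synset], synset
    for _ in range(steps):
        neighbors = current.hypernyms() + current.hyponyms()
        if not neighbors:
            break
        current = random.choice(neighbors)
        path.append(current)
    return path

walk = sense_walk(wn.synset("bank.n.01"))
print([s.name() for s in walk])  # senses to enrich the distributional context
```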
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct keywords from the rest of the words and to make low-confidence predictions when context is insufficient.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
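The regularization can be sketched as a loss term; a simplified illustration assuming keywords were preselected (e.g., by frequency or attention) and masked out of the input, not the paper's exact objective.

```python
# Minimal sketch: classification loss on keyword-masked input, plus an
# entropy regularizer pushing keyword-only predictions toward uniform.
import torch
import torch.nn.functional as F

def masker_loss(logits_ctx_only, logits_kw_only, labels, lam=0.001):
    # Keywords masked, context kept: the model must predict from context
    # rather than shortcut keywords.
    ce = F.cross_entropy(logits_ctx_only, labels)
    # Context masked, keywords kept: predictions should be low-confidence,
    # i.e., close to the uniform distribution.
    probs = F.softmax(logits_kw_only, dim=-1)
    uniform = torch.full_like(probs, 1.0 / probs.size(-1))
    ent = F.kl_div(probs.clamp_min(1e-8).log(), uniform, reduction="batchmean")
    return ce + lam * ent
```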
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
- Temporal Common Sense Acquisition with Minimal Supervision [77.8308414884754]
This work proposes a novel sequence modeling approach that exploits explicit and implicit mentions of temporal common sense.
Our method is shown to give quality predictions of various dimensions of temporal common sense.
It also produces representations of events for relevant tasks such as duration comparison, parent-child relations, event coreference and temporal QA.
arXiv Detail & Related papers (2020-05-08T22:20:16Z)
- Moving Down the Long Tail of Word Sense Disambiguation with Gloss-Informed Biencoders [79.38278330678965]
A major obstacle in Word Sense Disambiguation (WSD) is that word senses are not uniformly distributed.
We propose a bi-encoder model that independently embeds (1) the target word with its surrounding context and (2) the dictionary definition, or gloss, of each sense.
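A bi-encoder of this shape can be sketched as follows; a minimal illustration assuming two generic BERT encoders, WordNet glosses via NLTK, and the [CLS] vector as a stand-in for the target-word representation used in the paper.

```python
# Minimal sketch: score each sense gloss against the context by dot product
# between a context encoder's and a gloss encoder's outputs.
import torch
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
context_enc = AutoModel.from_pretrained("bert-base-cased")
gloss_enc = AutoModel.from_pretrained("bert-base-cased")

def encode(model, text):
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**enc).last_hidden_state[0, 0]  # [CLS] vector

ctx = encode(context_enc, "She cashed the check at the bank.")
glosses = [s.definition() for s in wn.synsets("bank", pos="n")]
scores = torch.stack([encode(gloss_enc, g) for g in glosses]) @ ctx
print(glosses[int(scores.argmax())])  # highest-scoring gloss = predicted sense
```

Because glosses are embedded rather than enumerated as classes, rare senses with few or no training examples can still be scored, which is what lets the model move down the long tail.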
arXiv Detail & Related papers (2020-05-06T04:21:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.