BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology
- URL: http://arxiv.org/abs/2109.09780v1
- Date: Mon, 20 Sep 2021 18:15:26 GMT
- Title: BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology
- Authors: Luke Gessler, Nathan Schneider
- Abstract summary: We investigate how well contextualized word embedding models can represent different word senses.
We find that several popular CWE models all outperform a random baseline even for proportionally rare senses, without explicit sense supervision.
- Score: 11.650381752104298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An important question concerning contextualized word embedding (CWE) models
like BERT is how well they can represent different word senses, especially
those in the long tail of uncommon senses. Rather than build a WSD system as in
previous work, we investigate contextualized embedding neighborhoods directly,
formulating a query-by-example nearest neighbor retrieval task and examining
ranking performance for words and senses in different frequency bands. In an
evaluation on two English sense-annotated corpora, we find that several popular
CWE models all outperform a random baseline even for proportionally rare
senses, without explicit sense supervision. However, performance varies
considerably even among models with similar architectures and pretraining
regimes, with especially large differences for rare word senses, revealing that
CWE models are not all created equal when it comes to approximating word senses
in their native representations.
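The query-by-example setup can be pictured as follows; this is a minimal sketch, assuming HuggingFace transformers, with a hypothetical toy corpus and sense labels standing in for the sense-annotated data used in the paper.

```python
# Minimal sketch: rank sense-annotated occurrences by cosine similarity of
# their contextualized token embeddings to a query occurrence.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

def embed_occurrence(sentence, word):
    """Mean-pool the final-layer vectors of the word-piece(s) of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    # Naively locate the target word's word-piece span in the input ids.
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(word_ids) + 1):
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in sentence")

# Hypothetical sense-annotated occurrences of "bank".
corpus = [
    ("She sat on the bank of the river.", "bank", "bank.n.riverside"),
    ("He deposited cash at the bank.",    "bank", "bank.n.institution"),
    ("The bank approved the loan.",       "bank", "bank.n.institution"),
]
vecs = torch.stack([embed_occurrence(s, w) for s, w, _ in corpus])

query = embed_occurrence("Fish gathered near the muddy bank.", "bank")
sims = torch.nn.functional.cosine_similarity(query[None, :], vecs)
ranked = sims.argsort(descending=True)
print([corpus[int(i)][2] for i in ranked])  # senses ranked by similarity
```

Ranking quality is then read off from how highly same-sense occurrences are retrieved, which is what lets performance be broken down by frequency band.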
Related papers
- Word sense extension [8.939269057094661]
We present a paradigm of word sense extension (WSE) that enables words to spawn new senses in novel contexts.
We develop a framework that simulates novel word sense extension by partitioning a polysemous word type into two pseudo-tokens that mark its different senses.
Our framework combines cognitive models of chaining with a learning scheme that transforms a language model embedding space to support various types of word sense extension.
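The partitioning step might look like the following; a minimal sketch, assuming the embedding row of a polysemous type is simply cloned into two pseudo-token rows (the chaining-based transformation of the embedding space is omitted, and the pseudo-token names are illustrative).

```python
# Minimal sketch: split a polysemous word type into two pseudo-tokens that
# mark its different senses, initialized from the original embedding.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

word = "bank"
word_id = tokenizer.convert_tokens_to_ids(word)

# Register two pseudo-tokens marking the word's two sense clusters.
tokenizer.add_tokens(["bank_sense1", "bank_sense2"])
model.resize_token_embeddings(len(tokenizer))

# Initialize both pseudo-token rows from the original word's embedding.
emb = model.get_input_embeddings().weight
with torch.no_grad():
    for tok in ("bank_sense1", "bank_sense2"):
        emb[tokenizer.convert_tokens_to_ids(tok)] = emb[word_id]
```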
arXiv Detail & Related papers (2023-06-09T00:54:21Z)
- What do Language Models know about word senses? Zero-Shot WSD with Language Models and Domain Inventories [23.623074512572593]
We explore to what extent language models can discern among senses at inference time.
We leverage the relation between word senses and domains, and cast Word Sense Disambiguation (WSD) as a textual entailment problem.
Our results show that this approach is effective, approaching the performance of supervised systems.
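A minimal sketch of casting WSD as entailment, assuming an off-the-shelf NLI model; the two-domain inventory and hypothesis template below are hypothetical stand-ins.

```python
# Minimal sketch: predict a word's sense by entailment against domain labels.
from transformers import pipeline

nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sentence = "She cashed the check at the bank."
domains = ["finance", "geography"]  # stand-in domain inventory
result = nli(sentence, candidate_labels=domains,
             hypothesis_template="The word 'bank' here relates to {}.")
print(result["labels"][0])  # most entailed domain -> predicted sense
```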
arXiv Detail & Related papers (2023-02-07T09:55:07Z)
- Subject Verb Agreement Error Patterns in Meaningless Sentences: Humans vs. BERT [64.40111510974957]
We test whether meaning interferes with subject-verb number agreement in English.
We generate semantically well-formed and nonsensical items.
We find that BERT and humans are both sensitive to our semantic manipulation.
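The BERT side of such a probe can be sketched as follows, assuming agreement preference is read off as the masked-LM probability of the singular versus plural verb form; the items below are illustrative, not the paper's stimuli.

```python
# Minimal sketch: compare masked-LM probabilities of "is" vs. "are" in
# well-formed and nonsensical sentences sharing the same syntax.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def verb_preference(sentence_with_mask, singular, plural):
    enc = tokenizer(sentence_with_mask, return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    with torch.no_grad():
        logits = model(**enc).logits[0, mask_pos.item()]
    probs = logits.softmax(dim=-1)
    sg = probs[tokenizer.convert_tokens_to_ids(singular)].item()
    pl = probs[tokenizer.convert_tokens_to_ids(plural)].item()
    return sg, pl

# Semantically well-formed vs. nonsensical item with identical structure.
for s in ["The key to the cabinets [MASK] on the table.",
          "The idea to the cabinets [MASK] on the table."]:
    print(s, verb_preference(s, "is", "are"))
```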
arXiv Detail & Related papers (2022-09-21T17:57:23Z)
- Large Scale Substitution-based Word Sense Induction [48.49573297876054]
We present a word-sense induction method based on pre-trained masked language models (MLMs), which can cheaply scale to large vocabularies and large corpora.
The result is a corpus which is sense-tagged according to a corpus-derived sense inventory and where each sense is associated with indicative words.
Evaluation on English Wikipedia sense-tagged with our method shows that both the induced senses and the per-instance sense assignments are of high quality, even compared to WSD methods such as Babelfy.
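The core loop of such a method can be sketched as follows; a minimal illustration, assuming occurrences are grouped by clustering their top-k masked-LM substitutes, with illustrative sentences and an arbitrary cluster count.

```python
# Minimal sketch: induce senses by clustering MLM substitutes per occurrence.
import torch
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import CountVectorizer
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def substitutes(sentence, k=10):
    """Top-k masked-LM predictions for the masked target position."""
    enc = tokenizer(sentence, return_tensors="pt")
    pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**enc).logits[0, pos]
    top = logits.topk(k).indices
    return " ".join(tokenizer.convert_ids_to_tokens(top))

occurrences = ["He rowed toward the [MASK] of the river.",
               "The [MASK] raised its interest rates.",
               "They strolled along the grassy [MASK]."]
bags = [substitutes(s) for s in occurrences]

# Cluster occurrences by substitute overlap; cluster ids act as induced
# senses, and frequent substitutes per cluster are its indicative words.
X = CountVectorizer().fit_transform(bags).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)
```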
arXiv Detail & Related papers (2021-10-14T19:40:37Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.
This paper addresses the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
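The mask-and-predict idea can be sketched as follows; a minimal illustration, assuming a KL divergence between masked-LM distributions at a manually chosen shared-word position stands in for the paper's exact NDD formulation.

```python
# Minimal sketch: divergence of MLM distributions at a shared word position
# as a distance between two highly overlapped sentences.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def dist_at(sentence, word):
    """Masked-LM log-distribution at the (masked) position of `word`."""
    masked = sentence.replace(word, tokenizer.mask_token, 1)
    enc = tokenizer(masked, return_tensors="pt")
    pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**enc).logits[0, pos]
    return F.log_softmax(logits, dim=-1)

a = "The movie was surprisingly good."
b = "The movie was shockingly good."
# Compare predicted distributions at a shared ("neighboring") word position.
p, q = dist_at(a, "good"), dist_at(b, "good")
ndd = F.kl_div(q, p, log_target=True, reduction="sum")  # KL(p || q)
print(float(ndd))
```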
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Sense representations for Portuguese: experiments with sense embeddings and deep neural language models [0.0]
Unsupervised sense representations can induce different senses of a word by analyzing its contextual semantics in a text.
We present the first experiments carried out for generating sense embeddings for Portuguese.
arXiv Detail & Related papers (2021-08-31T18:07:01Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings (M-SE).
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
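A graph walk over word senses might look like the following; a minimal sketch of the general idea only, assuming NLTK's WordNet interface (with the WordNet corpus downloaded) and a simple random walk over hypernym/hyponym edges.

```python
# Minimal sketch: collect related senses by walking WordNet's sense graph.
import random
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def sense_walk(synset, steps=5):
    """Random walk over hypernym/hyponym edges, collecting visited senses."""
    path, current = [synset], synset
    for _ in range(steps):
        neighbors = current.hypernyms() + current.hyponyms()
        if not neighbors:
            break
        current = random.choice(neighbors)
        path.append(current)
    return path

walk = sense_walk(wn.synset("bank.n.01"))
print([s.name() for s in walk])  # senses to enrich the distributional context
```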
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct keywords from the rest of the words and to make low-confidence predictions when context is insufficient.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
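The regularization can be sketched as a loss term; a simplified illustration assuming keywords were preselected (e.g., by frequency or attention) and masked out of the input, not the paper's exact objective.

```python
# Minimal sketch: classification loss on keyword-masked input, plus an
# entropy regularizer pushing keyword-only predictions toward uniform.
import torch
import torch.nn.functional as F

def masker_loss(logits_ctx_only, logits_kw_only, labels, lam=0.001):
    # Keywords masked, context kept: the model must predict from context
    # rather than shortcut keywords.
    ce = F.cross_entropy(logits_ctx_only, labels)
    # Context masked, keywords kept: predictions should be low-confidence,
    # i.e., close to the uniform distribution.
    probs = F.softmax(logits_kw_only, dim=-1)
    uniform = torch.full_like(probs, 1.0 / probs.size(-1))
    ent = F.kl_div(probs.clamp_min(1e-8).log(), uniform, reduction="batchmean")
    return ce + lam * ent
```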
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
- Temporal Common Sense Acquisition with Minimal Supervision [77.8308414884754]
This work proposes a novel sequence modeling approach that exploits explicit and implicit mentions of temporal common sense.
Our method is shown to give quality predictions of various dimensions of temporal common sense.
It also produces representations of events for relevant tasks such as duration comparison, parent-child relations, event coreference and temporal QA.
arXiv Detail & Related papers (2020-05-08T22:20:16Z)
- Moving Down the Long Tail of Word Sense Disambiguation with Gloss-Informed Biencoders [79.38278330678965]
A major obstacle in Word Sense Disambiguation (WSD) is that word senses are not uniformly distributed.
We propose a bi-encoder model that independently embeds (1) the target word with its surrounding context and (2) the dictionary definition, or gloss, of each sense.
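A bi-encoder of this shape can be sketched as follows; a minimal illustration assuming two generic BERT encoders, WordNet glosses via NLTK, and the [CLS] vector as a stand-in for the target-word representation used in the paper.

```python
# Minimal sketch: score each sense gloss against the context by dot product
# between a context encoder's and a gloss encoder's outputs.
import torch
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
context_enc = AutoModel.from_pretrained("bert-base-cased")
gloss_enc = AutoModel.from_pretrained("bert-base-cased")

def encode(model, text):
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**enc).last_hidden_state[0, 0]  # [CLS] vector

ctx = encode(context_enc, "She cashed the check at the bank.")
glosses = [s.definition() for s in wn.synsets("bank", pos="n")]
scores = torch.stack([encode(gloss_enc, g) for g in glosses]) @ ctx
print(glosses[int(scores.argmax())])  # highest-scoring gloss = predicted sense
```

Because glosses are embedded rather than enumerated as classes, rare senses with few or no training examples can still be scored, which is what lets the model move down the long tail.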
arXiv Detail & Related papers (2020-05-06T04:21:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.