Non-Parametric Few-Shot Learning for Word Sense Disambiguation
- URL: http://arxiv.org/abs/2104.12677v2
- Date: Tue, 27 Apr 2021 13:28:37 GMT
- Title: Non-Parametric Few-Shot Learning for Word Sense Disambiguation
- Authors: Howard Chen, Mengzhou Xia, and Danqi Chen
- Abstract summary: MetricWSD is a non-parametric few-shot learning approach that mitigates the data imbalance issue in word sense disambiguation.
By learning to compute distances among the senses of a given word through episodic training, MetricWSD transfers knowledge from high-frequency words to infrequent ones.
- Score: 11.175893018731712
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Word sense disambiguation (WSD) is a long-standing problem in natural
language processing. One significant challenge in supervised all-words WSD is
to classify among senses for a majority of words that lie in the long-tail
distribution. For instance, 84% of the annotated words have less than 10
examples in the SemCor training data. This issue is more pronounced as the
imbalance occurs in both word and sense distributions. In this work, we propose
MetricWSD, a non-parametric few-shot learning approach to mitigate this data
imbalance issue. By learning to compute distances among the senses of a given
word through episodic training, MetricWSD transfers knowledge (a learned metric
space) from high-frequency words to infrequent ones. MetricWSD constructs the
training episodes tailored to word frequencies and explicitly addresses the
problem of the skewed distribution, as opposed to mixing all the words trained
with parametric models in previous work. Without resorting to any lexical
resources, MetricWSD obtains strong performance against parametric
alternatives, achieving a 75.1 F1 score on the unified WSD evaluation benchmark
(Raganato et al., 2017b). Our analysis further validates that infrequent words
and senses enjoy significant improvement.
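For intuition, the episodic, distance-based setup described in the abstract can be pictured with a short prototypical-network-style sketch: within one episode for a single word, each sense gets a prototype built from its support examples, and each query occurrence is scored by its distance to every prototype. This is a minimal illustration under assumed names (`encode`, `episode_loss`), not the authors' released implementation.

```python
# Minimal, illustrative sketch (not the authors' released code) of
# prototype-style non-parametric sense scoring for one episode of one word.
# `encode` is assumed to map a batch of target-word contexts to a
# [batch, dim] tensor of occurrence embeddings.
import torch
import torch.nn.functional as F

def episode_loss(encode, support_by_sense, query_contexts, query_labels):
    # One prototype per sense: the mean of its encoded support examples.
    sense_ids = sorted(support_by_sense)
    prototypes = torch.stack(
        [encode(support_by_sense[s]).mean(dim=0) for s in sense_ids]
    )                                              # [num_senses, dim]

    queries = encode(query_contexts)               # [num_queries, dim]
    distances = torch.cdist(queries, prototypes)   # pairwise Euclidean distances
    logits = -distances                            # nearer prototype -> higher score

    targets = torch.tensor([sense_ids.index(s) for s in query_labels])
    return F.cross_entropy(logits, targets)
```

Sampling many such episodes, with episode construction weighted by observed word frequencies, is what the abstract describes as transferring the learned metric space from frequent words to rare ones.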
Related papers
- An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition [10.234673954430221]
We study how altering the context list to contain words with different frequency distributions affects model performance.
A series of experiments conducted on the AISHELL-1 benchmark dataset suggests that using all vocabulary words from the training corpus as the context list and pairing them with our balanced objective yields the best performance.
arXiv Detail & Related papers (2024-09-10T12:52:36Z) - Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as four newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z) - Improving Language Models Meaning Understanding and Consistency by
Learning Conceptual Roles from Dictionary [65.268245109828]
Non-human-like behaviour of contemporary pre-trained language models (PLMs) is a major factor undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which produces contradictory results.
We propose a practical approach that alleviates the inconsistent-behaviour issue by improving the PLMs' awareness of conceptual roles.
arXiv Detail & Related papers (2023-10-24T06:15:15Z) - Bridging the Training-Inference Gap for Dense Phrase Retrieval [104.4836127502683]
Building dense retrievers requires a series of standard procedures, including training and validating neural models.
In this paper, we explore how the gap between training and inference in dense retrieval can be reduced.
We propose an efficient way of validating dense retrievers using a small subset of the entire corpus.
arXiv Detail & Related papers (2022-10-25T00:53:06Z) - PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic
Search [25.801066428860242]
We propose PiC - a dataset of 28K noun phrases accompanied by their contextual Wikipedia pages.
We find that training on our dataset improves ranking models' accuracy and remarkably pushes Question Answering (QA) models to near-human accuracy.
arXiv Detail & Related papers (2022-07-19T04:45:41Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - Meta-Learning with Variational Semantic Memory for Word Sense
Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show that our model advances the state of the art in few-shot WSD and supports effective learning in extremely data-scarce scenarios.
arXiv Detail & Related papers (2021-06-05T20:40:01Z) - Learning to Remove: Towards Isotropic Pre-trained BERT Embedding [7.765987411382461]
Research in word representation shows that isotropic embeddings can significantly improve performance on downstream tasks.
We measure and analyze the geometry of pre-trained BERT embeddings and find that they are far from isotropic.
We propose a simple yet effective method to fix this problem: remove several dominant directions of the BERT embedding space with a set of learnable weights (a rough sketch of the direction-removal idea appears after this list).
arXiv Detail & Related papers (2021-04-12T08:13:59Z) - EDS-MEMBED: Multi-sense embeddings based on enhanced distributional
semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings (M-SE).
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z) - Fake it Till You Make it: Self-Supervised Semantic Shifts for
Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
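As a rough illustration of the direction-removal idea from the isotropic-BERT entry above, the sketch below uses the common fixed post-processing variant: subtract the mean and project out the top principal components of the embedding matrix. The cited paper instead learns per-direction weights; the function name and the choice of `k` here are assumptions.

```python
# Illustrative sketch (assumed names, fixed variant): make an embedding matrix
# more isotropic by subtracting the mean and projecting out its top-k principal
# directions. The cited paper instead learns a weight for each direction.
import numpy as np

def remove_dominant_directions(embeddings: np.ndarray, k: int = 3) -> np.ndarray:
    """embeddings: [n, d] array; returns a more isotropic copy."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_dirs = vt[:k]                                    # [k, d]
    # Remove each embedding's component along those dominant directions.
    return centered - centered @ top_dirs.T @ top_dirs
```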
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.