Training Bi-Encoders for Word Sense Disambiguation
- URL: http://arxiv.org/abs/2105.10146v1
- Date: Fri, 21 May 2021 06:06:03 GMT
- Title: Training Bi-Encoders for Word Sense Disambiguation
- Authors: Harsh Kohli
- Abstract summary: State-of-the-art approaches in Word Sense Disambiguation leverage lexical information along with pre-trained embeddings from transformer-based models to achieve results comparable to human inter-annotator agreement on standard evaluation benchmarks.
We further the state of the art in Word Sense Disambiguation through our multi-stage pre-training and fine-tuning pipeline.
- Score: 4.149972584899897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern transformer-based neural architectures yield impressive results in
nearly every NLP task and Word Sense Disambiguation, the problem of discerning
the correct sense of a word in a given context, is no exception.
State-of-the-art approaches in WSD today leverage lexical information along
with pre-trained embeddings from these models to achieve results comparable to
human inter-annotator agreement on standard evaluation benchmarks. In the same
vein, we experiment with several strategies to optimize bi-encoders for this
specific task and propose alternative methods of presenting lexical information
to our model. Through our multi-stage pre-training and fine-tuning pipeline we
further the state of the art in Word Sense Disambiguation.
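As a rough illustration of the bi-encoder setup described above: the word's context and each candidate sense gloss are encoded independently into a shared vector space, and the sense whose gloss embedding best matches the context wins. The sketch below is a minimal stand-in, not the paper's pipeline; the hash-based encoder replaces the transformer encoders, and the abbreviated WordNet-style glosses are illustrative.

```python
import zlib
import numpy as np

DIM = 64

def encode(text: str) -> np.ndarray:
    """Toy sentence encoder (stand-in for a transformer encoder):
    mean of deterministic per-token random vectors, L2-normalized."""
    vecs = []
    for tok in text.lower().split():
        rng = np.random.default_rng(zlib.crc32(tok.encode("utf-8")))
        vecs.append(rng.standard_normal(DIM))
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

def disambiguate(context: str, glosses: dict[str, str]) -> str:
    """Bi-encoder scoring: encode context and each gloss independently,
    then pick the sense whose gloss embedding is closest to the context."""
    ctx = encode(context)
    return max(glosses, key=lambda s: float(encode(glosses[s]) @ ctx))

# Abbreviated WordNet-style glosses for "bank" (illustrative only).
glosses = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits",
}
print(disambiguate("they fished from the sloping river bank", glosses))
```

Because the two sides are encoded independently, all gloss embeddings can be precomputed and disambiguation reduces to a nearest-neighbor lookup, which is the practical appeal of bi-encoders over cross-encoders.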
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
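A hedged sketch of how hard negative captions might be generated automatically for the evaluation above; the substitution table and shuffle fallback are illustrative assumptions, not the paper's actual procedure.

```python
import random

# Perturb a ground-truth caption so it stays fluent but becomes wrong.
SWAPS = {"opens": "closes", "before": "after", "left": "right"}

def hard_negative(caption: str, rng: random.Random) -> str:
    tokens = caption.split()
    # Replace the first swappable word with its opposite, if any.
    for i, tok in enumerate(tokens):
        if tok in SWAPS:
            tokens[i] = SWAPS[tok]
            return " ".join(tokens)
    # Otherwise swap two tokens to break word order.
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)

rng = random.Random(0)
print(hard_negative("a man opens the door before leaving", rng))
```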
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- Word Sense Induction with Knowledge Distillation from BERT [6.88247391730482]
This paper proposes a method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context.
Experiments on the contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings.
arXiv Detail & Related papers (2023-04-20T21:05:35Z)
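The core mechanism of the sense-distillation paper above, attention over a word's sense vectors driven by the contextual embedding, can be sketched as follows; the shapes, soft update rule, and learning rate are assumptions for illustration, and the stand-in context vector replaces a real BERT embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 3                         # embedding dim, number of senses
senses = rng.standard_normal((k, d)) # learnable sense vectors for one word

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distill_step(context_vec, senses, lr=0.1):
    """One distillation step: attention over senses, then a soft update."""
    attn = softmax(senses @ context_vec)  # (k,) weights over senses
    # Move each sense toward the teacher's contextual vector in
    # proportion to how strongly the context attends to it.
    senses += lr * attn[:, None] * (context_vec - senses)
    return attn

ctx = rng.standard_normal(d)  # stand-in for a BERT contextual embedding
print(distill_step(ctx, senses))
```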
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
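A minimal sketch of the selection idea above: score candidate context sentences for relevance to the current sentence and keep only the top ones before translating. The overlap-based relevance score and top-k cutoff stand in for the paper's learned layer-wise selection mechanism.

```python
def relevance(current: str, candidate: str) -> float:
    """Toy relevance score: word overlap with the current sentence."""
    cur = set(current.lower().replace(".", "").split())
    cand = set(candidate.lower().replace(".", "").split())
    return len(cur & cand) / max(len(cand), 1)

def select_context(current: str, document: list[str], k: int = 2) -> list[str]:
    """Keep the k context sentences most relevant to the current sentence."""
    ranked = sorted(document, key=lambda s: relevance(current, s), reverse=True)
    return ranked[:k]

doc = [
    "The committee met on Tuesday.",
    "It discussed the new budget at length.",
    "Lunch was served in the main hall.",
]
print(select_context("The budget was approved by the committee.", doc))
```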
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to more easily predict syntactic sketches at test time.
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
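The hierarchical decomposition above can be loosely sketched with plain residual quantization: at each level the residual is snapped to the nearest codebook entry, and the chosen indices form the path through the hierarchy. The codebooks here are random; HRQ-VAE learns them jointly with the encoder, and this sketch omits the variational training entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, LEVELS, CODES = 8, 3, 4
# One random codebook per level; HRQ-VAE learns these jointly.
codebooks = [rng.standard_normal((CODES, DIM)) for _ in range(LEVELS)]

def quantize(z: np.ndarray) -> tuple[list[int], np.ndarray]:
    """Encode z as a path of codebook indices plus its reconstruction."""
    path, recon, residual = [], np.zeros(DIM), z.copy()
    for book in codebooks:
        idx = int(np.argmin(np.linalg.norm(book - residual, axis=1)))
        path.append(idx)          # choice at this level of the hierarchy
        recon += book[idx]
        residual -= book[idx]     # next level refines what is left over
    return path, recon

z = rng.standard_normal(DIM)
path, recon = quantize(z)
print(path, float(np.linalg.norm(z - recon)))
```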
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Semantically Distributed Robust Optimization for Vision-and-Language Inference [34.83271008148651]
We present SDRO, a model-agnostic method that utilizes a set of linguistic transformations in a distributed robust optimization setting.
Experiments on benchmark datasets with images and video demonstrate performance improvements as well as robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-14T06:02:46Z)
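The distributionally robust idea above can be sketched as training against the worst-scoring member of each example's transformation set (the DRO inner maximization); the transformations, hinge-style loss, and toy scoring model below are illustrative stand-ins for SDRO's actual linguistic transformations and objective.

```python
def negate(s: str) -> str:
    return s.replace(" is ", " is not ")

def passivize(s: str) -> str:        # crude illustrative rewrite
    return s + " (passive variant)"

TRANSFORMS = [lambda s: s, negate, passivize]

def loss(model_score: float) -> float:
    return max(0.0, 1.0 - model_score)  # hinge-style loss

def toy_score(sentence: str) -> float:
    """Stand-in model: longer inputs get lower confidence."""
    return 1.0 / (1.0 + 0.05 * len(sentence.split()))

def dro_loss(sentence: str) -> float:
    variants = [t(sentence) for t in TRANSFORMS]
    return max(loss(toy_score(v)) for v in variants)  # worst case over set

print(dro_loss("the cat is on the mat"))
```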
- Obtaining Better Static Word Embeddings Using Contextual Embedding Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
arXiv Detail & Related papers (2021-06-08T12:59:32Z)
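A hedged sketch of the distillation direction above: static word vectors are pulled toward a contextual teacher's vector at every occurrence, in the spirit of a CBOW-style objective. The toy teacher, the pull-toward-teacher update, and the learning rate are assumptions, not the paper's exact formulation.

```python
import zlib
import numpy as np

DIM = 32

def teacher(tokens: list[str], i: int) -> np.ndarray:
    """Stand-in contextual encoder: hashes the token and its window."""
    window = " ".join(tokens[max(0, i - 2): i + 3])
    return np.random.default_rng(zlib.crc32(window.encode())).standard_normal(DIM)

def distill(corpus: list[list[str]], epochs: int = 5, lr: float = 0.1):
    """Pull each word's static vector toward the teacher at each occurrence."""
    static = {w: np.zeros(DIM) for sent in corpus for w in sent}
    for _ in range(epochs):
        for sent in corpus:
            for i, w in enumerate(sent):
                static[w] += lr * (teacher(sent, i) - static[w])
    return static

corpus = [["the", "bank", "approved", "the", "loan"],
          ["we", "sat", "on", "the", "river", "bank"]]
print(distill(corpus)["bank"][:4])
```

A static vector trained this way ends up near the average of the teacher's contextual vectors for that word, which is also what makes a direct comparison between static and contextual embeddings straightforward.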
- VCDM: Leveraging Variational Bi-encoding and Deep Contextualized Word Representations for Improved Definition Modeling [24.775371434410328]
We tackle the task of definition modeling, where the goal is to learn to generate definitions of words and phrases.
Existing approaches for this task are discriminative, combining distributional and lexical semantics in an implicit rather than direct way.
We propose a generative model for the task, introducing a continuous latent variable to explicitly model the underlying relationship between a phrase used within a context and its definition.
arXiv Detail & Related papers (2020-10-07T02:48:44Z)
- Analysis and Evaluation of Language Models for Word Sense Disambiguation [18.001457030065712]
Transformer-based language models have taken many fields in NLP by storm.
BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense.
BERT and its derivatives dominate most of the existing evaluation benchmarks.
arXiv Detail & Related papers (2020-08-26T15:07:07Z)
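A common evaluation recipe in this line of work is nearest-neighbor sense assignment over contextual embeddings: build one centroid per sense from annotated examples, then label a new occurrence by its closest centroid. The toy encoder below stands in for BERT, and the annotated examples are illustrative.

```python
import zlib
import numpy as np

DIM = 32

def ctx_vec(sentence: str) -> np.ndarray:
    """Toy contextual encoder: mean of hashed token vectors (BERT stand-in)."""
    toks = sentence.lower().split()
    v = sum(np.random.default_rng(zlib.crc32(t.encode())).standard_normal(DIM)
            for t in toks)
    return v / np.linalg.norm(v)

def sense_centroids(annotated: list[tuple[str, str]]) -> dict[str, np.ndarray]:
    """Average the embeddings of all annotated examples of each sense."""
    buckets: dict[str, list[np.ndarray]] = {}
    for sent, sense in annotated:
        buckets.setdefault(sense, []).append(ctx_vec(sent))
    return {s: np.mean(vs, axis=0) for s, vs in buckets.items()}

train = [("deposit money at the bank", "bank.finance"),
         ("grass grew on the river bank", "bank.river")]
centroids = sense_centroids(train)
query = ctx_vec("she opened an account at the bank to deposit money")
print(max(centroids, key=lambda s: float(centroids[s] @ query)))
```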
- Deep learning models for representing out-of-vocabulary words [1.4502611532302039]
We present a performance evaluation of deep learning models for representing out-of-vocabulary (OOV) words.
Although the best technique for handling OOV words differs across tasks, Comick, a deep learning method that infers an embedding from both the context and the morphological structure of the OOV word, obtained promising results.
arXiv Detail & Related papers (2020-07-14T19:31:25Z)
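A minimal sketch of the Comick-style idea above, combining two views of an OOV word: its character n-grams (morphology) and its surrounding context. The hash-based vectors, the n-gram size, and the equal mixing weight are all illustrative assumptions.

```python
import zlib
import numpy as np

DIM = 32

def hvec(key: str) -> np.ndarray:
    return np.random.default_rng(zlib.crc32(key.encode())).standard_normal(DIM)

def char_ngrams(word: str, n: int = 3) -> list[str]:
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def oov_embedding(word: str, context_words: list[str], alpha: float = 0.5):
    """Mix a morphology view (n-gram vectors) with a context view."""
    morph = np.mean([hvec(g) for g in char_ngrams(word)], axis=0)
    ctx = np.mean([hvec(w) for w in context_words], axis=0)
    return alpha * morph + (1 - alpha) * ctx

vec = oov_embedding("unfindable", ["the", "file", "was", "missing"])
print(vec[:4])
```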
This list is automatically generated from the titles and abstracts of the papers on this site.