A Comparative Study of Transformers on Word Sense Disambiguation
- URL: http://arxiv.org/abs/2111.15417v1
- Date: Tue, 30 Nov 2021 14:10:22 GMT
- Title: A Comparative Study of Transformers on Word Sense Disambiguation
- Authors: Avi Chawla and Nidhi Mulay and Vikas Bishnoi and Gaurav Dhama and Dr. Anil Kumar Singh
- Abstract summary: This paper presents a comparative study on the contextualization power of neural network-based embedding systems.
We evaluate their contextualization power using two lexical sample Word Sense Disambiguation (WSD) tasks, SensEval-2 and SensEval-3.
Experimental results show that the proposed techniques achieve results superior to the current state-of-the-art on both WSD tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years of research in Natural Language Processing (NLP) have witnessed
dramatic growth in training large models for generating context-aware language
representations. In this regard, numerous NLP systems have leveraged the power
of neural network-based architectures to incorporate sense information in
embeddings, resulting in Contextualized Word Embeddings (CWEs). Despite this
progress, the NLP community has not witnessed any significant work performing a
comparative study on the contextualization power of such architectures. This
paper presents a comparative study and an extensive analysis of nine widely
adopted Transformer models. These models are BERT, CTRL, DistilBERT,
OpenAI-GPT, OpenAI-GPT2, Transformer-XL, XLNet, ELECTRA, and ALBERT. We
evaluate their contextualization power using two lexical sample Word Sense
Disambiguation (WSD) tasks, SensEval-2 and SensEval-3. We adopt a simple yet
effective approach to WSD that uses a k-Nearest Neighbor (kNN) classification
on CWEs. Experimental results show that the proposed techniques also achieve
superior results over the current state-of-the-art on both the WSD tasks.
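The abstract describes the approach only at a high level, so the following is a minimal sketch of kNN classification over CWEs; the encoder choice (BERT-base), the use of last-layer hidden states, and k=1 are illustrative assumptions not specified above.

```python
# Minimal sketch of kNN-based WSD over contextualized word embeddings (CWEs).
# Assumptions (not specified in the abstract): BERT-base as the encoder,
# last-layer hidden states as embeddings, and k=1 neighbours.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.neighbors import KNeighborsClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def target_embedding(sentence: str, target: str) -> torch.Tensor:
    """Return the contextual embedding of the target word in the sentence."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    # Locate the target's subword positions and average them.
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError(f"target {target!r} not found in sentence")

# Sense-annotated training occurrences of the ambiguous word "bank".
train = [
    ("He deposited cash at the bank.", "bank%finance"),
    ("The bank approved her loan.", "bank%finance"),
    ("They picnicked on the river bank.", "bank%river"),
]
X = torch.stack([target_embedding(s, "bank") for s, _ in train]).numpy()
y = [sense for _, sense in train]

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
test_vec = target_embedding("Fish swam near the muddy bank.", "bank").numpy()
print(knn.predict([test_vec])[0])  # expected: bank%river
```

Swapping the encoder for any of the nine Transformers named above only changes the model name passed to `from_pretrained`; the kNN step is model-agnostic.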
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limitations of black-box models by leveraging improved explanation methods for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - Enhancing Modern Supervised Word Sense Disambiguation Models by Semantic
Lexical Resources [11.257738983764499]
Supervised models for Word Sense Disambiguation (WSD) currently yield state-of-the-art results on the most popular benchmarks.
We enhance "modern" supervised WSD models by exploiting two popular Semantic Lexical Resources (SLRs): WordNet and WordNet Domains.
We study the effect of different types of semantic features, investigating their interaction with local contexts encoded by means of mixtures of Word Embeddings or Recurrent Neural Networks.
arXiv Detail & Related papers (2024-02-20T13:47:51Z) - A Cohesive Distillation Architecture for Neural Language Models [0.0]
A recent trend in Natural Language Processing is the exponential growth in Language Model (LM) size.
This study investigates methods for Knowledge Distillation (KD) to provide efficient alternatives to large-scale models.
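For readers unfamiliar with KD, here is a minimal, generic sketch of the standard distillation loss (soft teacher targets mixed with hard labels); it illustrates the technique in general, not this paper's specific architecture, and the temperature and mixing weight are assumed values.

```python
# Generic knowledge-distillation loss (Hinton et al., 2015), shown for
# illustration; NOT the specific distillation architecture of this paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0,            # assumed temperature
                      alpha: float = 0.5) -> torch.Tensor:  # assumed mixing weight
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example: a batch of 4 examples over a 10-class output space.
s = torch.randn(4, 10)
t = torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(distillation_loss(s, t, y))
```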
arXiv Detail & Related papers (2023-01-12T08:01:53Z) - INTERACTION: A Generative XAI Framework for Natural Language Inference
Explanations [58.062003028768636]
Current XAI approaches focus only on delivering a single explanation.
This paper proposes a generative XAI framework, INTERACTION (explaIn aNd predicT thEn queRy with contextuAl CondiTional varIational autO-eNcoder).
Our novel framework presents explanations in two steps: (step one) Explanation and Label Prediction; and (step two) Diverse Evidence Generation.
arXiv Detail & Related papers (2022-09-02T13:52:39Z) - Exploring Dimensionality Reduction Techniques in Multilingual
Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensionality reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
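A minimal sketch of one such reduction technique (PCA) applied to multilingual sentence embeddings follows; the model name, the choice of PCA, and the target dimensionality are illustrative assumptions, since the paper compares several techniques.

```python
# Minimal sketch: PCA-based dimensionality reduction of multilingual
# sentence embeddings. Model name, PCA, and target dimensionality are
# assumptions; the paper evaluates several reduction techniques.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
sentences = [
    "The cat sits on the mat.",
    "Le chat est assis sur le tapis.",   # French paraphrase
    "Stocks fell sharply on Monday.",    # unrelated sentence
]
emb = model.encode(sentences)            # shape: (3, 384)

pca = PCA(n_components=2).fit(emb)       # toy target dimensionality
reduced = pca.transform(emb)

# Cross-lingual similarity should survive the reduction far better
# than similarity to the unrelated sentence.
print(cosine_similarity(reduced[:1], reduced[1:]))
```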
arXiv Detail & Related papers (2022-04-18T17:20:55Z) - Obtaining Better Static Word Embeddings Using Contextual Embedding
Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
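To illustrate the general idea of deriving static vectors from a contextual model, here is a sketch of the simple averaging baseline; note that this is explicitly not the paper's CBOW-style distillation objective, and the encoder choice is an assumption.

```python
# Averaging baseline for static-from-contextual embeddings: pool a word's
# contextual vectors over many sentences. NOT the paper's CBOW-based
# distillation method; shown only to illustrate the general idea.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def static_vector(word: str, sentences: list[str]) -> torch.Tensor:
    """Average the word's contextual embeddings across all given sentences."""
    vecs = []
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    for s in sentences:
        enc = tokenizer(s, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]
        ids = enc["input_ids"][0].tolist()
        for i in range(len(ids) - len(word_ids) + 1):
            if ids[i:i + len(word_ids)] == word_ids:
                vecs.append(hidden[i:i + len(word_ids)].mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

v = static_vector("bank", ["She went to the bank.", "The bank was closed."])
print(v.shape)  # torch.Size([768])
```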
arXiv Detail & Related papers (2021-06-08T12:59:32Z) - Training Bi-Encoders for Word Sense Disambiguation [4.149972584899897]
State-of-the-art approaches in Word Sense Disambiguation combine lexical information with embeddings from pre-trained language models to achieve results comparable to human inter-annotator agreement on standard evaluation benchmarks.
We further the state of the art in Word Sense Disambiguation through our multi-stage pre-training and fine-tuning pipeline.
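A minimal sketch of the bi-encoder setup the title refers to: one encoder embeds the target word in context, another embeds each candidate sense gloss, and senses are ranked by dot product. The encoder choice and pooling are assumptions, and the paper's multi-stage pre-training and fine-tuning pipeline is not reproduced here.

```python
# Bi-encoder WSD scoring sketch: context encoder vs. gloss encoder,
# senses ranked by dot product. Encoders and pooling are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
context_encoder = AutoModel.from_pretrained("bert-base-uncased").eval()
gloss_encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

def encode_gloss(gloss: str) -> torch.Tensor:
    enc = tokenizer(gloss, return_tensors="pt")
    with torch.no_grad():
        return gloss_encoder(**enc).last_hidden_state[0, 0]  # [CLS] token

def encode_target(sentence: str, target: str) -> torch.Tensor:
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = context_encoder(**enc).last_hidden_state[0]
    ids = enc["input_ids"][0].tolist()
    t_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    for i in range(len(ids) - len(t_ids) + 1):
        if ids[i:i + len(t_ids)] == t_ids:
            return hidden[i:i + len(t_ids)].mean(dim=0)
    raise ValueError("target not found")

glosses = {
    "bank%finance": "a financial institution that accepts deposits",
    "bank%river": "sloping land beside a body of water",
}
ctx = encode_target("They fished from the grassy bank.", "bank")
scores = {sense: float(ctx @ encode_gloss(g)) for sense, g in glosses.items()}
print(max(scores, key=scores.get))  # expected: bank%river
```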
arXiv Detail & Related papers (2021-05-21T06:06:03Z) - The NLP Cookbook: Modern Recipes for Transformer based Deep Learning
Architectures [0.0]
Natural Language Processing models have achieved phenomenal success in linguistic and semantic tasks.
Recent NLP architectures have utilized concepts of transfer learning, pruning, quantization, and knowledge distillation to achieve moderate model sizes.
Knowledge retrievers have been built to extract explicit data documents from a large corpus of databases with greater efficiency and accuracy.
arXiv Detail & Related papers (2021-03-23T22:38:20Z) - Enriching Non-Autoregressive Transformer with Syntactic and
SemanticStructures for Neural Machine Translation [54.864148836486166]
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Our model achieves significantly faster inference while maintaining translation quality, compared with several state-of-the-art non-autoregressive models.
arXiv Detail & Related papers (2021-01-22T04:12:17Z) - A Comparative Study of Lexical Substitution Approaches based on Neural
Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
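A minimal sketch of the basic MLM-based substitution setup follows. The "target and [MASK]" pattern shown is one assumed form of target-word injection; the paper compares several injection strategies, and this sketch does not reproduce any of them exactly.

```python
# MLM-based lexical substitution sketch. The "target and [MASK]" pattern is
# an assumed injection variant, not necessarily one the paper evaluates.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def substitutes(sentence: str, target: str, top_k: int = 5) -> list[str]:
    # Keep the original word in view next to the mask so predictions
    # stay semantically close to it.
    masked = sentence.replace(target, f"{target} and {tokenizer.mask_token}", 1)
    enc = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    top = logits[0, pos].topk(top_k).indices
    return [tokenizer.convert_ids_to_tokens(int(i)) for i in top]

print(substitutes("The movie was terrific.", "terrific"))
```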
arXiv Detail & Related papers (2020-05-29T18:43:22Z) - An Evaluation of Recent Neural Sequence Tagging Models in Turkish Named
Entity Recognition [5.161531917413708]
We propose a transformer-based network with a conditional random field layer that achieves state-of-the-art results.
Our study contributes to the literature that quantifies the impact of transfer learning on processing morphologically rich languages.
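A minimal sketch of the transformer-plus-CRF tagging architecture the summary describes: encoder hidden states feed a linear layer whose emission scores are decoded by a CRF. The encoder name and tag-set size are assumptions, and the CRF layer comes from the third-party `pytorch-crf` package rather than anything specific to this paper.

```python
# Transformer encoder + CRF sequence tagger sketch. Encoder name and number
# of tags are assumptions; requires the `pytorch-crf` package.
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF

class TransformerCRFTagger(nn.Module):
    def __init__(self, encoder_name: str, num_tags: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.emissions = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        scores = self.emissions(hidden)      # per-token emission scores
        mask = attention_mask.bool()
        if tags is not None:                 # training: negative log-likelihood
            return -self.crf(scores, tags, mask=mask)
        return self.crf.decode(scores, mask=mask)  # inference: best tag paths

# Example instantiation with a multilingual encoder (an assumption; the
# paper evaluates Turkish-specific setups) and a 9-tag BIO scheme.
tagger = TransformerCRFTagger("bert-base-multilingual-cased", num_tags=9)
```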
arXiv Detail & Related papers (2020-05-14T06:54:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.