Quantifying the Contextualization of Word Representations with Semantic
Class Probing
- URL: http://arxiv.org/abs/2004.12198v2
- Date: Sun, 11 Oct 2020 12:26:20 GMT
- Title: Quantifying the Contextualization of Word Representations with Semantic
Class Probing
- Authors: Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze
- Score: 8.401007663676214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained language models have achieved a new state of the art on many NLP
tasks, but there are still many open questions about how and why they work so
well. We investigate the contextualization of words in BERT. We quantify the
amount of contextualization, i.e., how well words are interpreted in context,
by studying the extent to which semantic classes of a word can be inferred from
its contextualized embeddings. Quantifying contextualization helps in
understanding and utilizing pretrained language models. We show that top layer
representations achieve high accuracy inferring semantic classes; that the
strongest contextualization effects occur in the lower layers; that local
context is mostly sufficient for semantic class inference; and that top layer
representations are more task-specific after finetuning while lower layer
representations are more transferable. Finetuning uncovers task-related
features, but pretrained knowledge is still largely preserved.
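The layer-wise probing setup the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual pipeline: it uses synthetic vectors in place of real BERT layer activations (the `make_layer` helper and its separation parameter are hypothetical stand-ins) and scikit-learn's logistic regression as the linear probe. In the real experiments, hidden states would be extracted from each BERT layer and a classifier trained to predict a word's semantic classes from them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_layer(n_words, dim, separation):
    """Synthetic stand-in for one layer's contextualized embeddings:
    two semantic classes whose representations are shifted apart by
    `separation` along a single dimension."""
    labels = rng.integers(0, 2, size=n_words)
    embeddings = rng.normal(size=(n_words, dim))
    embeddings[:, 0] += np.where(labels == 1, separation, -separation)
    return embeddings, labels

def probe_accuracy(embeddings, labels):
    """Fit a linear probe and report held-out accuracy: the higher the
    accuracy, the more linearly decodable the semantic class is from
    that layer's representations."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        embeddings, labels, test_size=0.25, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
    return probe.score(x_te, y_te)

# Pretend three layers separate the classes progressively better, and
# probe each one. With real BERT, `make_layer` would be replaced by
# extracting that layer's hidden states for annotated word tokens.
accs = [probe_accuracy(*make_layer(400, 16, sep)) for sep in (0.2, 1.0, 3.0)]
print(accs)
```

Comparing probe accuracy across layers in this way is what lets the paper conclude that top-layer representations achieve high accuracy on semantic class inference while the strongest contextualization effects occur in the lower layers.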
Related papers
- Where does In-context Translation Happen in Large Language Models [18.379840329713407]
We characterize the region where large language models transition from in-context learners to translation models.
We demonstrate evidence of a "task recognition" point where the translation task is encoded into the input representations and attention to context is no longer necessary.
arXiv Detail & Related papers (2024-03-07T14:12:41Z)
- Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics [50.982315553104975]
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
arXiv Detail & Related papers (2024-03-03T13:14:47Z)
- Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection [0.0]
This paper aims to disentangle semantic sense from BERT by applying a binary mask to middle outputs across the layers.
The disentangled embeddings are evaluated through binary classification to determine whether the target word in two different sentences has the same meaning.
arXiv Detail & Related papers (2023-10-08T11:07:19Z)
- SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings [4.479834103607384]
Adding interpretability to word embeddings represents an area of active research in text representation.
We present SensePOLAR, an extension of the original POLAR framework that enables word-sense aware interpretability for pre-trained contextual word embeddings.
arXiv Detail & Related papers (2023-01-11T20:25:53Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Contextual Embeddings: When Are They Worth It? [14.582968294755794]
We study the settings for which deep contextual embeddings give large improvements in performance relative to classic pretrained embeddings.
We find that simpler, non-contextual baselines can match contextual embeddings on industry-scale data.
We identify properties of data for which contextual embeddings give particularly large gains: language containing complex structure, ambiguous word usage, and words unseen in training.
arXiv Detail & Related papers (2020-05-18T22:20:17Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.