Quantifying the Contextualization of Word Representations with Semantic
Class Probing
- URL: http://arxiv.org/abs/2004.12198v2
- Date: Sun, 11 Oct 2020 12:26:20 GMT
- Title: Quantifying the Contextualization of Word Representations with Semantic
Class Probing
- Authors: Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze
- Abstract summary: Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well.
We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embeddings.
- Score: 8.401007663676214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained language models have achieved a new state of the art on many NLP
tasks, but there are still many open questions about how and why they work so
well. We investigate the contextualization of words in BERT. We quantify the
amount of contextualization, i.e., how well words are interpreted in context,
by studying the extent to which semantic classes of a word can be inferred from
its contextualized embeddings. Quantifying contextualization helps in
understanding and utilizing pretrained language models. We show that top layer
representations achieve high accuracy inferring semantic classes; that the
strongest contextualization effects occur in the lower layers; that local
context is mostly sufficient for semantic class inference; and that top layer
representations are more task-specific after finetuning while lower layer
representations are more transferable. Finetuning uncovers task-related
features, but pretrained knowledge is still largely preserved.
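As a rough illustration of the probing setup described above, the sketch below trains a linear probe on frozen BERT representations of a target word, taken from different layers, to predict a semantic class. The toy sentences, the coarse class labels, and the logistic-regression probe are illustrative assumptions, not the paper's actual data, semantic classes, or classifier.

```python
# Minimal sketch of semantic-class probing over frozen BERT layers.
# The sentences, class labels, and probe below are illustrative placeholders.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# (sentence, target word, toy "semantic class" of the word in this context)
examples = [
    ("The bank approved the mortgage application.", "bank",  "organization"),
    ("We had a picnic on the bank of the river.",   "bank",  "location"),
    ("Apple announced a new phone yesterday.",      "Apple", "organization"),
    ("She packed an apple with her lunch.",         "apple", "food"),
]

def word_vector(sentence, word, layer):
    """Mean-pool the subword vectors that cover `word` at a given BERT layer."""
    start = sentence.index(word)
    end = start + len(word)
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]   # (seq_len, hidden_dim)
    keep = torch.tensor([s < end and e > start for s, e in offsets])
    return hidden[keep].mean(dim=0).numpy()

# Probe a lower, a middle, and the top layer with the same linear classifier.
for layer in (1, 6, 12):
    X = [word_vector(s, w, layer) for s, w, _ in examples]
    y = [c for _, _, c in examples]
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    print(f"layer {layer:2d}  train accuracy: {probe.score(X, y):.2f}")
```

In a realistic setup the probe is trained on one set of word occurrences and evaluated on held-out occurrences; comparing the resulting accuracies across layers is what quantifies how much context each layer has absorbed.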
Related papers
- Manual Verbalizer Enrichment for Few-Shot Text Classification [1.860409237919611]
MAVE is an approach for verbalizer construction by enriching class labels.
Our model achieves state-of-the-art results while using significantly fewer resources.
arXiv Detail & Related papers (2024-10-08T16:16:47Z)
- Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers [12.610445666406898]
We investigate the degree of contextualization encoded in the fine-grained sub-layer representations of a Pre-trained Language Model (PLM).
To identify the main contributions of sub-layers to contextualization, we first extract the sub-layer representations of polysemous words in minimally different sentence pairs.
We also try to empirically localize the strength of contextualization information encoded in these sub-layer representations.
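A minimal sketch of this kind of extraction is given below, assuming a HuggingFace BERT: forward hooks capture the post-attention and post-feed-forward sub-layer outputs of one encoder layer, and the representations of a polysemous word in a minimally different sentence pair are compared with cosine similarity. The model choice, the layer index, the sentence pair, and the similarity measure are illustrative assumptions, not the paper's protocol.

```python
# Sketch: capture sub-layer outputs (post-attention and post-feed-forward) of one
# encoder layer for a polysemous word in two minimally different sentences.
# Assumes a HuggingFace BERT; layer index, sentences, and metric are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

captured = {}
def grab(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()[0]   # (seq_len, hidden_dim)
    return hook

layer = model.encoder.layer[5]                                    # a middle layer
layer.attention.output.register_forward_hook(grab("after_attention"))
layer.output.register_forward_hook(grab("after_feed_forward"))

def word_reprs(sentence, word):
    """Return {sub_layer_name: mean-pooled vector of `word`} for one sentence."""
    start, end = sentence.index(word), sentence.index(word) + len(word)
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        model(**enc)                           # hooks fill `captured`
    keep = torch.tensor([s < end and e > start for s, e in offsets])
    return {name: h[keep].mean(dim=0) for name, h in captured.items()}

# Minimally different pair: only the disambiguating context changes.
a = word_reprs("He sat down by the bank to fish.", "bank")
b = word_reprs("He sat down in the bank to wait.", "bank")
for name in a:
    sim = torch.cosine_similarity(a[name], b[name], dim=0).item()
    print(f"{name:18s} cosine similarity across the pair: {sim:.3f}")
```

Intuitively, lower similarity for the same word across the pair suggests that the sub-layer has injected more sense-disambiguating context into the word's representation.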
arXiv Detail & Related papers (2024-09-21T10:42:07Z)
- Where does In-context Translation Happen in Large Language Models [18.379840329713407]
We characterize the region where large language models transition from in-context learners to translation models.
We demonstrate evidence of a "task recognition" point where the translation task is encoded into the input representations and attention to context is no longer necessary.
arXiv Detail & Related papers (2024-03-07T14:12:41Z)
- Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics [50.982315553104975]
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
arXiv Detail & Related papers (2024-03-03T13:14:47Z)
- Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection [0.0]
This paper aims to disentangle semantic sense from BERT by applying a binary mask to intermediate outputs across the layers.
The disentangled embeddings are evaluated through binary classification to determine if the target word in two different sentences has the same meaning.
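A rough sketch of this evaluation is shown below: contextual vectors of the target word in two sentences are restricted to a subset of dimensions by a binary mask and then compared against a similarity threshold to decide whether the word keeps the same meaning. The random mask is only a stand-in for the learned, layer-wise selected mask the paper describes, and the layer, threshold, and sentences are assumptions.

```python
# Sketch of binary same-meaning classification with a dimension mask.
# The random mask stands in for a learned, layer-wise selected mask; the layer,
# threshold, and example sentences are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def word_vec(sentence, word, layer=8):
    """Mean-pooled vector of `word` from one intermediate BERT layer."""
    start, end = sentence.index(word), sentence.index(word) + len(word)
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]
    keep = torch.tensor([s < end and e > start for s, e in offsets])
    return hidden[keep].mean(dim=0)

torch.manual_seed(0)
mask = (torch.rand(model.config.hidden_size) > 0.5).float()  # hypothetical binary mask

def same_sense(sent_a, sent_b, word, threshold=0.85):
    """Decide whether `word` keeps the same sense in the two sentences."""
    u = word_vec(sent_a, word) * mask
    v = word_vec(sent_b, word) * mask
    return torch.cosine_similarity(u, v, dim=0).item() >= threshold

print(same_sense("He rowed toward the bank of the river.",
                 "They fished from the bank.", "bank"))
print(same_sense("He rowed toward the bank of the river.",
                 "She opened an account at the bank.", "bank"))
```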
arXiv Detail & Related papers (2023-10-08T11:07:19Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text and the hypothesis.
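The sketch below illustrates such an element-wise Manhattan distance feature |u - v|, computed between a text embedding and a hypothesis embedding and fed to a simple classifier. The mean-pooled BERT encoder, the toy pairs, and the logistic-regression classifier are assumptions rather than the paper's exact pipeline.

```python
# Sketch of an element-wise Manhattan-distance feature |u - v| for text-hypothesis
# pairs, fed to a simple classifier. Encoder, pairs, and classifier are placeholders.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def embed(sentence):
    """Mean-pool the final-layer token vectors into one sentence vector."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        return encoder(**enc).last_hidden_state[0].mean(dim=0)

def entailment_feature(text, hypothesis):
    """Element-wise Manhattan distance vector between the two sentence embeddings."""
    return torch.abs(embed(text) - embed(hypothesis)).numpy()

# Toy labelled pairs: 1 = entailment, 0 = no entailment.
pairs = [
    ("A man is playing a guitar on stage.", "A person is playing music.", 1),
    ("A man is playing a guitar on stage.", "The stage is empty.",        0),
    ("Two dogs are running in the park.",   "Animals are outdoors.",      1),
    ("Two dogs are running in the park.",   "The dogs are asleep.",       0),
]
X = [entailment_feature(t, h) for t, h, _ in pairs]
y = [label for _, _, label in pairs]
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", clf.score(X, y))
```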
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z)
- Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method to explicitly enhance conventional word embeddings with multiple-aspect senses from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Contextual Embeddings: When Are They Worth It? [14.582968294755794]
We study the settings for which deep contextual embeddings give large improvements in performance relative to classic pretrained embeddings.
We find that simpler baselines such as classic pretrained embeddings can match contextual embeddings on industry-scale data.
We identify properties of data for which contextual embeddings give particularly large gains: language containing complex structure, ambiguous word usage, and words unseen in training.
arXiv Detail & Related papers (2020-05-18T22:20:17Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.