Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers
- URL: http://arxiv.org/abs/2409.14097v1
- Date: Sat, 21 Sep 2024 10:42:07 GMT
- Title: Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers
- Authors: Soniya Vijayakumar, Josef van Genabith, Simon Ostermann
- Abstract summary: We investigate the degree of contextualization encoded in the fine-grained sub-layer representations of a Pre-trained Language Model (PLM).
To identify the main contributions of sub-layers to contextualization, we first extract the sub-layer representations of polysemous words in minimally different sentence pairs.
We also empirically localize the strength of the contextualization information encoded in these sub-layer representations.
- Score: 12.610445666406898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the era of high-performing Large Language Models, researchers have widely acknowledged that contextual word representations are one of the key drivers of top performance in downstream tasks. In this work, we investigate the degree of contextualization encoded in the fine-grained sub-layer representations of a Pre-trained Language Model (PLM) through empirical experiments using linear probes. Unlike previous work, we are particularly interested in identifying the strength of contextualization across PLM sub-layer representations (i.e., the Self-Attention, Feed-Forward Activation, and Output sub-layers). To identify the main contributions of sub-layers to contextualization, we first extract the sub-layer representations of polysemous words in minimally different sentence pairs and compare how these representations change through the forward pass of the PLM network. Second, by probing on a sense identification classification task, we empirically localize the strength of the contextualization information encoded in these sub-layer representations. With these probing experiments, we also aim to better understand the influence of context length and context richness on the degree of contextualization. Our main conclusion is cautionary: BERT demonstrates a high degree of contextualization in the top sub-layers if the word in question occupies a specific position in the sentence with a shorter context window, but this does not systematically generalize across different word positions and context sizes.
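To make the setup concrete, here is a minimal sketch (an illustration, not the authors' released code) that captures the three sub-layer representations of a polysemous word via forward hooks on Hugging Face's bert-base-uncased and compares a minimal sentence pair with cosine similarity. The layer index, target word, and example sentences are assumptions; the module names follow the Hugging Face BERT implementation.

```python
# Minimal sketch of sub-layer probing (assumptions: bert-base-uncased,
# encoder block 10, the word "bank" as the polysemous target).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

captured = {}

def hook(name):
    def fn(module, inputs, output):
        # Self-attention modules return a tuple; keep the hidden states only.
        captured[name] = output[0] if isinstance(output, tuple) else output
    return fn

# The three sub-layers of one encoder block: Self-Attention,
# Feed-Forward Activation (intermediate), and Output.
block = model.encoder.layer[10]
block.attention.self.register_forward_hook(hook("self_attention"))
block.intermediate.register_forward_hook(hook("ffn_activation"))
block.output.register_forward_hook(hook("output"))

def word_vector(sentence, word, name):
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        model(**enc)
    # Position of the target word (assumes a single-token, single occurrence).
    pos = enc.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return captured[name][0, pos]

# Minimally different sentence pair disambiguating the two senses of "bank".
for name in ["self_attention", "ffn_activation", "output"]:
    v1 = word_vector("He sat on the bank of the river.", "bank", name)
    v2 = word_vector("He deposited cash at the bank.", "bank", name)
    sim = torch.nn.functional.cosine_similarity(v1, v2, dim=0).item()
    print(f"{name}: cosine similarity across senses = {sim:.3f}")
```

A lower cross-sense similarity at a given sub-layer would indicate stronger contextualization there; the paper's sense identification probe replaces this similarity comparison with a linear classifier trained on such vectors.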
Related papers
- Where does In-context Translation Happen in Large Language Models [18.379840329713407]
We characterize the region where large language models transition from in-context learners to translation models.
We demonstrate evidence of a "task recognition" point where the translation task is encoded into the input representations and attention to context is no longer necessary.
arXiv Detail & Related papers (2024-03-07T14:12:41Z)
- LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors [58.75140338866403]
DVDet is a Descriptor-Enhanced Open Vocabulary Detector.
It transforms regional embeddings into image-like representations that can be directly integrated into general open vocabulary detection training.
Extensive experiments over multiple large-scale benchmarks show that DVDet outperforms the state-of-the-art consistently by large margins.
arXiv Detail & Related papers (2024-02-07T07:26:49Z)
- Analyzing Text Representations by Measuring Task Alignment [2.198430261120653]
We develop a task alignment score based on hierarchical clustering that measures alignment at different levels of granularity.
Our experiments on text classification validate our hypothesis by showing that task alignment can explain the classification performance of a given representation.
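The exact score definition is not given in this summary; the following is a speculative sketch of the general idea: hierarchically cluster text representations, cut the dendrogram at several granularities, and measure how well the clusters align with the task labels. The clustering method (Ward) and the alignment measure (adjusted mutual information) are assumptions.

```python
# Speculative sketch of a granularity-averaged task alignment score.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import adjusted_mutual_info_score

def task_alignment(X: np.ndarray, labels: np.ndarray, levels=(2, 4, 8)) -> float:
    Z = linkage(X, method="ward")  # hierarchical clustering of representations
    return float(np.mean([
        adjusted_mutual_info_score(labels, fcluster(Z, k, criterion="maxclust"))
        for k in levels  # cut the dendrogram at several granularities
    ]))

# Toy data: two well-separated classes should yield alignment near 1.0.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 16)), rng.normal(4, 1, (20, 16))])
y = np.array([0] * 20 + [1] * 20)
print(f"task alignment: {task_alignment(X, y):.3f}")
```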
arXiv Detail & Related papers (2023-05-31T11:20:48Z)
- Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis [64.70116276295609]
SentiWSP is a Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks.
SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks.
arXiv Detail & Related papers (2022-10-18T12:25:29Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
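As a hedged illustration (the embedding values below are made up, and any feature construction beyond the element-wise absolute difference is assumed), the feature can be computed as follows; such a feature vector would typically feed a downstream entailment classifier.

```python
# Element-wise Manhattan distance feature for a text-hypothesis pair:
# the absolute difference of the two embeddings (summing the vector
# yields the scalar L1 distance).
import numpy as np

def manhattan_feature(text_vec: np.ndarray, hyp_vec: np.ndarray) -> np.ndarray:
    return np.abs(text_vec - hyp_vec)

text_vec = np.array([0.2, -0.5, 0.8])
hyp_vec = np.array([0.1, -0.4, 0.9])
feature = manhattan_feature(text_vec, hyp_vec)  # ~[0.1, 0.1, 0.1]
print(feature, feature.sum())                   # L1 distance ~ 0.3
```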
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
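A hedged sketch of the mask-and-predict step follows, assuming bert-base-uncased as the MLM and KL divergence as the comparison between predicted distributions (the paper's exact divergence and aggregation are not given in this summary).

```python
# Mask a shared neighboring word in two highly overlapped sentences and
# compare the MLM distributions predicted at that position.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def masked_distribution(sentence: str, target: str) -> torch.Tensor:
    """MLM distribution at the position of `target`, with `target` masked."""
    ids = tok(sentence, return_tensors="pt").input_ids[0]
    pos = ids.tolist().index(tok.convert_tokens_to_ids(target))
    ids[pos] = tok.mask_token_id
    with torch.no_grad():
        logits = mlm(input_ids=ids.unsqueeze(0)).logits
    return torch.softmax(logits[0, pos], dim=-1)

# "movie" lies in the longest common sequence of the two sentences.
p = masked_distribution("I really enjoyed the movie last night.", "movie")
q = masked_distribution("I really hated the movie last night.", "movie")
kl = torch.sum(p * (p / q).log())  # divergence between the two distributions
print(f"divergence at the shared position: {kl.item():.4f}")
```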
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction [13.765146062545048]
Target-oriented opinion words extraction (TOWE) is a new subtask of target-oriented sentiment analysis.
We show that BiLSTM-based models can effectively encode position information into word representations.
We also adapt a graph convolutional network (GCN) to enhance word representations by incorporating syntactic information.
arXiv Detail & Related papers (2021-09-02T22:49:45Z)
- Effect of Post-processing on Contextualized Word Representations [20.856802441794162]
Post-processing of static embeddings has been shown to improve their performance on both lexical and sequence-level tasks.
We question the usefulness of post-processing for contextualized embeddings obtained from different layers of pre-trained language models.
arXiv Detail & Related papers (2021-04-15T13:40:42Z)
- Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method to explicitly enhance conventional word embeddings with multi-aspect senses from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT-inspired Universal Representation from Twin structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
- Quantifying the Contextualization of Word Representations with Semantic Class Probing [8.401007663676214]
Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well.
We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embeddings.
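A minimal sketch of such a probe, assuming frozen contextualized embeddings X and coarse semantic class labels y (the random arrays below are stand-ins for real data):

```python
# Linear probe: how much semantic class information do the embeddings expose?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))   # stand-in for contextualized embeddings
y = rng.integers(0, 5, size=200)  # stand-in semantic class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Accuracy above chance (0.2 for 5 classes) would indicate recoverable
# semantic class information; random stand-in data stays near chance.
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")
```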
arXiv Detail & Related papers (2020-04-25T17:49:37Z)