Related papers: Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics

Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics

URL: http://arxiv.org/abs/2403.01509v2
Date: Sun, 9 Jun 2024 13:07:50 GMT
Title: Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics
Authors: Zhu Liu, Cunliang Kong, Ying Liu, Maosong Sun,
Abstract summary: We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2. Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction. This is in contrast to models with discriminative objectives, such as mask language modeling, where the higher layers obtain better lexical semantics.
Score: 50.982315553104975
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models have achieved remarkable success in general language understanding tasks. However, as a family of generative methods with the objective of next token prediction, the semantic evolution with the depth of these models are not fully explored, unlike their predecessors, such as BERT-like architectures. In this paper, we specifically investigate the bottom-up evolution of lexical semantics for a popular LLM, namely Llama2, by probing its hidden states at the end of each layer using a contextualized word identification task. Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction. This is in contrast to models with discriminative objectives, such as mask language modeling, where the higher layers obtain better lexical semantics. The conclusion is further supported by the monotonic increase in performance via the hidden states for the last meaningless symbols, such as punctuation, in the prompting strategy. Our codes are available at https://github.com/RyanLiut/LLM_LexSem.

Related papers

Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step on measuring the role of shared semantics among subwords in the encoder-only multilingual language models (mLMs) We form "semantic tokens" by merging the semantically similar subwords and their embeddings. inspections on the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step. We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z)
Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models [1.534667887016089]
We show that intermediate hidden states extracted from large language models are able to predict measured brain response to natural language stimuli. We also demonstrate a strong correspondence between layerwise encoding performance and the intrinsic dimensionality of representations from LLMs.
arXiv Detail & Related papers (2024-09-09T16:33:16Z)
Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection [0.0]
This paper aims to disentangle semantic sense from BERT by applying a binary mask to middle outputs across the layers. The disentangled embeddings are evaluated through binary classification to determine if the target word in two different sentences has the same meaning.
arXiv Detail & Related papers (2023-10-08T11:07:19Z)
Don't Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling [6.501126898523172]
Recent pre-trained language models (PLMs) achieved great success on many natural language processing tasks through learning linguistic features and contextualized sentence representation. This paper introduces the attention-based pooling strategy, which enables the model to preserve layer-wise signals captured in each layer and learn digested linguistic features for downstream tasks.
arXiv Detail & Related papers (2022-09-13T13:09:49Z)
Integrating Language Guidance into Vision-based Deep Metric Learning [78.18860829585182]
We propose to learn metric spaces which encode semantic similarities as embedding space. These spaces should be transferable to classes beyond those seen during training. This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes.
arXiv Detail & Related papers (2022-03-16T11:06:50Z)
Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks. Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z)
Quantifying the Contextualization of Word Representations with Semantic Class Probing [8.401007663676214]
Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well. We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embeddings.
arXiv Detail & Related papers (2020-04-25T17:49:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.