Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics
- URL: http://arxiv.org/abs/2403.01509v2
- Date: Sun, 9 Jun 2024 13:07:50 GMT
- Title: Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics
- Authors: Zhu Liu, Cunliang Kong, Ying Liu, Maosong Sun
- Abstract summary: We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
- Score: 50.982315553104975
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models have achieved remarkable success in general language understanding tasks. However, as a family of generative methods with the objective of next token prediction, the semantic evolution with the depth of these models has not been fully explored, unlike that of their predecessors, such as BERT-like architectures. In this paper, we specifically investigate the bottom-up evolution of lexical semantics for a popular LLM, namely Llama2, by probing its hidden states at the end of each layer using a contextualized word identification task. Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction. This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics. The conclusion is further supported by the monotonic increase in performance via the hidden states for the last meaningless symbols, such as punctuation, in the prompting strategy. Our code is available at https://github.com/RyanLiut/LLM_LexSem.
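The layer-by-layer probing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): the function name `layerwise_identification_accuracy`, the nearest-neighbour cosine matching, and the synthetic data are all assumptions made for illustration. The toy hidden states are built so that the lexical signal weakens with depth, so any trend the example shows is there by construction and is not a result from the paper.

```python
import numpy as np


def layerwise_identification_accuracy(hidden_a, hidden_b):
    """Score each layer on a toy word identification task.

    hidden_a, hidden_b: arrays of shape (num_layers, num_words, dim), where
    hidden_a[l, i] and hidden_b[l, i] are hypothetical layer-l representations
    of the same word i in two different contexts.
    Returns one accuracy per layer: the fraction of words whose most similar
    vector (by cosine similarity) in the other context is the same word.
    """
    accuracies = []
    for a, b in zip(hidden_a, hidden_b):
        # Normalise rows so that the dot product equals cosine similarity.
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        predicted = (a @ b.T).argmax(axis=1)
        accuracies.append(float((predicted == np.arange(len(a))).mean()))
    return accuracies


# Synthetic stand-in for per-layer hidden states (assumption: no real model).
rng = np.random.default_rng(0)
num_layers, num_words, dim = 4, 50, 32
word_vectors = rng.normal(size=(num_words, dim))
signal = np.linspace(1.0, 0.1, num_layers)  # lexical signal fades with depth


def fake_hidden_states():
    # Each layer mixes the word identity with context-specific noise.
    noise = 0.3 * rng.normal(size=(num_layers, num_words, dim))
    return signal[:, None, None] * word_vectors[None, :, :] + noise


context_a = fake_hidden_states()
context_b = fake_hidden_states()
accs = layerwise_identification_accuracy(context_a, context_b)
for layer, acc in enumerate(accs):
    print(f"layer {layer}: identification accuracy = {acc:.2f}")
```

With a real model, the two input arrays would instead come from the hidden states returned at each layer for the target word in two different sentences.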
Related papers
- Next Semantic Scale Prediction via Hierarchical Diffusion Language Models [31.562156001116673]
HDLM builds on a hierarchical vocabulary where low-level tokens with detailed semantics are surjectively mapped to high-level tokens with coarse-grained meanings. In the forward process, each token is independently perturbed to its higher-level ancestor with more abstract semantics according to the scheduler. In the reverse process, the model progressively predicts the next, more detailed semantics.
arXiv Detail & Related papers (2025-10-08T17:41:12Z) - Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation [59.40886078302025]
Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in aligning visual inputs with natural language outputs. Yet, the extent to which generated tokens depend on visual modalities remains poorly understood. We present a lightweight black-box framework for explaining autoregressive token generation in MLLMs.
arXiv Detail & Related papers (2025-09-26T15:38:42Z) - Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth [21.092167028989632]
Drivelology is a linguistic phenomenon characterised by "nonsense with depth". We construct a benchmark dataset of over 1,200 meticulously curated and diverse examples across English, Mandarin, Spanish, French, Japanese, and Korean. We find that current large language models (LLMs) consistently fail to grasp the layered semantics of Drivelological text.
arXiv Detail & Related papers (2025-09-04T03:58:55Z) - Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z) - FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z) - Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models [1.534667887016089]
We show that intermediate hidden states extracted from large language models are able to predict measured brain response to natural language stimuli.
We also demonstrate a strong correspondence between layerwise encoding performance and the intrinsic dimensionality of representations from LLMs.
arXiv Detail & Related papers (2024-09-09T16:33:16Z) - Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection [0.0]
This paper aims to disentangle semantic sense from BERT by applying a binary mask to middle outputs across the layers.
The disentangled embeddings are evaluated through binary classification to determine if the target word in two different sentences has the same meaning.
arXiv Detail & Related papers (2023-10-08T11:07:19Z) - Don't Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling [6.501126898523172]
Recent pre-trained language models (PLMs) achieved great success on many natural language processing tasks through learning linguistic features and contextualized sentence representation.
This paper introduces the attention-based pooling strategy, which enables the model to preserve layer-wise signals captured in each layer and learn digested linguistic features for downstream tasks.
arXiv Detail & Related papers (2022-09-13T13:09:49Z) - Integrating Language Guidance into Vision-based Deep Metric Learning [78.18860829585182]
We propose to learn metric spaces which encode semantic similarities as embedding space distances.
These spaces should be transferable to classes beyond those seen during training.
This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes.
arXiv Detail & Related papers (2022-03-16T11:06:50Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z) - Quantifying the Contextualization of Word Representations with Semantic Class Probing [8.401007663676214]
Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well.
We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embeddings.
arXiv Detail & Related papers (2020-04-25T17:49:37Z)