Identifying and Analyzing Performance-Critical Tokens in Large Language Models
- URL: http://arxiv.org/abs/2401.11323v3
- Date: Mon, 24 Feb 2025 03:35:56 GMT
- Title: Identifying and Analyzing Performance-Critical Tokens in Large Language Models
- Authors: Yu Bai, Heyan Huang, Cesare Spinoso-Di Piano, Marc-Antoine Rondeau, Sanxing Chen, Yang Gao, Jackie Chi Kit Cheung
- Abstract summary: We study how large language models learn to perform tasks from demonstrations. Our work sheds light on how large language models learn to perform tasks from demonstrations and deepens our understanding of the roles different types of tokens play in large language models.
- Score: 52.404072802235234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) has emerged as an effective solution for few-shot learning with large language models (LLMs). However, how LLMs leverage demonstrations to specify a task and learn a corresponding computational function through ICL is underexplored. Drawing from the way humans learn from content-label mappings in demonstrations, we categorize the tokens in an ICL prompt into content, stopword, and template tokens. Our goal is to identify the types of tokens whose representations directly influence LLM's performance, a property we refer to as being performance-critical. By ablating representations from the attention of the test example, we find that the representations of informative content tokens have less influence on performance compared to template and stopword tokens, which contrasts with the human attention to informative words. We give evidence that the representations of performance-critical tokens aggregate information from the content tokens. Moreover, we demonstrate experimentally that lexical meaning, repetition, and structural cues are the main distinguishing characteristics of these tokens. Our work sheds light on how large language models learn to perform tasks from demonstrations and deepens our understanding of the roles different types of tokens play in large language models.
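The ablation described in the abstract can be illustrated with a minimal, self-contained sketch (not the authors' code): a toy ICL prompt is hand-labeled into content, stopword, and template tokens, and the test example's attention to one category of demonstration tokens is blocked at a time. The toy attention function, the hand-assigned categories, and the output-drift measure are all illustrative assumptions; the paper itself measures the effect on task performance.

```python
import torch
import torch.nn.functional as F

def causal_attention(q, k, v, ablate_keys=None, query_from=None):
    """Single-head causal attention over a length-T sequence.

    ablate_keys: bool mask of shape (T,); True marks key positions whose
                 representations are hidden, but only from queries at
                 positions >= query_from (the test example).
    """
    T, d = q.shape
    scores = q @ k.T / d ** 0.5                        # (T, T) attention logits
    causal = torch.tril(torch.ones(T, T)).bool()
    scores = scores.masked_fill(~causal, float("-inf"))
    if ablate_keys is not None and query_from is not None:
        block = torch.zeros(T, T, dtype=torch.bool)
        block[query_from:, :] = ablate_keys            # only test-example queries are affected
        scores = scores.masked_fill(block, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy prompt: one demonstration ("Review: great movie Sentiment: positive")
# followed by a test example; categories are hand-assigned for illustration.
tokens     = ["Review:", "great", "movie", "Sentiment:", "positive", "\n",
              "Review:", "dull", "plot", "Sentiment:"]
categories = ["template", "content", "content", "template", "content", "stopword",
              "template", "content", "content", "template"]
test_start = 6                                         # index where the test example begins

torch.manual_seed(0)
d_model = 16
x = torch.randn(len(tokens), d_model)                  # stand-in hidden states
baseline = causal_attention(x, x, x)                   # identity projections for brevity

for ablated in ["content", "stopword", "template"]:
    mask = torch.tensor([c == ablated for c in categories])
    mask[test_start:] = False                          # ablate demonstration tokens only
    out = causal_attention(x, x, x, ablate_keys=mask, query_from=test_start)
    drift = (out[-1] - baseline[-1]).norm().item()     # change at the prediction position
    print(f"ablating {ablated:8s} tokens -> output drift {drift:.3f}")
```

In the paper this manipulation is applied to a real LLM and scored by downstream accuracy rather than by the toy drift value printed here; the sketch only shows how attention from the test example can be selectively cut off per token category.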
Related papers
- Tokenization is Sensitive to Language Variation [14.568179478275255]
Tokenizers split texts into smaller units and might behave differently for less common linguistic forms.
This might affect downstream LLM performance differently on two types of tasks.
We investigate how key algorithmic design choices impact downstream model performance.
arXiv Detail & Related papers (2025-02-21T09:58:54Z) - PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection [56.916656013563355]
In-context learning (ICL) enables Large Language Models to perform tasks using few demonstrations.
We propose PICLe, a framework for in-context learning with noisy, pseudo-annotated demonstrations.
We evaluate PICLe on five biomedical NED datasets and show that, with zero human annotation, PICLe outperforms ICL in low-resource settings.
arXiv Detail & Related papers (2024-12-16T16:09:35Z) - [CLS] Token Tells Everything Needed for Training-free Efficient MLLMs [66.5266435598799]
Multimodal Large Language Models (MLLMs) have recently demonstrated strong performance across a wide range of vision tasks.
However, their efficient deployment remains a substantial challenge due to high computational costs and memory requirements.
We introduce a simple yet effective method for training-free visual token compression, called VTC-CLS.
arXiv Detail & Related papers (2024-12-08T05:29:39Z) - Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models [7.115323364355489]
In-context learning, the ability to adapt based on a few examples in the input prompt, is a ubiquitous feature of large language models (LLMs).
We first show that Llama 3 70B can solve simple RL problems in-context.
We then analyze the residual stream of Llama using Sparse Autoencoders (SAEs) and find representations that closely match temporal difference (TD) errors.
arXiv Detail & Related papers (2024-10-02T06:51:12Z) - Exploring Italian sentence embeddings properties through multi-tasking [1.4335183427838039]
We study how sentence representations built using pre-trained language models encode specific syntactic and semantic information.
We use a two-level architecture that separately models the compression of sentence embeddings into a representation containing task-relevant information and a BLM task.
While we expected that sentence structure (the sequence of phrases/chunks) and chunk properties could be shared across tasks, performance and error analysis show that the clues for the different tasks are encoded differently in the sentence embeddings.
arXiv Detail & Related papers (2024-09-10T16:22:18Z) - Towards Semantic Equivalence of Tokenization in Multimodal LLM [149.11720372278273]
Vision tokenization is essential for semantic alignment between vision and language.
This paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok).
SeTok groups visual features into semantic units via a dynamic clustering algorithm.
The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features.
arXiv Detail & Related papers (2024-06-07T17:55:43Z) - Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning [41.606494950216764]
In-context Learning (ICL) has emerged as a powerful capability alongside the development of scaled-up large language models (LLMs).
This paper decomposes the overall performance of ICL into three dimensions: label space, format, and discrimination.
We show that ICL exhibits significant efficacy in regulating the label space and format, which helps LLMs respond to desired label words.
arXiv Detail & Related papers (2024-04-11T08:20:10Z) - Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning [36.14688633670085]
We propose MTPrompt, a multi-dimensional task prompt learning method based on task-related object, summary, and task description information.
By automatically building and searching for appropriate prompts, our proposed MTPrompt achieves the best results in the few-shot setting across five different datasets.
arXiv Detail & Related papers (2023-12-13T10:00:44Z) - Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method, Repeated Demonstration with Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z) - Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance? [45.53600782873268]
We study how information loss in input token characters affects the performance of pre-training language models.
Surprisingly, we find that even when pre-training under extreme settings, i.e., using only one character of each token, performance retention on standard NLU benchmarks and probing tasks is high.
For instance, a model pre-trained only on the single first character of each token retains approximately 90% and 77% of the full-token model's performance on SuperGLUE and GLUE tasks, respectively.
arXiv Detail & Related papers (2023-10-26T09:47:50Z) - What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs? [60.668318972782295]
Large language models have shown the ability to perform in-context learning (ICL).
ICL employs task instructions and a few examples as demonstrations, which are then fed to the language model to make predictions.
It is important to systematically investigate how to construct a good demonstration for code-related tasks.
arXiv Detail & Related papers (2023-04-15T15:13:58Z) - Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens [76.40196364163663]
We revisit contrastive learning-based vision-language pre-training approaches, such as CLIP, and represent both modalities with finite discrete tokens.
We show that our method can learn more comprehensive representations and capture meaningful cross-modal correspondence.
arXiv Detail & Related papers (2023-03-27T00:58:39Z) - Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z) - Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? [112.72413411257662]
Large language models (LMs) are able to in-context learn by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.
We show that ground truth demonstrations are in fact not required -- randomly replacing labels in the demonstrations barely hurts performance.
We find that other aspects of the demonstrations are the key drivers of end task performance.
arXiv Detail & Related papers (2022-02-25T17:25:19Z)