Next Token Perception Score: Analytical Assessment of your LLM Perception Skills
- URL: http://arxiv.org/abs/2505.17169v1
- Date: Thu, 22 May 2025 17:18:51 GMT
- Title: Next Token Perception Score: Analytical Assessment of your LLM Perception Skills
- Authors: Yu-Ang Cheng, Leyang Hu, Hai Huang, Randall Balestriero
- Abstract summary: Next Token Perception Score (NTPS) is a score derived under a linear setting that measures the overlap between autoregressive and perception feature subspaces. We show that NTPS increases following low-rank adaptation (LoRA) fine-tuning, especially in large models. Our results offer both theoretical insights and practical tools for analytically assessing perception skills.
- Score: 12.093755170926762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive pretraining has become the de facto paradigm for learning general-purpose representations in large language models (LLMs). However, linear probe performance across downstream perception tasks shows substantial variability, suggesting that features optimized for next-token prediction do not consistently transfer well to downstream perception tasks. We demonstrate that representations learned via autoregression capture features that may lie outside the subspaces most informative for perception. To quantify the (mis)alignment between autoregressive pretraining and downstream perception, we introduce the Next Token Perception Score (NTPS), a score derived under a linear setting that measures the overlap between autoregressive and perception feature subspaces. This metric can be easily computed in closed form from pretrained representations and labeled data, and is proven to both upper- and lower-bound the excess loss. Empirically, we show that NTPS correlates strongly with linear probe accuracy across 12 diverse NLP datasets and eight pretrained models ranging from 270M to 8B parameters, confirming its utility as a measure of alignment. Furthermore, we show that NTPS increases following low-rank adaptation (LoRA) fine-tuning, especially in large models, suggesting that LoRA realigns representations toward perception tasks, increasing subspace overlap and thus improving downstream performance. More importantly, we find that NTPS reliably predicts the additional accuracy gains attained by LoRA fine-tuning, thereby providing a lightweight prescreening tool for LoRA adaptation. Our results offer both theoretical insights and practical tools for analytically assessing LLM perception skills.
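The abstract gives only a high-level description of the closed form, so the sketch below is a stand-in rather than the authors' exact formula: it fits a ridge-regression map from the same pretrained features to next-token targets and to task labels, takes the top-k left singular subspace of each map, and reports a normalized overlap in [0, 1]. The ridge solve, rank cutoff, and overlap measure here are illustrative assumptions.

```python
# Illustrative subspace-overlap score between "autoregressive" and
# "perception" feature subspaces, in the spirit of NTPS. The ridge solve,
# rank-k truncation, and overlap measure are assumptions, not the paper's
# exact closed form.
import numpy as np

def informative_subspace(X, Y, k, lam=1e-3):
    """Orthonormal basis for the top-k feature directions (left singular
    vectors) of the ridge-regression map from features X to targets Y."""
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)  # (d, out_dim)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]

def subspace_overlap(X, Y_next_token, Y_task, k=32):
    """Normalized overlap in [0, 1] between the two informative subspaces."""
    U_ar = informative_subspace(X, Y_next_token, k)
    U_pc = informative_subspace(X, Y_task, k)
    return np.linalg.norm(U_ar.T @ U_pc) ** 2 / min(U_ar.shape[1], U_pc.shape[1])

# Toy usage with random data standing in for pretrained hidden states,
# proxy next-token targets, and downstream labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 256))
Y_ar = X @ rng.normal(size=(256, 64))
Y_task = X @ rng.normal(size=(256, 8))
print(subspace_overlap(X, Y_ar, Y_task, k=16))
```

In this illustrative setup, a score near 1 means the feature directions useful for next-token prediction and for the downstream task largely coincide.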
Related papers
- Enhancing Chain-of-Thought Reasoning with Critical Representation Fine-tuning [37.16998362490576]
Representation Fine-tuning (ReFT) has attracted widespread attention for significantly improving parameter efficiency by editing the representation space alone. We propose Critical Representation Fine-Tuning (CRFT), a novel method that identifies and optimizes critical representations through information flow analysis. Our method is validated across eight benchmarks for arithmetic and commonsense reasoning, using the LLaMA and Mistral model families.
arXiv Detail & Related papers (2025-07-14T09:11:33Z) - Enhancing Training Data Attribution with Representational Optimization [57.61977909113113]
Training data attribution (TDA) methods aim to measure how training data impacts a model's predictions. We propose AirRep, a representation-based approach that closes this gap by learning task-specific and model-aligned representations explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence.
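As a rough illustration of the attention-based pooling ingredient, the minimal sketch below pools per-example representations into a single group representation with a dot-product scorer; the encoder, dimensions, and scoring vector are placeholders, not AirRep's actual architecture.

```python
# Minimal sketch of attention-based pooling over per-example representations,
# the kind of mechanism the AirRep summary describes for group-wise influence.
# The dot-product scorer and dimensions are placeholders, not AirRep's design.
import numpy as np

def attention_pool(example_reps, w):
    """Pool an (n, d) group of example representations into one (d,) vector
    using softmax attention weights derived from scores x_i . w."""
    scores = example_reps @ w
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ example_reps

rng = np.random.default_rng(1)
group = rng.normal(size=(10, 32))  # representations of 10 training examples
w = rng.normal(size=32)            # scoring vector (learned jointly with the encoder)
pooled = attention_pool(group, w)  # group representation used to estimate influence
```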
arXiv Detail & Related papers (2025-05-24T05:17:53Z) - Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization [31.379237532476875]
Pre-trained large language models (LLMs) are commonly fine-tuned to adapt to downstream tasks. In this paper, we propose a multi-stage influence function to attribute predictions of fine-tuned LLMs to pre-training data.
arXiv Detail & Related papers (2025-05-08T07:43:44Z) - Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs? [32.04523360747506]
We construct a dataset using 50 1B-parameter LLM variants with systematically varied pre-training configurations. We introduce novel unsupervised and supervised proxy metrics derived from pre-training that successfully reduce the relative performance prediction error rate by over 50%.
arXiv Detail & Related papers (2025-04-16T21:19:09Z) - The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency. UPFT removes the need for labeled data or exhaustive sampling. Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z) - LoRA vs Full Fine-tuning: An Illusion of Equivalence [76.11938177294178]
We study how Low-Rank Adaptation (LoRA) and full fine-tuning change pre-trained models. We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure. We extend the finding that LoRA forgets less than full fine-tuning and find that its forgetting is largely localized to the intruder dimensions.
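The "intruder dimension" observation lends itself to a quick diagnostic: flag top singular vectors of the fine-tuned weight matrix that align poorly with every singular vector of the pre-trained matrix. The sketch below is such a check under an assumed threshold, rank cutoff, and toy matrices; it is not the paper's exact protocol.

```python
# Illustrative check for "intruder dimensions": top singular vectors of a
# fine-tuned weight matrix whose best cosine similarity to any pre-trained
# singular vector falls below a threshold. Threshold, rank cutoff, and the
# toy matrices are assumptions, not the paper's protocol.
import numpy as np

def intruder_dimensions(W_pre, W_ft, k=10, threshold=0.5):
    U_pre, _, _ = np.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = np.linalg.svd(W_ft, full_matrices=False)
    sims = np.abs(U_ft[:, :k].T @ U_pre)             # cosine similarities, shape (k, r)
    return np.where(sims.max(axis=1) < threshold)[0]

rng = np.random.default_rng(2)
W_pre = rng.normal(size=(64, 64))
delta = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64))  # LoRA-style low-rank update
W_ft = W_pre + 5.0 * delta
print(intruder_dimensions(W_pre, W_ft))  # indices of flagged top singular vectors of W_ft
```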
arXiv Detail & Related papers (2024-10-28T17:14:01Z) - SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z) - Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision.
Model compression detrimentally impacts the performance of visual prompting-based transfer.
However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - PEP: Parameter Ensembling by Perturbation [13.221295194854642]
Parameter Ensembling by Perturbation (PEP) constructs an ensemble of parameter values as random perturbations of the optimal parameter set from training (sketched below).
PEP provides a small improvement in performance, and, in some cases, a substantial improvement in empirical calibration.
PEP can be used to probe the level of overfitting that occurred during training.
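A minimal sketch of the PEP idea for a linear softmax classifier, assuming isotropic Gaussian perturbations of the trained weights and a simple average of predictive probabilities; the noise scale, ensemble size, and model are placeholders rather than the paper's setup.

```python
# Minimal sketch of Parameter Ensembling by Perturbation (PEP): perturb the
# trained parameters with small Gaussian noise, then average the ensemble's
# predictive probabilities. Model, noise scale, and data are placeholders.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pep_predict(W_opt, X, n_members=10, sigma=0.01, seed=0):
    """Average predictions of a linear classifier over perturbed copies of W_opt."""
    rng = np.random.default_rng(seed)
    probs = [softmax(X @ (W_opt + sigma * rng.normal(size=W_opt.shape)))
             for _ in range(n_members)]
    return np.mean(probs, axis=0)

rng = np.random.default_rng(3)
W_opt = rng.normal(size=(20, 5))       # "optimal" parameters from training
X = rng.normal(size=(8, 20))           # a small batch of inputs
print(pep_predict(W_opt, X).round(3))  # ensemble-averaged class probabilities
```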
arXiv Detail & Related papers (2020-10-24T00:16:03Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate observational data collected offline, which is often abundantly available in practice, to improve sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)