LLMs are Not Just Next Token Predictors
- URL: http://arxiv.org/abs/2408.04666v1
- Date: Tue, 6 Aug 2024 16:36:28 GMT
- Title: LLMs are Not Just Next Token Predictors
- Authors: Stephen M. Downes, Patrick Forber, Alex Grzankowski
- Abstract summary: LLMs are statistical models of language, learned through stochastic gradient descent with a next token prediction objective.
While LLMs are engineered using next token prediction and trained based on their success at this task, our view is that reducing them to mere next token predictors sells LLMs short.
To draw this out, we make an analogy with a once-prominent research program in biology that explained evolution and development from the gene's eye view.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LLMs are statistical models of language, learned through stochastic gradient descent with a next token prediction objective. This has prompted a popular view among AI modelers: LLMs are just next token predictors. While LLMs are engineered using next token prediction and trained based on their success at this task, our view is that reducing them to mere next token predictors sells LLMs short. Moreover, there are important explanations of LLM behavior and capabilities that are lost when we engage in this kind of reduction. To draw this out, we make an analogy with a once-prominent research program in biology that explained evolution and development from the gene's eye view.
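To make the objective at stake concrete, here is a minimal sketch of the next token prediction loss that pretraining minimizes via stochastic gradient descent (PyTorch; the toy dimensions and random tensors are stand-ins for a real model and corpus, not anything from the paper):

```python
import torch
import torch.nn.functional as F

# Toy dimensions: a batch of 2 sequences, 8 tokens each, vocabulary of 100.
batch, seq_len, vocab = 2, 8, 100

token_ids = torch.randint(0, vocab, (batch, seq_len))            # training tokens
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)  # stand-in for an LLM forward pass

# Next token prediction: the logits at position t are scored against token t+1.
shift_logits = logits[:, :-1, :]   # predictions for positions 0 .. T-2
shift_targets = token_ids[:, 1:]   # the "next tokens" at positions 1 .. T-1

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab),  # flatten (batch, time) for cross-entropy
    shift_targets.reshape(-1),
)
loss.backward()  # with real model parameters, one SGD step would follow
```

The paper's point is that while this loss accurately describes how LLMs are engineered and trained, it does not exhaust the explanations worth giving of the resulting model's behavior and capabilities.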
Related papers
- FIRP: Faster LLM inference via future intermediate representation prediction
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing speedup ratios of 1.9x-3x across several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z)
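FIRP's specific trick is predicting the intermediate hidden states of future tokens; the generic draft-then-verify pattern that makes multi-token decoding steps pay off can be sketched as follows (the `draft_dist`/`verify_dist` callables and greedy acceptance rule are illustrative assumptions, not FIRP's method):

```python
from typing import Callable, List

Dist = Callable[[List[int]], List[float]]  # token ids -> next-token distribution

def multi_token_step(draft_dist: Dist, verify_dist: Dist, prefix: List[int], k: int) -> List[int]:
    """Generate up to k tokens in one decoding step: draft cheaply, then
    keep the longest prefix of drafted tokens the full model agrees with."""
    greedy = lambda probs: max(range(len(probs)), key=probs.__getitem__)

    # Draft phase (FIRP instead drafts by predicting future intermediate
    # hidden states inside the transformer; any cheap proposer fits here).
    draft = list(prefix)
    for _ in range(k):
        draft.append(greedy(draft_dist(draft)))
    proposed = draft[len(prefix):]

    # Verify phase: the full model checks the proposals, so output quality
    # matches ordinary one-token-at-a-time decoding.
    accepted = list(prefix)
    for tok in proposed:
        if greedy(verify_dist(accepted)) != tok:
            break
        accepted.append(tok)
    return accepted
```

In real systems the verify calls are batched into a single forward pass; the sequential loop here is only for clarity.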
- Quadratic Is Not What You Need For Multimodal Large Language Models
This study investigates computational redundancy in the vision component of Multimodal Large Language Models (MLLMs).
After pruning, the computation in the LLM no longer grows quadratically with the number of visual tokens, but linearly.
This finding opens up the possibility for MLLMs to incorporate much denser visual tokens.
arXiv Detail & Related papers (2024-10-08T16:13:24Z)
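A back-of-the-envelope account of the scaling claim above (this is the standard self-attention cost count, not the paper's own accounting):

```latex
% One self-attention layer over L text tokens and n visual tokens,
% with hidden width d, pays for an (L+n) x (L+n) score matrix:
\[
  \mathrm{FLOPs}_{\mathrm{attn}} \;\propto\; d\,(L+n)^{2}
  \;=\; d\,\bigl(L^{2} + 2Ln + n^{2}\bigr).
\]
% The n^2 term dominates as visual tokens are added; pruning most of
% them before they reach the LLM suppresses that term, leaving the
% remaining cost to grow only linearly with the visual token count.
```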
- Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher
How can small-scale LLMs efficiently utilize the supervision of large-scale LLMs to improve their generative quality?
We develop an algorithm that effectively aggregates the small-scale LLM's and the large-scale LLM's predictions on initial tokens.
We demonstrate that our method provides a consistent improvement over conventional decoding strategies.
arXiv Detail & Related papers (2024-06-26T01:16:12Z)
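One simple way to picture aggregating the two models' predictions on initial tokens (the interpolation rule, cutoff `first_k`, and weight `alpha` below are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def aggregated_next_token(
    student_probs: np.ndarray,  # next-token distribution from the small-scale LLM
    teacher_probs: np.ndarray,  # next-token distribution from the large-scale LLM
    step: int,                  # index of the token currently being generated
    first_k: int = 5,           # hypothetical: consult the teacher only early on
    alpha: float = 0.5,         # hypothetical mixing weight
) -> int:
    """Mix teacher and student distributions on the first few tokens,
    where the teacher's guidance matters most, then let the student run."""
    if step < first_k:
        probs = alpha * teacher_probs + (1 - alpha) * student_probs
    else:
        probs = student_probs
    return int(np.argmax(probs))
```

The "when to trust the teacher" question in the title is precisely what a fixed cutoff like this glosses over.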
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
Large Language Models (LLMs) have demonstrated impressive capabilities in many natural language tasks.
However, LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the LLM decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
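The summary above does not spell the planner out, but deliberative decoding in this general family can be pictured as best-first search over partial outputs, scored by accumulated log-probability plus a learned value estimate; in the sketch below, `expand` and `value` are hypothetical stand-ins, not Q*'s actual components:

```python
import heapq
from typing import Callable, List, Tuple

def best_first_decode(
    expand: Callable[[List[int]], List[Tuple[int, float]]],  # state -> candidate (token, logprob) pairs
    value: Callable[[List[int]], float],  # learned estimate of a state's future utility
    start: List[int],
    eos: int,
    max_pops: int = 1000,
) -> List[int]:
    """Best-first search with f(s) = g(s) + h(s): g is accumulated
    log-probability, h is the value heuristic (the A*-like ingredient)."""
    state = start
    frontier = [(-value(start), 0.0, start)]  # max-heap via negated f
    for _ in range(max_pops):
        if not frontier:
            break
        neg_f, g, state = heapq.heappop(frontier)
        if state and state[-1] == eos:
            return state  # best finished sequence found so far
        for tok, logp in expand(state):
            child = state + [tok]
            g_child = g + logp
            heapq.heappush(frontier, (-(g_child + value(child)), g_child, child))
    return state  # search budget exhausted; return the last expanded state
```

Greedy decoding is the degenerate case that keeps only the single best child at each step and ignores the heuristic.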
- From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
Large language model (LLM)-empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
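For readers who want the formal object behind the acronym, the standard POMDP definition and belief update (general background, not the paper's specific construction) are:

```latex
% A POMDP is a tuple (S, A, O, T, Z, r): states S, actions A,
% observations O, transition kernel T(s' | s, a), observation
% kernel Z(o | s'), and reward r(s, a). The agent never observes
% s directly; it maintains a belief b over states, updated after
% taking action a and observing o by:
\[
  b'(s') \;\propto\; Z(o \mid s') \sum_{s \in S} T(s' \mid s, a)\, b(s).
\]
```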
- Are Large Language Models (LLMs) Good Social Predictors?
We show that Large Language Models (LLMs) do not perform as expected on social prediction when given general input features without shortcuts.
We introduce a novel social prediction task, Soc-PRF Prediction, which utilizes general features as input and simulates real-world social study settings.
arXiv Detail & Related papers (2024-02-20T00:59:22Z)
- Large Language Models: A Survey
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs' ability to perform general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while a Proximal Policy Optimization (PPO) trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
- Psychometric Predictive Power of Large Language Models
We find that instruction tuning does not always make large language models human-like from a cognitive modeling perspective.
Next-word probabilities estimated by instruction-tuned LLMs are often worse at simulating human reading behavior than those estimated by base LLMs.
arXiv Detail & Related papers (2023-11-13T17:19:14Z)
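Comparisons of this kind typically run through per-token surprisal, which psycholinguists correlate with human reading times; here is a minimal sketch (Hugging Face transformers with GPT-2 as a stand-in; the paper's actual models and reading-time corpora are not specified here):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # stand-in base LLM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def token_surprisals(text: str) -> list[tuple[str, float]]:
    """Per-token surprisal -log2 p(token | prefix), the quantity whose
    fit to human reading behavior this line of work evaluates."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = torch.log_softmax(model(ids).logits, dim=-1)
    out = []
    for t in range(1, ids.shape[1]):  # the first token has no prefix to condition on
        lp = logprobs[0, t - 1, ids[0, t]].item()
        out.append((tokenizer.decode(ids[0, t].item()), -lp / math.log(2)))
    return out

print(token_surprisals("The cat sat on the mat."))
```

An instruction-tuned checkpoint can be swapped in for the model name to reproduce the base-versus-tuned comparison in spirit.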
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
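ASTxplainer's pipeline is not reproduced here, but the core alignment idea, mapping a predicted token's source position to the smallest AST node whose span covers it, can be sketched with Python's ast module (the function name and span-size heuristic are ours, not the paper's):

```python
import ast

def node_at_offset(source: str, line: int, col: int) -> str:
    """Return the type of the smallest AST node whose source span covers
    (line, col); lines are 1-indexed, columns 0-indexed, as in ast."""
    best, best_size = "Module", float("inf")
    for node in ast.walk(ast.parse(source)):
        if not hasattr(node, "end_col_offset") or node.end_lineno is None:
            continue  # nodes like Module carry no location info
        covers = (
            (node.lineno, node.col_offset) <= (line, col)
            < (node.end_lineno, node.end_col_offset)
        )
        if covers:
            # Crude span-size proxy: prefer nodes spanning fewer lines/columns.
            size = (node.end_lineno - node.lineno) * 10_000 + (
                node.end_col_offset - node.col_offset
            )
            if size < best_size:
                best, best_size = type(node).__name__, size
    return best

# The token at line 1, column 8 of "x = f(1 + 2)" falls inside the addition.
print(node_at_offset("x = f(1 + 2)", line=1, col=8))  # -> "BinOp"
```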
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.