LLMs are Not Just Next Token Predictors
- URL: http://arxiv.org/abs/2408.04666v1
- Date: Tue, 6 Aug 2024 16:36:28 GMT
- Title: LLMs are Not Just Next Token Predictors
- Authors: Stephen M. Downes, Patrick Forber, Alex Grzankowski,
- Abstract summary: LLMs are statistical models of language learning through gradient descent with a next token prediction objective.
While LLMs are engineered using next token prediction, and trained based on their success at this task, our view is that a reduction to just next token predictor sells LLMs short.
In order to draw this out, we will make an analogy with a once prominent research program in biology explaining evolution and development from the gene's eye view.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LLMs are statistical models of language learning through stochastic gradient descent with a next token prediction objective. Prompting a popular view among AI modelers: LLMs are just next token predictors. While LLMs are engineered using next token prediction, and trained based on their success at this task, our view is that a reduction to just next token predictor sells LLMs short. Moreover, there are important explanations of LLM behavior and capabilities that are lost when we engage in this kind of reduction. In order to draw this out, we will make an analogy with a once prominent research program in biology explaining evolution and development from the gene's eye view.
Related papers
- Predicting Emergent Capabilities by Finetuning [98.9684114851891]
We find that finetuning language models can shift the point in scaling at which emergence occurs towards less capable models.
We validate this approach using four standard NLP benchmarks.
We find that, in some cases, we can accurately predict whether models trained with up to 4x more compute have emerged.
arXiv Detail & Related papers (2024-11-25T01:48:09Z) - FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z) - Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher [11.136112399898481]
How can small-scale large language models (LLMs) efficiently utilize the supervision of LLMs to improve their generative quality?
We develop an algorithm to effectively aggregate the small-scale LLM and LLM predictions on initial tokens.
We demonstrate that our method provides a consistent improvement over conventional decoding strategies.
arXiv Detail & Related papers (2024-06-26T01:16:12Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - Are Large Language Models (LLMs) Good Social Predictors? [36.68104332805214]
We show that Large Language Models (LLMs) cannot work as expected on social prediction when given general input features without shortcuts.
We introduce a novel social prediction task, Soc-PRF Prediction, which utilizes general features as input and simulates real-world social study settings.
arXiv Detail & Related papers (2024-02-20T00:59:22Z) - Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z) - Learning to Generate Explainable Stock Predictions using Self-Reflective
Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z) - Psychometric Predictive Power of Large Language Models [32.31556074470733]
We find that instruction tuning does not always make large language models human-like from a cognitive modeling perspective.
Next-word probabilities estimated by instruction-tuned LLMs are often worse at simulating human reading behavior than those estimated by base LLMs.
arXiv Detail & Related papers (2023-11-13T17:19:14Z) - Evaluating and Explaining Large Language Models for Code Using Syntactic
Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.