How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge
- URL: http://arxiv.org/abs/2601.12343v1
- Date: Sun, 18 Jan 2026 10:28:54 GMT
- Title: How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge
- Authors: Wayne Gao, Sukjin Han, Annie Liang
- Abstract summary: Large language models (LLMs) are increasingly used to predict human behavior. We propose a measure for evaluating how much knowledge a pretrained LLM brings to such a prediction. We estimate this measure by comparing the prediction error of a fixed LLM in a given domain to that of flexible machine learning models trained on increasing samples of domain-specific data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are increasingly used to predict human behavior. We propose a measure for evaluating how much knowledge a pretrained LLM brings to such a prediction: its equivalent sample size, defined as the amount of task-specific data needed to match the predictive accuracy of the LLM. We estimate this measure by comparing the prediction error of a fixed LLM in a given domain to that of flexible machine learning models trained on increasing samples of domain-specific data. We further provide a statistical inference procedure by developing a new asymptotic theory for cross-validated prediction error. Finally, we apply this method to the Panel Study of Income Dynamics. We find that LLMs encode considerable predictive information for some economic variables but much less for others, suggesting that their value as substitutes for domain-specific data differs markedly across settings.
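The abstract describes the estimation logic but gives no code. The sketch below is a minimal illustration of the equivalent-sample-size idea under stated assumptions: synthetic regression data, a gradient-boosting learner standing in for the "flexible machine learning models," and simulated LLM predictions. The variable names and learner choice are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the equivalent-sample-size idea (illustrative assumptions only):
# compare a fixed "LLM" prediction error to cross-validated errors of a flexible
# model trained on growing samples, and report the smallest sample size that matches it.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)

# Stand-in for a pretrained LLM's predictions on the same units (here: noisy truth).
llm_predictions = y + rng.normal(scale=15.0, size=y.shape)
llm_mse = np.mean((llm_predictions - y) ** 2)

def cv_error(n):
    """Cross-validated MSE of a flexible model trained on the first n observations."""
    model = GradientBoostingRegressor(random_state=0)
    scores = cross_val_score(model, X[:n], y[:n],
                             scoring="neg_mean_squared_error", cv=5)
    return -scores.mean()

sample_sizes = [50, 100, 200, 400, 800, 1600]
errors = [cv_error(n) for n in sample_sizes]

# Equivalent sample size: smallest n at which the data-trained model matches the LLM.
matching = [n for n, e in zip(sample_sizes, errors) if e <= llm_mse]
print("equivalent sample size ~", matching[0] if matching else f"> {sample_sizes[-1]}")
```

In this toy setup the cross-validated error plays the role that the paper's asymptotic theory formalizes; the sketch only uses it to locate the crossing point on a coarse grid of sample sizes.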
Related papers
- Quantile Regression with Large Language Models for Price Prediction [15.277244542405345]
Large Language Models (LLMs) have shown promise in structured prediction tasks, including regression. We propose a novel quantile regression approach that enables LLMs to produce full predictive distributions. A Mistral-7B model fine-tuned with quantile heads significantly outperforms traditional approaches for both point and distributional estimation.
arXiv Detail & Related papers (2025-06-07T04:19:28Z)
- Applying Large Language Models to Travel Satisfaction Analysis [2.5105418815378555]
This study uses household survey data collected in Shanghai to identify the existence and source of misalignment between Large Language Models (LLMs) and humans. LLMs have strong capabilities in contextual understanding and generalization, significantly reducing dependence on task-specific data. We propose an LLM-based modeling approach that can be applied to model travel behavior with small sample sizes.
arXiv Detail & Related papers (2025-05-29T09:11:58Z)
- When can isotropy help adapt LLMs' next word prediction to numerical domains? [53.98633183204453]
It is shown that the isotropic property of LLM embeddings in contextual embedding space preserves the underlying structure of representations. Experiments show that different characteristics of numerical data and model architectures have different impacts on isotropy.
arXiv Detail & Related papers (2025-05-22T05:10:34Z)
- AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
We introduce AutoElicit to extract knowledge from large language models and construct priors for predictive models. We show these priors are informative and can be refined using natural language. We find that AutoElicit yields priors that can substantially reduce error over uninformative priors, using fewer labels, and consistently outperform in-context learning.
arXiv Detail & Related papers (2024-11-26T10:13:39Z)
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMs [50.29035873837]
Large language models (LLMs) can learn vast amounts of knowledge from diverse domains during pre-training. Long-tail knowledge from specialized domains is often scarce and underrepresented, rarely appearing in the models' memorization. We propose a reinforcement learning-based dynamic uncertainty ranking method for ICL that accounts for the varying impact of each retrieved sample on LLM predictions.
arXiv Detail & Related papers (2024-10-31T03:42:17Z)
- Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more computation-efficient metric for performance estimation. We present FLP-M, a fundamental approach for performance prediction that addresses the practical need to integrate datasets from multiple sources during pre-training.
arXiv Detail & Related papers (2024-10-11T04:57:48Z)
- Bayesian Statistical Modeling with Predictors from LLMs [5.5711773076846365]
State-of-the-art large language models (LLMs) have shown impressive performance on a variety of benchmark tasks.
This raises questions about the human-likeness of LLM-derived information.
arXiv Detail & Related papers (2024-06-13T11:33:30Z)
- Can Language Models Use Forecasting Strategies? [14.332379032371612]
We describe experiments using a novel dataset of real world events and associated human predictions.
We find that models still struggle to make accurate predictions about the future.
arXiv Detail & Related papers (2024-06-06T19:01:42Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
First, we investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
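The ASPEST entry above combines selective prediction (abstaining when uncertain) with active learning (querying informative target examples). The sketch below illustrates those two ingredients on synthetic data; it is an illustration under assumptions, not the ASPEST algorithm, and all names and thresholds are made up for the example.

```python
# Illustration of the two ingredients combined by active selective prediction:
# abstain on low-confidence inputs, then query labels for the most uncertain
# target examples and retrain. Not the ASPEST algorithm itself.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_source, y_source = X[:500], y[:500]   # labeled source data
X_target, y_target = X[500:], y[500:]   # stands in for the shifted, unlabeled target domain

model = LogisticRegression(max_iter=1000).fit(X_source, y_source)

# Selective prediction: abstain whenever the predicted probability is too close to 0.5.
proba = model.predict_proba(X_target)[:, 1]
confidence = np.abs(proba - 0.5)
threshold = 0.3
accept = confidence >= threshold
print(f"coverage after abstention: {accept.mean():.2f}")

# Active learning: query labels for the 50 least confident target points, then retrain.
query_idx = np.argsort(confidence)[:50]
X_new = np.vstack([X_source, X_target[query_idx]])
y_new = np.concatenate([y_source, y_target[query_idx]])
model = LogisticRegression(max_iter=1000).fit(X_new, y_new)
```

Retraining on the queried points shifts the model toward the target domain, which is the intuition behind querying from the shifted distribution rather than from the source.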
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.