Can Base ChatGPT be Used for Forecasting without Additional Optimization?
- URL: http://arxiv.org/abs/2404.07396v3
- Date: Thu, 4 Jul 2024 23:03:45 GMT
- Title: Can Base ChatGPT be Used for Forecasting without Additional Optimization?
- Authors: Van Pham, Scott Cunningham
- Abstract summary: This study investigates whether OpenAI's ChatGPT-3.5 and ChatGPT-4 can forecast future events.
We employ two prompting strategies: direct prediction and what we call future narratives.
After analyzing 100 trials, we find that future narrative prompts significantly enhanced ChatGPT-4's forecasting accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates whether OpenAI's ChatGPT-3.5 and ChatGPT-4 can forecast future events. To evaluate the accuracy of the predictions, we take advantage of the fact that the training data at the time of our experiments (mid-2023) ended in September 2021, and ask about events that happened in 2022. We employed two prompting strategies: direct prediction and what we call future narratives, which ask ChatGPT to tell fictional stories set in the future with characters retelling events that happened in the past, but after ChatGPT's training data had been collected. We prompted ChatGPT to engage in storytelling, particularly within economic contexts. After analyzing 100 trials, we find that future narrative prompts significantly enhanced ChatGPT-4's forecasting accuracy. This was especially evident in its predictions of major Academy Award winners as well as economic trends, the latter inferred from scenarios where the model impersonated public figures like the Federal Reserve Chair, Jerome Powell. As a falsification exercise, we repeated our experiments in May 2024, at which time the models included more recent training data. ChatGPT-4's accuracy significantly improved when the training window included the events being prompted for, achieving 100% accuracy in many instances. The poorer accuracy for events outside of the training window suggests that in the 2023 prediction experiments, ChatGPT-4 was forming predictions based solely on its training data. Narrative prompting also consistently outperformed direct prompting. These findings indicate that narrative prompts leverage the models' capacity for hallucinatory narrative construction, facilitating more effective data synthesis and extrapolation than straightforward predictions. Our research reveals new aspects of LLMs' predictive capabilities and suggests potential future applications in analytical contexts.
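The experimental design is simple enough to sketch in code. Below is a minimal, hypothetical illustration (not the authors' implementation) of how the direct versus future-narrative prompt comparison could be run with the OpenAI Python client. The prompt wording, the Best Actor example, and the `hit_rate` scoring helper are assumptions made for illustration; the paper's actual prompts and its manual scoring of completions differ.

```python
# Sketch: direct-prediction vs. future-narrative prompting over repeated trials.
# Prompts, model choice, and the string-matching scorer are illustrative
# assumptions, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DIRECT_PROMPT = (
    "Predict who will win the Academy Award for Best Actor at the 2022 ceremony."
)

NARRATIVE_PROMPT = (
    "Write a short scene set in late 2022 in which a family rewatches the "
    "Academy Awards ceremony and a character recalls who won Best Actor that year."
)


def ask(prompt: str, model: str = "gpt-4") -> str:
    """Send a single prompt and return the model's text response."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def hit_rate(prompt: str, truth: str, n_trials: int = 100) -> float:
    """Repeat a prompt n_trials times and return the share of completions
    that mention the true outcome (a crude stand-in for the paper's
    per-trial scoring)."""
    hits = sum(truth.lower() in ask(prompt).lower() for _ in range(n_trials))
    return hits / n_trials


# Illustrative comparison of the two strategies:
# direct_acc = hit_rate(DIRECT_PROMPT, truth="Will Smith")
# narrative_acc = hit_rate(NARRATIVE_PROMPT, truth="Will Smith")
```

The sketch mirrors the paper's structure only loosely: each prompt is repeated across 100 trials and accuracy is judged against the realized 2022 outcome, but the published experiments scored completions manually rather than by substring matching.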
Related papers
- Performative Prediction on Games and Mechanism Design [69.7933059664256]
We study a collective risk dilemma where agents decide whether to trust predictions based on past accuracy.
As predictions shape collective outcomes, social welfare arises naturally as a metric of concern.
We show how to achieve better trade-offs and use them for mechanism design.
arXiv Detail & Related papers (2024-08-09T16:03:44Z) - Can Language Models Use Forecasting Strategies? [14.332379032371612]
We describe experiments using a novel dataset of real world events and associated human predictions.
We find that models still struggle to make accurate predictions about the future.
arXiv Detail & Related papers (2024-06-06T19:01:42Z) - Adaptive Human Trajectory Prediction via Latent Corridors [49.13061580045407]
We formalize the problem of scene-specific adaptive trajectory prediction.
We propose a new adaptation approach inspired by prompt tuning called latent corridors.
With less than 0.1% additional model parameters, we see up to a 23.9% ADE improvement on MOTSynth simulated data and a 16.4% ADE improvement on MOT and Wildtrack real pedestrian data.
arXiv Detail & Related papers (2023-12-11T18:59:12Z) - Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament [2.900810893770134]
We enroll OpenAI's state-of-the-art large language model, GPT-4, in a three-month forecasting tournament hosted on the Metaculus platform.
We show that GPT-4's probabilistic forecasts are significantly less accurate than the median human-crowd forecasts.
A potential explanation for this underperformance is that in real-world forecasting tournaments, the true answers are genuinely unknown at the time of prediction.
arXiv Detail & Related papers (2023-10-17T17:58:17Z) - Humans and language models diverge when predicting repeating text [52.03471802608112]
We present a scenario in which the performance of humans and LMs diverges.
Human and GPT-2 LM predictions are strongly aligned in the first presentation of a text span, but their performance quickly diverges when memory begins to play a role.
We hope that this scenario will spur future work in bringing LMs closer to human behavior.
arXiv Detail & Related papers (2023-10-10T08:24:28Z) - Performative Time-Series Forecasting [71.18553214204978]
We formalize performative time-series forecasting (PeTS) from a machine-learning perspective.
We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts.
We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks.
arXiv Detail & Related papers (2023-10-09T18:34:29Z) - Learn over Past, Evolve for Future: Forecasting Temporal Trends for Fake News Detection [14.271593690271136]
We design an effective framework FTT (Forecasting Temporal Trends), which could forecast the temporal distribution patterns of news data.
Experiments on the real-world temporally split dataset demonstrate the superiority of our proposed framework.
arXiv Detail & Related papers (2023-06-26T14:29:05Z) - ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Forecasting Future World Events with Neural Networks [68.43460909545063]
Autocast is a dataset containing thousands of forecasting questions and an accompanying news corpus.
The news corpus is organized by date, allowing us to precisely simulate the conditions under which humans made past forecasts.
We test language models on our forecasting task and find that performance is far below a human expert baseline.
arXiv Detail & Related papers (2022-06-30T17:59:14Z) - Measuring Forecasting Skill from Text [15.795144936579627]
We explore connections between the language people use to describe their predictions and their forecasting skill.
We present a number of linguistic metrics which are computed over text associated with people's predictions about the future.
We demonstrate that it is possible to accurately predict forecasting skill using a model that is based solely on language.
arXiv Detail & Related papers (2020-06-12T19:04:10Z)