Can Language Models Use Forecasting Strategies?
- URL: http://arxiv.org/abs/2406.04446v1
- Date: Thu, 6 Jun 2024 19:01:42 GMT
- Title: Can Language Models Use Forecasting Strategies?
- Authors: Sarah Pratt, Seth Blumberg, Pietro Kreitlon Carolino, Meredith Ringel Morris
- Abstract summary: We describe experiments using a novel dataset of real-world events and associated human predictions.
We find that models still struggle to make accurate predictions about the future.
- Score: 14.332379032371612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in deep learning systems have allowed large models to match or surpass human accuracy on a number of skills such as image classification, basic programming, and standardized test taking. As the performance of the most capable models begins to saturate on tasks where humans already achieve high accuracy, it becomes necessary to benchmark models on increasingly complex abilities. One such task is forecasting the future outcome of events. In this work we describe experiments using a novel dataset of real-world events and associated human predictions, an evaluation metric to measure forecasting ability, and the accuracy of a number of different LLM-based forecasting designs on the provided dataset. Additionally, we analyze the performance of the LLM forecasters against human predictions and find that models still struggle to make accurate predictions about the future. Our follow-up experiments indicate this is likely due to models' tendency to guess that most events are unlikely to occur (which tends to be true for many prediction datasets, but does not reflect actual forecasting abilities). We reflect on next steps for developing a systematic and reliable approach to studying LLM forecasting.
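The evaluation metric is not specified in this summary; a standard score for probabilistic forecasts of binary events is the Brier score, and a minimal sketch of it (with illustrative events and probabilities, not the paper's data) also shows the "everything is unlikely" failure mode the abstract describes:

```python
# Minimal sketch: scoring binary-event forecasts with the Brier score.
# All events and probabilities below are illustrative, not from the paper.

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Lower is better; always answering 0.5 scores 0.25."""
    assert len(probs) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A forecaster that calls every event unlikely looks good on datasets where
# most events resolve "no" -- without demonstrating any forecasting skill.
outcomes    = [0, 0, 0, 1, 0]              # resolved events (1 = occurred)
llm_probs   = [0.1, 0.1, 0.1, 0.1, 0.1]    # "everything is unlikely"
human_probs = [0.2, 0.05, 0.1, 0.7, 0.15]  # discriminating human forecasts

print(brier_score(llm_probs, outcomes))    # 0.17
print(brier_score(human_probs, outcomes))  # 0.033
```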
Related papers
- Predicting Emergent Capabilities by Finetuning [98.9684114851891]
We find that finetuning language models can shift the point in scaling at which emergence occurs towards less capable models.
We validate this approach using four standard NLP benchmarks.
We find that, in some cases, we can accurately predict whether models trained with up to 4x more compute have emerged.
arXiv Detail & Related papers (2024-11-25T01:48:09Z)
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
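As a rough illustration of the setup (the prompt format and the commented-out model call are assumptions, not CiK's actual interface), pairing a numerical series with its textual context might look like:

```python
# Minimal sketch of context-aware LLM forecasting in the spirit of CiK.
# The prompt format and model client are hypothetical, not the benchmark's API.

def build_forecast_prompt(history, context, horizon):
    """Combine a numerical series and its textual context in one prompt."""
    series = ", ".join(f"{x:.1f}" for x in history)
    return (
        f"Context: {context}\n"
        f"Observed values: {series}\n"
        f"Predict the next {horizon} values as a comma-separated list."
    )

history = [12.1, 12.4, 12.9, 13.5]
context = "Daily peak temperature (Celsius); a cold front arrives tomorrow."
prompt = build_forecast_prompt(history, context, horizon=3)
# forecast = llm_client.complete(prompt)  # hypothetical LLM call
print(prompt)
```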
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- LABOR-LLM: Language-Based Occupational Representations with Large Language Models [8.909328013944567]
This paper considers an alternative in which fine-tuning of the CAREER foundation model is replaced by fine-tuning of LLMs.
We show that our fine-tuned LLM-based models' predictions are more representative of the career trajectories of various workforce subpopulations than those of off-the-shelf LLMs and CAREER.
arXiv Detail & Related papers (2024-06-25T23:07:18Z)
- Forecasting with Deep Learning: Beyond Average of Average of Average Performance [0.393259574660092]
Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score.
We propose a novel framework for evaluating models from multiple perspectives.
We show the advantages of this framework by comparing a state-of-the-art deep learning approach with classical forecasting techniques.
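A minimal sketch of what evaluating from multiple perspectives can mean in practice, with illustrative numbers rather than the paper's actual framework:

```python
# Minimal sketch of multi-perspective evaluation: instead of collapsing all
# errors into one average, slice them by series and by forecast horizon.
from statistics import mean

# errors[series_id][h] = absolute error at horizon h+1 (illustrative data)
errors = {
    "sales_A": [1.0, 2.5, 4.0],
    "sales_B": [0.5, 0.7, 0.9],
}

overall = mean(e for hs in errors.values() for e in hs)
per_series = {sid: mean(hs) for sid, hs in errors.items()}
per_horizon = [mean(hs[h] for hs in errors.values()) for h in range(3)]

print(overall)      # 1.6 -- one number hides that sales_A degrades fast
print(per_series)   # {'sales_A': 2.5, 'sales_B': 0.7}
print(per_horizon)  # [0.75, 1.6, 2.45] -- error growth over the horizon
```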
arXiv Detail & Related papers (2024-06-24T12:28:22Z)
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improves generalization, with excess risk decreasing as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
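A minimal sketch of a generic first-order MAML update on a toy one-dimensional model; F-FOMAML's GNN-derived task features are omitted, and the tasks and learning rates are illustrative:

```python
# Minimal first-order MAML (FOMAML) sketch on a toy linear model y = w * x.
# Not the paper's algorithm: its feature/metadata components are left out.

def grad(w, data):
    """d/dw of mean squared error for y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def fomaml_step(w, tasks, inner_lr=0.05, meta_lr=0.1):
    """One meta-update: adapt per task, then average query-set gradients
    taken at the adapted parameters (the first-order approximation)."""
    meta_grad = 0.0
    for support, query in tasks:
        w_task = w - inner_lr * grad(w, support)  # inner-loop adaptation
        meta_grad += grad(w_task, query)          # gradient at adapted params
    return w - meta_lr * meta_grad / len(tasks)

tasks = [  # (support, query) pairs; true slopes are 2.0 and 3.0
    ([(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]),
    ([(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]),
]
w = 0.0
for _ in range(50):
    w = fomaml_step(w, tasks)
print(w)  # ~2.5, settling between the two task optima
```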
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
- Approaching Human-Level Forecasting with Language Models [34.202996056121]
We study whether language models (LMs) can forecast at the level of competitive human forecasters.
We develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions.
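A minimal sketch of the aggregation stage only, assuming several probability forecasts per question; retrieval and generation are stubbed out, and the trimmed mean is one common aggregator, not necessarily the paper's:

```python
# Minimal sketch: aggregating multiple probability forecasts for one event.
# The sampled values are illustrative, not outputs of the paper's system.
from statistics import median

def aggregate(probs, trim=0.2):
    """Trimmed mean of probability forecasts: drop the extremes, average
    the rest; fall back to all values if trimming empties the list."""
    xs = sorted(probs)
    k = int(len(xs) * trim)
    core = xs[k:len(xs) - k] or xs
    return sum(core) / len(core)

samples = [0.10, 0.25, 0.30, 0.35, 0.90]  # five forecasts for one event
print(aggregate(samples))  # 0.30 -- outliers trimmed away
print(median(samples))     # 0.30 -- a simpler robust alternative
```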
arXiv Detail & Related papers (2024-02-28T18:54:18Z)
- On some limitations of data-driven weather forecasting models [0.0]
We examine some aspects of the forecasts produced by an exemplar of the current generation of ML models, Pangu-Weather.
The main conclusion is that Pangu-Weather forecasts, and possibly those of similar ML models, do not have the fidelity and physical consistency of physics-based models.
arXiv Detail & Related papers (2023-09-15T15:21:57Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
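A minimal sketch of the selective-prediction side, with an illustrative confidence threshold rather than ASPEST's actual selection mechanism:

```python
# Minimal sketch of selective prediction: abstain when confidence is low and
# route abstentions to human labeling (the hook active learning exploits).
# The threshold and probabilities are illustrative, not ASPEST's method.

def selective_predict(probs, threshold=0.8):
    """Return the predicted class index, or None to abstain."""
    conf = max(probs)
    if conf < threshold:
        return None          # abstain -> query a human label instead
    return probs.index(conf)

examples = [[0.95, 0.05], [0.55, 0.45], [0.2, 0.8]]
for p in examples:
    print(selective_predict(p))  # 0, then None (queried), then 1
```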
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning [7.194382512848327]
We propose a new parametrization for supervised learning on state-action data that yields stable predictions at longer horizons.
Our results in simulated and experimental robotic tasks show that our trajectory-based models yield significantly more accurate long-term predictions.
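A toy contrast between step-by-step rollout and a trajectory-based (time-indexed) parametrization; the dynamics and "models" below are made-up stand-ins, not the paper's networks:

```python
# Minimal sketch: why predicting the state at time t directly from (s0, t)
# can beat rolling a one-step model forward. All numbers are illustrative.

def rollout(step_model, s0, horizon):
    """Autoregressive rollout: per-step errors compound."""
    s = s0
    for _ in range(horizon):
        s = step_model(s)
    return s

def step_model(s):          # learned one-step model with a small bias
    return 0.91 * s         # true dynamics are s_{t+1} = 0.9 * s

def traj_model(s0, t):      # time-indexed model: predicts s_t in one call
    return s0 * (0.9 ** t)  # exact here; in practice it is learned

s0, horizon = 100.0, 50
print(rollout(step_model, s0, horizon))  # ~0.89: a 1% bias compounds badly
print(traj_model(s0, horizon))           # ~0.52: matches the true dynamics
```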
arXiv Detail & Related papers (2020-12-16T18:47:37Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration, and how some of the lower-scoring models on standard benchmarks perform on par with the best-performing models when trained on the same data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
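A minimal sketch of the winner-takes-all training signal commonly used in MHP-style models; the scalar hypotheses and targets are illustrative:

```python
# Minimal sketch of the winner-takes-all loss behind multiple-hypothesis
# prediction: the model emits K candidate futures and only the closest one
# is penalized, letting the hypotheses specialize on distinct outcomes.

def wta_loss(hypotheses, target):
    """Squared error of the best-matching hypothesis only."""
    return min((h - target) ** 2 for h in hypotheses)

# Three hypotheses for an ambiguous next value (e.g., an agent turning
# left, going straight, or turning right); either outcome stays cheap.
hypotheses = [-1.0, 0.0, 1.0]
print(wta_loss(hypotheses, 0.9))   # 0.01 -- the "right" hypothesis wins
print(wta_loss(hypotheses, -1.1))  # ~0.01 -- the "left" hypothesis wins
```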
arXiv Detail & Related papers (2020-03-10T09:15:42Z)