Related papers: Approaching Human-Level Forecasting with Language Models

Approaching Human-Level Forecasting with Language Models

URL: http://arxiv.org/abs/2402.18563v1
Date: Wed, 28 Feb 2024 18:54:18 GMT
Title: Approaching Human-Level Forecasting with Language Models
Authors: Danny Halawi, Fred Zhang, Chen Yueh-Han, Jacob Steinhardt
Abstract summary: We study whether language models (LMs) can forecast at the level of competitive human forecasters. We develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions.
Score: 34.202996056121
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. Under a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters, and in some settings surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help to inform institutional decision making.

Related papers

Wisdom of the Crowds in Forecasting: Forecast Summarization for Supporting Future Event Prediction [17.021220773165016]
Future Event Prediction (FEP) is an essential activity whose demand and application range across multiple domains. One forecasting way is to gather and aggregate collective opinions on the future to make predictions as cumulative perspectives carry the potential to help estimating the likelihood of upcoming events. In this work, we organize the existing research and frameworks that aim to support future event prediction based on crowd wisdom through aggregating individual forecasts.
arXiv Detail & Related papers (2025-02-12T08:35:10Z)
Navigating Tomorrow: Reliably Assessing Large Language Models Performance on Future Event Prediction [17.021220773165016]
This study evaluates the performance of several large language models (LLMs) in supporting future prediction tasks. We create a dataset1 by finding and categorizing news articles based on entity type and its popularity.
arXiv Detail & Related papers (2025-01-10T12:44:46Z)
Consistency Checks for Language Model Forecasters [54.62507816753479]
We measure the performance of forecasters in terms of the consistency of their predictions on different logically-related questions. We build an automated evaluation system that generates a set of base questions, instantiates consistency checks from these questions, elicits predictions of the forecaster, and measures the consistency of the predictions.
arXiv Detail & Related papers (2024-12-24T16:51:35Z)
Hybrid Forecasting of Geopolitical Events [71.73737011120103]
SAGE is a hybrid forecasting system that combines human and machine generated forecasts. The system aggregates human and machine forecasts weighting both for propinquity and based on assessed skill. We show that skilled forecasters who had access to machine-generated forecasts outperformed those who only viewed historical data.
arXiv Detail & Related papers (2024-12-14T22:09:45Z)
Performative Prediction on Games and Mechanism Design [69.7933059664256]
We study a collective risk dilemma where agents decide whether to trust predictions based on past accuracy. As predictions shape collective outcomes, social welfare arises naturally as a metric of concern. We show how to achieve better trade-offs and use them for mechanism design.
arXiv Detail & Related papers (2024-08-09T16:03:44Z)
Deep learning for precipitation nowcasting: A survey from the perspective of time series forecasting [4.5424061912112474]
This paper reviews recent progress in time series precipitation forecasting models using deep learning. We categorize forecasting models into textitrecursive and textitmultiple strategies based on their approaches to predict future frames. We evaluate current deep learning-based models for precipitation forecasting on a public benchmark, discuss their limitations and challenges, and present some promising research directions.
arXiv Detail & Related papers (2024-06-07T12:07:09Z)
Can Language Models Use Forecasting Strategies? [14.332379032371612]
We describe experiments using a novel dataset of real world events and associated human predictions. We find that models still struggle to make accurate predictions about the future.
arXiv Detail & Related papers (2024-06-06T19:01:42Z)
Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy [1.999925939110439]
We use an ensemble approach consisting of a crowd of twelve large language models (LLMs) We compare the aggregated LLM predictions on 31 binary questions to that of a crowd of human forecasters from a three-month forecasting tournament. We find that both models' forecasting accuracy benefits from exposure to the median human prediction as information.
arXiv Detail & Related papers (2024-02-29T17:27:59Z)
Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters. EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
Forecasting Future World Events with Neural Networks [68.43460909545063]
Autocast is a dataset containing thousands of forecasting questions and an accompanying news corpus. The news corpus is organized by date, allowing us to precisely simulate the conditions under which humans made past forecasts. We test language models on our forecasting task and find that performance is far below a human expert baseline.
arXiv Detail & Related papers (2022-06-30T17:59:14Z)
What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience [63.75363908696257]
computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations. An open challenge in this line of work is determining from the infinitely many predictions that the agent could possibly make which predictions might best support decision-making. We introduce a meta-gradient descent process by which an agent learns what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
arXiv Detail & Related papers (2022-06-13T21:31:06Z)
Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points. Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters. We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making [68.19284302320146]
We carry out user studies to assess how people with differing levels of expertise respond to different types of predictive uncertainty. We found that showing posterior predictive distributions led to smaller disagreements with the ML model's predictions. This suggests that posterior predictive distributions can potentially serve as useful decision aids which should be used with caution and take into account the type of distribution and the expertise of the human.
arXiv Detail & Related papers (2020-11-12T02:23:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.