PROPHET: An Inferable Future Forecasting Benchmark with Causal Intervened Likelihood Estimation
- URL: http://arxiv.org/abs/2504.01509v1
- Date: Wed, 02 Apr 2025 08:57:42 GMT
- Title: PROPHET: An Inferable Future Forecasting Benchmark with Causal Intervened Likelihood Estimation
- Authors: Zhengwei Tao, Zhi Jin, Bincheng Li, Xiaoying Bai, Haiyan Zhao, Chengfeng Dou, Xiancai Chen, Jia Li, Linyu Li, Chongyang Tao
- Abstract summary: Recent advances in large language model (LLM)-based systems have shown remarkable potential in forecasting future events. Several benchmarks have been established to evaluate forecasting capabilities by formalizing event prediction as a retrieval-augmented generation (RAG) and reasoning task. We introduce a new benchmark, PROPHET, which comprises inferable forecasting questions paired with relevant news for retrieval.
- Score: 46.3251656496956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting future events stands as one of the ultimate aspirations of artificial intelligence. Recent advances in large language model (LLM)-based systems have shown remarkable potential in forecasting future events, thereby garnering significant interest in the research community. Currently, several benchmarks have been established to evaluate forecasting capabilities by formalizing event prediction as a retrieval-augmented generation (RAG) and reasoning task. In these benchmarks, each prediction question is answered with relevant retrieved news articles. However, because these benchmarks do not consider whether the questions can be supported by valid or sufficient rationales, some of their questions may be inherently non-inferable. To address this issue, we introduce a new benchmark, PROPHET, which comprises inferable forecasting questions paired with relevant news for retrieval. To ensure the inferability of the benchmark, we propose Causal Intervened Likelihood (CIL), a statistical measure that assesses inferability through causal inference. In constructing this benchmark, we first collected recent trend forecasting questions and then filtered the data using CIL, resulting in an inferable benchmark for event prediction. Through extensive experiments, we first demonstrate the validity of CIL and then conduct in-depth investigations into event prediction with its aid. Subsequently, we evaluate several representative prediction systems on PROPHET, drawing valuable insights for future directions.
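The abstract describes filtering forecasting questions by Causal Intervened Likelihood, a measure of whether retrieved evidence actually supports an answer. The paper's exact estimator is not reproduced here; the sketch below is a hypothetical illustration of the general idea, comparing an answer's likelihood with the retrieved evidence against its likelihood when the evidence is intervened away, and keeping only questions with a large gap. All function names, the toy scoring rule, and the threshold are assumptions for illustration.

```python
def answer_likelihood(question: str, evidence: list[str]) -> float:
    """Stand-in for a model's probability of the correct answer.
    Toy rule: confidence grows with the amount of supporting evidence."""
    return min(1.0, 0.5 + 0.1 * len(evidence))

def causal_intervened_likelihood(question: str, evidence: list[str]) -> float:
    """Gap between the answer likelihood with full evidence and with the
    evidence removed (a do(evidence = empty) intervention). A large gap
    suggests the evidence causally supports the answer, i.e. the question
    is inferable from the retrieved news."""
    with_evidence = answer_likelihood(question, evidence)
    without_evidence = answer_likelihood(question, [])
    return with_evidence - without_evidence

def filter_inferable(questions, threshold: float = 0.2):
    """Keep only (question, evidence) pairs whose CIL-style score
    exceeds the threshold."""
    return [q for q, ev in questions
            if causal_intervened_likelihood(q, ev) >= threshold]

pool = [
    ("Will the central bank raise rates in Q3?", ["news1", "news2", "news3"]),
    ("Will it rain in Paris on 2030-01-01?", []),  # no supporting evidence
]
print(filter_inferable(pool))  # only the evidence-backed question survives
```

In this toy setup, the first question scores 0.8 - 0.5 = 0.3 and is kept, while the evidence-free question scores 0.0 and is filtered out, mirroring the benchmark-construction step the abstract describes.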
Related papers
- The Future Outcome Reasoning and Confidence Assessment Benchmark [11.149409619312827]
FOReCAst is a benchmark that evaluates models' ability to make predictions and their confidence in them. It spans diverse forecasting scenarios involving Boolean questions, timeframe prediction, and quantity estimation. It provides a comprehensive evaluation of both prediction accuracy and confidence calibration for real-world applications.
arXiv Detail & Related papers (2025-02-27T01:36:00Z)
- Wisdom of the Crowds in Forecasting: Forecast Summarization for Supporting Future Event Prediction [17.021220773165016]
Future Event Prediction (FEP) is an essential activity whose demand and application range across multiple domains. One way to forecast is to gather and aggregate collective opinions about the future, as cumulative perspectives can help estimate the likelihood of upcoming events. In this work, we organize the existing research and frameworks that aim to support future event prediction based on crowd wisdom through aggregating individual forecasts.
arXiv Detail & Related papers (2025-02-12T08:35:10Z)
- Navigating Tomorrow: Reliably Assessing Large Language Models Performance on Future Event Prediction [17.021220773165016]
This study evaluates the performance of several large language models (LLMs) in supporting future prediction tasks. We create a dataset by finding and categorizing news articles based on entity type and popularity.
arXiv Detail & Related papers (2025-01-10T12:44:46Z)
- Consistency Checks for Language Model Forecasters [54.62507816753479]
We measure the performance of forecasters in terms of the consistency of their predictions on different logically-related questions. We build an automated evaluation system that generates a set of base questions, instantiates consistency checks from these questions, elicits predictions of the forecaster, and measures the consistency of the predictions.
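The simplest instance of a consistency check on logically-related questions is negation coherence: a forecaster's probabilities for an event and its negation should sum to approximately one. The function name and tolerance below are illustrative assumptions, not the cited paper's implementation.

```python
def negation_consistency(p_event: float, p_negation: float,
                         tol: float = 0.05) -> bool:
    """A coherent forecaster satisfies P(A) + P(not A) = 1, within tol."""
    return abs((p_event + p_negation) - 1.0) <= tol

# A consistent pair and an inconsistent pair of elicited forecasts:
print(negation_consistency(0.7, 0.3))  # True
print(negation_consistency(0.7, 0.5))  # False: probabilities sum to 1.2
```

A full evaluation system would instantiate many such checks (negation, conjunction, conditional decompositions) from a pool of base questions and aggregate the violations into a consistency score.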
arXiv Detail & Related papers (2024-12-24T16:51:35Z)
- HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting? [1.3654846342364308]
Accurately forecasting multiple future events within a given time horizon is crucial for finance, retail, social networks, and healthcare applications.
We propose a novel evaluation method inspired by object detection techniques from computer vision.
To support further research, we release HoTPP, the first benchmark designed explicitly for evaluating long-horizon MTPP predictions.
arXiv Detail & Related papers (2024-06-20T14:09:00Z)
- Enhancing Mean-Reverting Time Series Prediction with Gaussian Processes: Functional and Augmented Data Structures in Financial Forecasting [0.0]
We explore the application of Gaussian Processes (GPs) for predicting mean-reverting time series with an underlying structure.
GPs offer the potential to forecast not just the average prediction but the entire probability distribution over a future trajectory.
This is particularly beneficial in financial contexts, where accurate predictions alone may not suffice if incorrect volatility assessments lead to capital losses.
arXiv Detail & Related papers (2024-02-23T06:09:45Z)
- Performative Time-Series Forecasting [71.18553214204978]
We formalize performative time-series forecasting (PeTS) from a machine-learning perspective.
We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts.
We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks.
arXiv Detail & Related papers (2023-10-09T18:34:29Z)
- Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment [72.50906475214457]
The goal of sequential event prediction is to estimate the next event based on a sequence of historical events.
In practice, the next-event prediction models are trained with sequential data collected at one time.
We propose a framework with hierarchical branching structures for learning context-specific representations.
arXiv Detail & Related papers (2022-10-24T07:54:13Z)
- What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience [63.75363908696257]
Computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations.
An open challenge in this line of work is determining from the infinitely many predictions that the agent could possibly make which predictions might best support decision-making.
We introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
arXiv Detail & Related papers (2022-06-13T21:31:06Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.