Assessing Large Language Models in Updating Their Forecasts with New Information
- URL: http://arxiv.org/abs/2509.23936v1
- Date: Sun, 28 Sep 2025 15:16:20 GMT
- Title: Assessing Large Language Models in Updating Their Forecasts with New Information
- Authors: Zhangdie Yuan, Zifeng Ding, Andreas Vlachos
- Abstract summary: We introduce EVOLVECAST, a framework for evaluating whether large language models appropriately revise their predictions in response to new information. We use human forecasters as a comparative reference to analyze prediction shifts and confidence calibration under updated contexts. Neither verbalized nor logits-based confidence estimates consistently outperform the other, and both remain far from the human reference standard.
- Score: 15.692887789817647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work has largely treated future event prediction as a static task, failing to consider how forecasts and the confidence in them should evolve as new evidence emerges. To address this gap, we introduce EVOLVECAST, a framework for evaluating whether large language models appropriately revise their predictions in response to new information. In particular, EVOLVECAST assesses whether LLMs adjust their forecasts when presented with information released after their training cutoff. We use human forecasters as a comparative reference to analyze prediction shifts and confidence calibration under updated contexts. While LLMs demonstrate some responsiveness to new information, their updates are often inconsistent or overly conservative. We further find that neither verbalized nor logits-based confidence estimates consistently outperform the other, and both remain far from the human reference standard. Across settings, models tend to express conservative bias, underscoring the need for more robust approaches to belief updating.
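The abstract compares verbalized confidence (the model stating "70% likely") against logits-based confidence (probability read off output token scores) for calibration. A minimal sketch of one way such a comparison could be scored, using the Brier score against resolved outcomes; the function name and all numbers below are illustrative assumptions, not taken from the paper:

```python
# Hedged sketch: score two kinds of confidence estimates against resolved
# binary outcomes with the Brier score (mean squared error; lower is better).
# All forecasts and outcomes below are made-up illustrative values.

def brier_score(confidences, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    assert len(confidences) == len(outcomes)
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(confidences)

# Hypothetical forecasts for the same four questions.
verbalized = [0.7, 0.6, 0.8, 0.55]    # model states a probability in text
logit_based = [0.66, 0.71, 0.74, 0.5] # probability derived from token logits
outcomes = [1, 1, 1, 0]               # how the events actually resolved

print(brier_score(verbalized, outcomes))   # 0.148125
print(brier_score(logit_based, outcomes))  # 0.129325
```

On this toy data the logits-based estimates score slightly better, but as the abstract notes, neither method consistently wins in practice.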
Related papers
- Position: Beyond Model-Centric Prediction -- Agentic Time Series Forecasting [49.05788441962762]
We argue for agentic time series forecasting (ATSF), which reframes forecasting as an agentic process composed of perception, planning, action, reflection, and memory. We outline three representative implementation paradigms -- workflow-based design, agentic reinforcement learning, and a hybrid agentic workflow paradigm -- and discuss the opportunities and challenges that arise when shifting from model-centric prediction to agentic forecasting.
arXiv Detail & Related papers (2026-02-02T08:01:11Z) - Scaling Open-Ended Reasoning to Predict the Future [56.672065928345525]
We train language models to make predictions on open-ended forecasting questions. To scale up training data, we synthesize novel forecasting questions from global events reported in daily news. We find calibration improvements from forecasting training generalize across popular benchmarks.
arXiv Detail & Related papers (2025-12-31T18:59:51Z) - Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs [57.82819770709032]
Large language models (LLMs) can be effective context-aided forecasters via naïve direct prompting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model's reasoning over the context. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models.
arXiv Detail & Related papers (2025-08-13T16:02:55Z) - Analyzing the Role of Context in Forecasting with Large Language Models [17.021220773165016]
We first introduce a novel dataset of over 600 binary forecasting questions, augmented with related news articles and their concise question-related summaries. We then explore the impact of input prompts with varying levels of context on forecasting performance. The results indicate that incorporating news articles significantly improves performance, while using few-shot examples leads to a decline in accuracy.
arXiv Detail & Related papers (2025-01-11T10:11:19Z) - Future-Guided Learning: A Predictive Approach To Enhance Time-Series Forecasting [4.866362841501992]
We introduce Future-Guided Learning, an approach that enhances time-series event forecasting through a dynamic feedback mechanism inspired by predictive coding. Our method involves two models: a detection model that analyzes future data to identify critical events, and a forecasting model that predicts these events based on current data. We validate our approach on a variety of tasks, demonstrating a 44.8% increase in AUC-ROC for seizure prediction using EEG data, and a 48.7% reduction in MSE for forecasting in nonlinear dynamical systems.
arXiv Detail & Related papers (2024-10-19T21:22:55Z) - Belief Revision: The Adaptability of Large Language Models Reasoning [63.0281286287648]
We introduce Belief-R, a new dataset designed to test LMs' belief revision ability when presented with new evidence.
Inspired by how humans suppress prior inferences, this task assesses LMs within the newly proposed delta reasoning framework.
We evaluate $\sim$30 LMs across diverse prompting strategies and find that LMs generally struggle to appropriately revise their beliefs in response to new information.
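The belief-revision evaluations above, like EVOLVECAST's comparison to human forecasters, come down to measuring how far a model shifts its probability after new evidence relative to a reference shift. A minimal illustrative sketch of such a check; the function name and all numbers are hypothetical, not taken from either paper:

```python
# Hedged sketch: compare a model's prediction shift after new evidence with
# the shift made by human forecasters on the same question. A positive gap
# indicates the model under-updated (a conservative bias). Illustrative only.

def update_gap(model_before, model_after, human_before, human_after):
    """Human reference shift minus model shift, both as absolute changes
    in probability. Positive = model too conservative; negative = over-update."""
    model_shift = abs(model_after - model_before)
    human_shift = abs(human_after - human_before)
    return human_shift - model_shift

# A question where decisive evidence arrived: humans moved 0.50 -> 0.85,
# while the model only moved 0.50 -> 0.60.
gap = update_gap(0.50, 0.60, 0.50, 0.85)
print(round(gap, 2))  # 0.25
```

The positive gap mirrors the conservative bias both abstracts report: models respond to new information, but less than the human reference does.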
arXiv Detail & Related papers (2024-06-28T09:09:36Z) - Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z) - Toward Reliable Human Pose Forecasting with Uncertainty [51.628234388046195]
We develop an open-source library for human pose forecasting, including multiple models and supporting several datasets.
We model two types of uncertainty in the problem to improve performance and convey better trust.
arXiv Detail & Related papers (2023-04-13T17:56:08Z) - LoMEF: A Framework to Produce Local Explanations for Global Model Time Series Forecasts [2.3096751699592137]
Global Forecasting Models (GFM) that are trained across a set of multiple time series have shown superior results in many forecasting competitions and real-world applications.
However, GFMs typically lack interpretability, especially towards particular time series.
We propose a novel local model-agnostic interpretability approach to explain the forecasts from GFMs.
arXiv Detail & Related papers (2021-11-13T00:17:52Z) - Backward-Compatible Prediction Updates: A Probabilistic Approach [12.049279991559091]
We formalize the Prediction Update Problem and present an efficient probabilistic approach to address it.
In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies for backward-compatible prediction updates.
arXiv Detail & Related papers (2021-07-02T13:05:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.