Scaling Open-Ended Reasoning to Predict the Future
- URL: http://arxiv.org/abs/2512.25070v2
- Date: Mon, 05 Jan 2026 18:45:47 GMT
- Title: Scaling Open-Ended Reasoning to Predict the Future
- Authors: Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping
- Abstract summary: We train language models to make predictions on open-ended forecasting questions. To scale up training data, we synthesize novel forecasting questions from global events reported in daily news. We find calibration improvements from forecasting training generalize across popular benchmarks.
- Score: 56.672065928345525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-stakes decision making involves reasoning under uncertainty about the future. In this work, we train language models to make predictions on open-ended forecasting questions. To scale up training data, we synthesize novel forecasting questions from global events reported in daily news, using a fully automated, careful curation recipe. We train the Qwen3 thinking models on our dataset, OpenForesight. To prevent leakage of future information during training and evaluation, we use an offline news corpus both for data generation and for retrieval in our forecasting system. Guided by a small validation set, we show the benefits of retrieval and of an improved reward function for reinforcement learning (RL). Once we obtain our final forecasting system, we perform held-out testing from May to August 2025. Our specialized model, OpenForecaster 8B, matches much larger proprietary models, with our training improving the accuracy, calibration, and consistency of predictions. We find that calibration improvements from forecasting training generalize across popular benchmarks. We open-source all our models, code, and data to make research on language model forecasting broadly accessible.
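The calibration claims above are typically measured with a proper scoring rule such as the Brier score, the mean squared error between predicted probabilities and resolved binary outcomes. A minimal sketch with invented forecasts (not data from the paper):

```python
def brier(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.
    Lower is better; always answering 0.5 scores exactly 0.25."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Invented forecasts for four resolved yes/no questions (illustrative only).
model_probs = [0.9, 0.2, 0.7, 0.4]
outcomes = [1, 0, 1, 0]

print(f"Brier score: {brier(model_probs, outcomes):.3f}")  # 0.075
```

A well-calibrated forecaster's stated probabilities match observed frequencies, which is exactly what a low Brier score rewards.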
Related papers
- Future-as-Label: Scalable Supervision from Real-World Outcomes [0.0]
Time creates free supervision: forecasts about real-world events resolve to verifiable outcomes.
We extend reinforcement learning with verifiable rewards to real-world prediction over time.
We train language models to make probabilistic forecasts from causally masked information.
arXiv Detail & Related papers (2026-01-09T22:15:12Z)
- Predicting the Future by Retrieving the Past [23.181638022094038]
We propose Predicting the Future by Retrieving the Past (PFRP), a novel approach that explicitly integrates global historical data to enhance forecasting accuracy.
PFRP produces more accurate and interpretable forecasts by adaptively combining global predictions with the outputs of any local prediction model.
arXiv Detail & Related papers (2025-11-08T05:24:45Z)
- Outcome-based Reinforcement Learning to Predict the Future [1.4313866885019229]
We show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1.
The model's performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10%.
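For context, return on investment in such a simulation is simple arithmetic over resolved bets; the bets below are invented for illustration and are not from the paper:

```python
# Hypothetical resolved prediction-market bets: (stake, price, resolved_yes).
# Buying at `price` yields stake/price shares; each share pays $1 if the
# event resolves yes, $0 otherwise. All numbers are invented.
bets = [
    (10.0, 0.40, True),
    (10.0, 0.55, False),
    (10.0, 0.30, True),
]

invested = sum(stake for stake, _, _ in bets)
payout = sum(stake / price if won else 0.0 for stake, price, won in bets)
roi = (payout - invested) / invested
print(f"ROI: {roi:.1%}")
```

The same bookkeeping extends to variable stakes or Kelly-style sizing; only the `bets` construction changes.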
arXiv Detail & Related papers (2025-05-23T14:56:07Z)
- Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training [68.94373533768501]
We model knowledge retention, the capacity of a pre-trained language model to memorize factual information from its corpus, and introduce a principled method to estimate it prior to training.
We propose Size-dependent Mutual Information (SMI), an information-theoretic predictor that integrates knowledge frequency, knowledge specificity, and model size to forecast closed-book question answering (QA) accuracy.
arXiv Detail & Related papers (2025-02-06T13:23:53Z)
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- A Predictive Approach To Enhance Time-Series Forecasting [6.377828331013327]
We introduce Future-Guided Learning, an approach that enhances time-series event forecasting.
Our method involves two models: a detection model that analyzes future data to identify critical events and a forecasting model that predicts these events based on current data.
We validate our approach on a variety of tasks, demonstrating a 44.8% increase in AUC-ROC for seizure prediction using EEG data, and a 23.4% reduction in MSE for forecasting in nonlinear dynamical systems.
arXiv Detail & Related papers (2024-10-19T21:22:55Z)
- ReAugment: Model Zoo-Guided RL for Few-Shot Time Series Augmentation and Forecasting [74.00765474305288]
We present a pilot study on using reinforcement learning (RL) for time series data augmentation.
Our method, ReAugment, tackles three critical questions: which parts of the training set should be augmented, how the augmentation should be performed, and what advantages RL brings to the process.
arXiv Detail & Related papers (2024-09-10T07:34:19Z)
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improved generalization, where the excess risk decreases as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
- Can Language Models Use Forecasting Strategies? [14.332379032371612]
We describe experiments using a novel dataset of real world events and associated human predictions.
We find that models still struggle to make accurate predictions about the future.
arXiv Detail & Related papers (2024-06-06T19:01:42Z)
- Forecasting Workload in Cloud Computing: Towards Uncertainty-Aware Predictions and Transfer Learning [1.5749416770494704]
We show that modelling the uncertainty of predictions has a positive impact on performance.
We investigate whether our models benefit transfer learning capabilities across different domains.
arXiv Detail & Related papers (2023-02-24T14:51:30Z)
- Forecasting Future World Events with Neural Networks [68.43460909545063]
Autocast is a dataset containing thousands of forecasting questions and an accompanying news corpus.
The news corpus is organized by date, allowing us to precisely simulate the conditions under which humans made past forecasts.
We test language models on our forecasting task and find that performance is far below a human expert baseline.
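A date-organized corpus makes the leakage guard a one-line filter: only articles published before the question's date may enter the retrieval context. A toy sketch (the corpus entries are invented and this is not the actual Autocast pipeline):

```python
from datetime import date

# Toy date-indexed news corpus: (publication_date, headline). Invented
# entries standing in for a real offline corpus.
corpus = [
    (date(2022, 1, 10), "Central bank holds rates steady"),
    (date(2022, 3, 5), "Ceasefire talks resume"),
    (date(2022, 6, 1), "Election results announced"),
]

def retrieve(corpus, cutoff, k=5):
    """Return up to k articles published strictly before the cutoff date,
    so no post-cutoff (future) information can leak into the context."""
    visible = [doc for published, doc in corpus if published < cutoff]
    return visible[:k]

print(retrieve(corpus, date(2022, 4, 1)))
```

Real retrieval would rank the visible articles by relevance to the question; the cutoff filter is the part that simulates the forecaster's information state at question time.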
arXiv Detail & Related papers (2022-06-30T17:59:14Z)
- Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
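The simplest decentralized aggregation is to average each agent's predicted probability, exchanging only predictions rather than data or parameters; the sketch below uses that baseline rule, not the paper's actual mechanism:

```python
# Toy "pre-trained models": each agent exposes only a predict(x) -> probability
# interface, so no training data or parameters are shared. Agents are invented.
agents = [
    lambda x: 0.8,  # agent leaning strongly toward the event occurring
    lambda x: 0.6,
    lambda x: 0.4,  # agent leaning against
]

def collective_predict(agents, x):
    """Average the agents' probabilities for test point x (baseline rule)."""
    preds = [agent(x) for agent in agents]
    return sum(preds) / len(preds)

print(round(collective_predict(agents, x=None), 3))  # 0.6
```

More elaborate mechanisms weight agents by their estimated expertise, but any such scheme can keep this predict-only interface.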
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.