Related papers: Temporal Generalization: A Reality Check

URL: http://arxiv.org/abs/2509.23487v1
Date: Sat, 27 Sep 2025 20:20:44 GMT
Title: Temporal Generalization: A Reality Check
Authors: Divyam Madaan, Sumit Chopra, Kyunghyun Cho
Abstract summary: We investigate whether and under what conditions models can generalize to unseen future data when relying solely on past data. We benchmark several methods within these categories on a diverse set of temporal tasks, including language modeling, news summarization, news tag prediction, academic paper categorization, and satellite image-based land use classification over time. Our empirical findings show that none of the evaluated methods consistently outperforms the simple baseline of using the latest available model parameters in all scenarios.
Score: 43.81891375838308
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Machine learning (ML) models often struggle to maintain performance under distribution shifts, leading to inaccurate predictions on unseen future data. In this work, we investigate whether and under what conditions models can achieve such a generalization when relying solely on past data. We explore two primary approaches: convex combinations of past model parameters (parameter interpolation) and explicit extrapolation beyond the convex hull of past parameters (parameter extrapolation). We benchmark several methods within these categories on a diverse set of temporal tasks, including language modeling, news summarization, news tag prediction, academic paper categorization, satellite image-based land use classification over time, and historical yearbook photo gender prediction. Our empirical findings show that none of the evaluated methods consistently outperforms the simple baseline of using the latest available model parameters in all scenarios. In the absence of access to future data or robust assumptions about the underlying data-generating process, these results underscore the inherent difficulties of generalizing and extrapolating to future data and warrant caution when evaluating claims of such generalization.
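
The two approaches named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names (`interpolate`, `extrapolate`, `latest_baseline`), the flat-list representation of parameters, and the choice of a first-order linear extrapolation step are all assumptions made here for illustration. It only shows what "inside" versus "beyond" the convex hull of past parameters means, alongside the latest-checkpoint baseline.

```python
from typing import List, Sequence

Params = List[float]  # hypothetical flat parameter vector for one checkpoint


def interpolate(checkpoints: Sequence[Params], weights: Sequence[float]) -> Params:
    """Convex combination of past checkpoints (stays inside their convex hull)."""
    if len(checkpoints) != len(weights):
        raise ValueError("need exactly one weight per checkpoint")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(checkpoints[0])
    return [
        sum(w * theta[i] for w, theta in zip(weights, checkpoints))
        for i in range(dim)
    ]


def extrapolate(previous: Params, latest: Params, step: float = 1.0) -> Params:
    """Assumed linear scheme: continue the last update direction past `latest`.

    For step > 0 the result generally lies outside the convex hull.
    """
    return [b + step * (b - a) for a, b in zip(previous, latest)]


def latest_baseline(checkpoints: Sequence[Params]) -> Params:
    """Baseline from the abstract: reuse the most recent parameters unchanged."""
    return list(checkpoints[-1])


# Toy checkpoints ordered oldest to newest (made-up numbers).
checkpoints = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]

mixed = interpolate(checkpoints, [0.25, 0.25, 0.5])
ahead = extrapolate(checkpoints[-2], checkpoints[-1], step=1.0)
base = latest_baseline(checkpoints)
print(mixed, ahead, base)
```

In this toy setting the interpolated vector lies between the oldest and newest checkpoints, while the extrapolated one moves past the newest. Whether either beats the baseline on future data is exactly the empirical question the paper examines.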

Related papers

Prequential posteriors [2.831395148295604]
We introduce prequential posteriors, based upon a predictive-sequential (prequential) loss function. We prove that, under mild conditions, both the prequential loss minimizer and the prequential posterior concentrate around parameters with optimal predictive performance. We validate our method on both a synthetic multi-dimensional time series and a real-world meteorological dataset.
arXiv Detail & Related papers (2025-11-21T19:18:19Z)
Deep Non-Parametric Time Series Forecaster [19.800783133682955]
The proposed approach does not assume any parametric form for the predictive distribution and instead generates predictions by sampling from the empirical distribution according to a tunable strategy. We develop a global version of the proposed method that automatically learns the sampling strategy by exploiting the information across multiple related time series.
arXiv Detail & Related papers (2023-12-22T12:46:30Z)
Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets. Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets. We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z)
Mixed moving average field guided learning for spatio-temporal data [0.0]
We define a novel Bayesian-temporal embedding and a theory-guided machine learning approach to make ensemble forecasts. We use Lipschitz predictors to determine fixed-time and any-time PAC in the batch learning setting. We then test the performance of our learning methodology by using linear predictors and data sets simulated from a dependence- Ornstein-Uhlenbeck process.
arXiv Detail & Related papers (2023-01-02T16:11:05Z)
A Statistical Model for Predicting Generalization in Few-Shot Classification [6.158812834002346]
We introduce a Gaussian model of the feature distribution to predict the generalization error. We show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy.
arXiv Detail & Related papers (2022-12-13T10:21:15Z)
MRCLens: an MRC Dataset Bias Detection Toolkit [82.44296974850639]
We introduce MRCLens, a toolkit that detects whether biases exist before users train the full model. For the convenience of introducing the toolkit, we also provide a categorization of common biases in MRC.
arXiv Detail & Related papers (2022-07-18T21:05:39Z)
Identifying the Context Shift between Test Benchmarks and Production Data [1.2259552039796024]
There exists a performance gap between machine learning models' accuracy on dataset benchmarks and real-world production data. We outline two methods for identifying changes in context that lead to distribution shifts and model prediction errors. We present two case-studies to highlight the implicit assumptions underlying applied machine learning models that tend to lead to errors.
arXiv Detail & Related papers (2022-07-03T14:54:54Z)
Studying Generalization Through Data Averaging [0.0]
We study train and test performance, as well as the generalization gap given by the mean of their difference over different data set samples. We predict some aspects about how the generalization gap and model train and test performance vary as a function of SGD noise.
arXiv Detail & Related papers (2022-06-28T00:03:40Z)
Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance guided stochastic gradient descent (IGSGD) method to train models that perform inference directly from inputs containing missing values, without imputation. We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models. Standard metrics calculated from retrospective data are only related to model utility under certain assumptions. When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data. We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.