Related papers: Benchmark Datasets for Lead-Lag Forecasting on Social Platforms

URL: http://arxiv.org/abs/2511.03877v1
Date: Wed, 05 Nov 2025 21:47:28 GMT
Title: Benchmark Datasets for Lead-Lag Forecasting on Social Platforms
Authors: Kimia Kazemian, Zhenzhen Liu, Yangfanyu Yang, Katie Z Luo, Shuhan Gu, Audrey Du, Xinyu Yang, Jack Jansons, Kilian Q Weinberger, John Thickstun, Yian Yin, Sarah Dean
Abstract summary: Lead-Lag Forecasting: given an early usage channel (the lead), predict a correlated but temporally shifted outcome channel (the lag). We present two high-volume benchmark datasets, arXiv and GitHub, and outline additional domains with analogous lead-lag dynamics. Our datasets provide ideal testbeds for lead-lag forecasting by capturing long-horizon dynamics across years and spanning the full spectrum of outcomes.
Score: 30.166429756385767
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Social and collaborative platforms emit multivariate time-series traces in which early interactions, such as views, likes, or downloads, are followed, sometimes months or years later, by higher-impact outcomes such as citations, sales, or reviews. We formalize this setting as Lead-Lag Forecasting (LLF): given an early usage channel (the lead), predict a correlated but temporally shifted outcome channel (the lag). Despite the ubiquity of such patterns, LLF has not been treated as a unified forecasting problem within the time-series community, largely due to the absence of standardized datasets. To anchor research in LLF, here we present two high-volume benchmark datasets, arXiv (accesses -> citations of 2.3M papers) and GitHub (pushes/stars -> forks of 3M repositories), and outline additional domains with analogous lead-lag dynamics, including Wikipedia (page views -> edits), Spotify (streams -> concert attendance), e-commerce (click-throughs -> purchases), and LinkedIn (profile views -> messages). Our datasets provide ideal testbeds for lead-lag forecasting by capturing long-horizon dynamics across years, spanning the full spectrum of outcomes, and avoiding survivorship bias in sampling. We documented all technical details of data curation and cleaning, verified the presence of lead-lag dynamics through statistical and classification tests, and benchmarked parametric and non-parametric baselines for regression. Our study establishes LLF as a novel forecasting paradigm and lays an empirical foundation for its systematic exploration in social and usage data. Our data portal with downloads and documentation is available at https://lead-lag-forecasting.github.io/.
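
Illustration (not from the paper): the lead-to-lag setup described in the abstract can be sketched as a simple parametric regression on synthetic data. Everything below is an assumption made for illustration only: the variable names, the log-linear relationship between lead and lag, the noise level, and the train/test split. It does not use the released datasets and is not the authors' baseline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 1000

# Hypothetical lead channel: early usage per item (synthetic, heavy-tailed).
early_lead = rng.lognormal(mean=3.0, sigma=1.0, size=n_items)

# Hypothetical lag channel: a later outcome that scales sub-linearly with the
# lead, with multiplicative noise. The slope and intercept are assumed values.
true_slope, true_intercept = 0.8, -1.0
log_lag = (true_intercept
           + true_slope * np.log(early_lead)
           + rng.normal(0.0, 0.2, size=n_items))
later_lag = np.exp(log_lag)

# Fit a log-log linear model on the first 800 items, evaluate on the rest.
train, test = slice(0, 800), slice(800, None)
slope, intercept = np.polyfit(
    np.log(early_lead[train]), np.log(later_lag[train]), deg=1
)

# Root-mean-squared error of the forecast, measured in log space.
pred_log = intercept + slope * np.log(early_lead[test])
rmse_log = float(np.sqrt(np.mean((pred_log - np.log(later_lag[test])) ** 2)))

print(f"fitted slope={slope:.3f}, intercept={intercept:.3f}, log-RMSE={rmse_log:.3f}")
```

On real lead-lag data the lag outcome is typically a count with many zeros, so a sketch like this would need a transform such as log(1 + x) or a count model instead of a plain logarithm.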

Related papers

Not in Sync: Unveiling Temporal Bias in Audio Chat Models [59.146710538620816]
Large Audio Language Models (LALMs) are increasingly applied to audio understanding and multimodal reasoning. We present the first systematic study of temporal bias in LALMs, revealing a key limitation in their timestamp prediction.
arXiv Detail & Related papers (2025-10-14T06:29:40Z)
How Different from the Past? Spatio-Temporal Time Series Forecasting with Self-Supervised Deviation Learning [15.102926671713668]
We propose ST-SSDL, a Spatio-Temporal time series forecasting framework. It discretizes latent space using learnable prototypes that represent typical temporal patterns. Experiments show that ST-SSDL consistently outperforms state-of-the-art baselines across multiple metrics.
arXiv Detail & Related papers (2025-10-06T15:21:13Z)
Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting. Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server. We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z)
HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting? [1.3654846342364308]
We introduce HoTPP, the first benchmark specifically designed to rigorously evaluate long-horizon predictions. We identify shortcomings in widely used evaluation metrics, propose a theoretically grounded T-mAP metric, and offer efficient implementations of popular models. We analyze the impact of autoregression and intensity-based losses on prediction quality, and outline promising directions for future research.
arXiv Detail & Related papers (2024-06-20T14:09:00Z)
FreDF: Learning to Forecast in the Frequency Domain [54.2091536822376]
Time series modeling presents unique challenges due to autocorrelation in both historical data and future sequences. We propose the Frequency-enhanced Direct Forecast (FreDF) which mitigates label autocorrelation by learning to forecast in the frequency domain.
arXiv Detail & Related papers (2024-02-04T08:23:41Z)
Data Contamination Through the Lens of Time [21.933771085956426]
Claims about the abilities of large language models (LLMs) are often supported by evaluating publicly available benchmarks. This practice raises concerns of data contamination, i.e., evaluating on examples that are explicitly or implicitly included in the training data. We conduct the first thorough longitudinal analysis of data contamination in LLMs by using the natural experiment of training cutoffs in GPT models.
arXiv Detail & Related papers (2023-10-16T17:51:29Z)
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting [54.04430089029033]
We present Lag-Llama, a general-purpose foundation model for time series forecasting based on a decoder-only transformer architecture. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities. When fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-12T12:29:32Z)
AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection [7.829710051617368]
We introduce an unsupervised anomaly detection benchmark with data that shifts over time, built over Kyoto-2006+, a traffic dataset for network intrusion detection. We first highlight the non-stationary nature of the data, using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years. We validate the performance degradation over time with diverse models, ranging from classical approaches to deep learning.
arXiv Detail & Related papers (2022-06-30T17:59:22Z)
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video. Recent studies have found that current benchmark datasets may have obvious moment annotation biases. We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion [79.07695173192472]
InferWiki improves upon existing benchmarks in inferential ability, assumptions, and patterns. Each testing sample is predictable with supportive data in the training set. In experiments, we curate two settings of InferWiki varying in sizes and structures, and apply the construction process on CoDEx as comparative datasets.
arXiv Detail & Related papers (2021-08-03T09:51:15Z)
A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics [70.45937234489044]
We re-organize two widely-used TSGV datasets (Charades-STA and ActivityNet Captions) so that the test split differs from the training split. We introduce a new evaluation metric "dR@$n$,IoU@$m$" to calibrate the basic IoU scores. All the results demonstrate that the re-organized datasets and new metric can better monitor the progress in TSGV.
arXiv Detail & Related papers (2021-01-22T09:59:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.