Related papers: Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction

URL: http://arxiv.org/abs/2510.25348v1
Date: Wed, 29 Oct 2025 10:06:08 GMT
Title: Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction
Authors: Jie Peng, Rui Wang, Qiang Wang, Zhewei Wei, Bin Tong, Guan Wang,
Abstract summary: Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. First, we propose a time-ordered splitting strategy that chronologically partitions data into consecutive windows. Second, we introduce Taoke, a large-scale e-commerce cascade dataset featuring rich promoter/product attributes. Third, we develop CasTemp, a lightweight framework that efficiently models cascade dynamics through temporal walks.
Score: 37.50536404287287
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, current related works suffer from three critical limitations: (1) temporal leakage in current evaluation--random cascade-based splits allow models to access future information, yielding unrealistic results; (2) feature-poor datasets that lack downstream conversion signals (e.g., likes, comments, or purchases), which limits more practical applications; (3) computational inefficiency of complex graph-based methods that require days of training for marginal gains. We systematically address these challenges from three perspectives: task setup, dataset construction, and model design. First, we propose a time-ordered splitting strategy that chronologically partitions data into consecutive windows, ensuring models are evaluated on genuine forecasting tasks without future information leakage. Second, we introduce Taoke, a large-scale e-commerce cascade dataset featuring rich promoter/product attributes and ground-truth purchase conversions--capturing the complete diffusion lifecycle from promotion to monetization. Third, we develop CasTemp, a lightweight framework that efficiently models cascade dynamics through temporal walks, Jaccard-based neighbor selection for inter-cascade dependencies, and GRU-based encoding with time-aware attention. Under leak-free evaluation, CasTemp achieves state-of-the-art performance across four datasets with orders-of-magnitude speedup. Notably, it excels at predicting second-stage popularity conversions--a practical task critical for real-world applications.
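The time-ordered splitting and Jaccard-based neighbor selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CasTemp code: the hypothetical `Cascade` record and its fields, the cut-off times, and the choices to assign each cascade to a window by its start time and to draw neighbors only from earlier cascades are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Cascade:
    # Hypothetical record; the paper's actual data schema is not given here.
    cascade_id: str
    start_time: float        # time of the root post (arbitrary unit)
    participants: Set[str]   # users observed in the cascade


def time_ordered_split(
    cascades: List[Cascade], train_end: float, val_end: float
) -> Tuple[List[Cascade], List[Cascade], List[Cascade]]:
    """Assign cascades to consecutive windows by start time (an assumption).

    Unlike a random split, every training cascade starts before every
    validation cascade, which in turn starts before every test cascade.
    """
    ordered = sorted(cascades, key=lambda c: c.start_time)
    train = [c for c in ordered if c.start_time < train_end]
    val = [c for c in ordered if train_end <= c.start_time < val_end]
    test = [c for c in ordered if c.start_time >= val_end]
    return train, val, test


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_neighbors(target: Cascade, pool: List[Cascade], k: int) -> List[str]:
    """Return IDs of the k earlier cascades whose participant sets overlap most
    with the target's. Restricting the pool to earlier cascades (an assumption)
    keeps neighbor selection from looking into the future."""
    scored = [
        (jaccard(target.participants, c.participants), c.cascade_id)
        for c in pool
        if c.start_time < target.start_time
    ]
    # Highest similarity first; ties broken by ID for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [cid for _, cid in scored[:k]]


# Usage on a toy set of four cascades (hypothetical data).
cascades = [
    Cascade("a", 1.0, {"u1", "u2", "u3"}),
    Cascade("b", 2.0, {"u3", "u4"}),
    Cascade("c", 3.0, {"u5"}),
    Cascade("d", 4.0, {"u1", "u2", "u4"}),
]
train_set, val_set, test_set = time_ordered_split(cascades, train_end=2.5, val_end=3.5)
print([c.cascade_id for c in train_set], [c.cascade_id for c in val_set],
      [c.cascade_id for c in test_set])  # ['a', 'b'] ['c'] ['d']
print(top_k_neighbors(cascades[3], cascades, k=2))  # ['a', 'b']
```

Keying both the split and the neighbor pool on start time keeps later cascades out of earlier ones' inputs; a fuller treatment would also truncate each cascade's participant set to its observation period, which this sketch omits.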

Related papers

A Retrieval Augmented Spatio-Temporal Framework for Traffic Prediction [33.28893562327803]
RAST achieves superior performance while maintaining efficiency on large-scale datasets. Key designs of our framework include a Decoupled and Query Retriever that captures decoupled temporal features and constructs residual fusion via Retrieval-Augmented Generation (RAG), and a Universal Backbone Predictor that accommodates pre-trained ST-GNNs or simple predictors.
arXiv (2025-08-14T10:11:39Z)
Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction [12.064509280163502]
3D occupancy prediction has emerged as a key perception task for autonomous driving. Recent studies focus on integrating information obtained from past observations to improve prediction accuracy. We propose StreamOcc, a framework that aggregates past-temporal information in a stream-based manner. Experiments on the Occ3D-nus dataset show that StreamOcc achieves state-of-the-art performance in real-time settings, while reducing memory usage by more than 50% compared to previous methods.
arXiv (2025-03-28T02:05:53Z)
SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv (2025-03-25T17:59:57Z)
On Your Mark, Get Set, Predict! Modeling Continuous-Time Dynamics of Cascades for Information Popularity Prediction [5.464598715181046]
Key to accurately predicting information popularity lies in subtly modeling the underlying temporal information diffusion process. We propose ConCat, modeling the Continuous-time dynamics of Cascades for information popularity prediction. We conduct extensive experiments to evaluate ConCat on three real-world datasets.
arXiv (2024-09-25T05:08:44Z)
OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries. OPUS incorporates a suite of non-trivial strategies to enhance model performance. Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv (2024-09-14T07:44:22Z)
TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph. Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales. We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
arXiv (2024-01-06T06:26:49Z)
Spatio-Temporal Contrastive Self-Supervised Learning for POI-level Crowd Flow Inference [23.8192952068949]
We present a novel Contrastive Self-learning framework for Spatio-Temporal data (CSST). Our approach begins with the construction of a spatial adjacency graph based on the Points of Interest (POIs) and their respective distances. We adopt a swapped prediction approach to predict the representation of the target subgraph from similar instances. Our experiments, conducted on two real-world datasets, demonstrate that CSST pre-trained on extensive noisy data consistently outperforms models trained from scratch.
arXiv (2023-09-06T02:51:24Z)
FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving classification capacity on the multivariate time series classification task. It offers three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv (2023-02-20T07:46:14Z)
Incorporating Reachability Knowledge into a Multi-Spatial Graph Convolution Based Seq2Seq Model for Traffic Forecasting [12.626657411944949]
Existing works perform poorly on multi-step traffic prediction that involves a long future time period. Our model is evaluated on two real-world traffic datasets and achieves better performance than competing methods.
arXiv (2021-07-04T03:23:30Z)
Predicting Temporal Sets with Deep Neural Networks [50.53727580527024]
We propose an integrated solution based on the deep neural networks for temporal sets prediction. A unique perspective is to learn element relationship by constructing set-level co-occurrence graph. We design an attention-based module to adaptively learn the temporal dependency of elements and sets.
arXiv (2020-06-20T03:29:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.