UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting
- URL: http://arxiv.org/abs/2512.07184v1
- Date: Mon, 08 Dec 2025 05:36:14 GMT
- Title: UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting
- Authors: Da Zhang, Bingyu Li, Zhuyuan Zhao, Junyu Gao, Feiping Nie, Xuelong Li
- Abstract summary: We propose UniDiff, a unified diffusion framework for multimodal time series forecasting. At its core lies a unified and parallel fusion module, where a single cross-attention mechanism integrates structural information from timestamps and semantic context from texts. Experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
- Score: 90.47915032778366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As multimodal data proliferates across diverse real-world applications, leveraging heterogeneous information such as texts and timestamps for accurate time series forecasting (TSF) has become a critical challenge. While diffusion models demonstrate exceptional performance in generation tasks, their application to TSF remains largely confined to modeling single-modality numerical sequences, overlooking the abundant cross-modal signals inherent in complex heterogeneous data. To address this gap, we propose UniDiff, a unified diffusion framework for multimodal time series forecasting. To process the numerical sequence, our framework first tokenizes the time series into patches, preserving local temporal dynamics by mapping each patch to an embedding space via a lightweight MLP. At its core lies a unified and parallel fusion module, where a single cross-attention mechanism adaptively weighs and integrates structural information from timestamps and semantic context from texts in one step, enabling a flexible and efficient interplay between modalities. Furthermore, we introduce a novel classifier-free guidance mechanism designed for multi-source conditioning, allowing for decoupled control over the guidance strength of textual and temporal information during inference, which significantly enhances model robustness. Extensive experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
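The decoupled classifier-free guidance described in the abstract can be sketched as follows. The additive combination below is a common compositional CFG form and is an assumption here, not necessarily the paper's exact formulation; the function and weight names (`decoupled_cfg`, `w_text`, `w_time`) are illustrative.

```python
import numpy as np

def decoupled_cfg(eps_uncond, eps_text, eps_time, w_text, w_time):
    """Combine denoiser predictions with per-modality guidance weights.

    eps_uncond: prediction with both conditions dropped
    eps_text:   prediction with only the text condition active
    eps_time:   prediction with only the timestamp condition active
    w_text, w_time: independent guidance strengths for each modality
    """
    return (eps_uncond
            + w_text * (eps_text - eps_uncond)
            + w_time * (eps_time - eps_uncond))

# toy predictions to illustrate the limiting cases
e0 = np.zeros(4)        # unconditional
et = np.ones(4)         # text-conditioned
es = np.full(4, 2.0)    # timestamp-conditioned
```

With both weights at zero the output reduces to the unconditional prediction; setting one weight to 1 and the other to 0 recovers the corresponding single-condition prediction, which is what makes the control over the two sources decoupled.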
Related papers
- Enhancing few-shot time series forecasting with LLM-guided diffusion [12.286204074670236]
Time series forecasting in specialized domains is often constrained by limited data availability. We propose LTSM-DIFF, a novel learning framework that integrates the expressive power of large language models with the generative capability of diffusion models. Our work establishes a new paradigm for time series analysis under data scarcity.
arXiv Detail & Related papers (2026-01-19T06:30:05Z) - FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis [92.23551599659186]
Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology. FusAD is a unified analysis framework designed for diverse time series tasks.
arXiv Detail & Related papers (2025-12-16T04:34:27Z) - FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition. We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model. We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z) - JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation [75.58351043849385]
Generative models often treat continuous data and discrete events as separate processes, creating a gap in modeling complex systems where the two evolve synchronously. To bridge this gap, we introduce JointDiff, a novel diffusion framework designed to couple these two processes by simultaneously generating continuous spatio-temporal data and synchronous discrete events. JointDiff achieves state-of-the-art performance, demonstrating that joint modeling is crucial for building realistic and controllable models for interactive systems.
arXiv Detail & Related papers (2025-09-26T16:04:00Z) - TIMED: Adversarial and Autoregressive Refinement of Diffusion-Based Time Series Generation [0.31498833540989407]
TIMED is a unified generative framework that captures global structure via a forward-reverse diffusion process. To further align the real and synthetic distributions in feature space, TIMED incorporates a Maximum Mean Discrepancy (MMD) loss. We show that TIMED generates more realistic and temporally coherent sequences than state-of-the-art generative models.
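The MMD loss mentioned above measures the distance between two sample sets in a kernel feature space. A minimal sketch of the standard biased estimator with an RBF kernel (the `gamma` value and sample shapes are illustrative, not taken from the TIMED paper):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # pairwise RBF kernel between rows of x and rows of y
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    """Biased estimator of squared Maximum Mean Discrepancy."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

rng = np.random.default_rng(0)
real = rng.standard_normal((16, 3))
shifted = real + 5.0  # a clearly different "synthetic" distribution
```

`mmd2` is zero when the two sample sets coincide and grows as their distributions diverge, which is why minimizing it pulls synthetic features toward the real ones.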
arXiv Detail & Related papers (2025-09-23T23:05:40Z) - UniCast: A Unified Multimodal Prompting Framework for Time Series Forecasting [9.836278124939453]
Time series forecasting is a foundational task across domains such as finance, healthcare, and environmental monitoring. Existing models operate predominantly in a unimodal setting, ignoring the rich multimodal context, such as visual and textual signals, that often accompanies time series data in real-world scenarios. This paper introduces a novel parameter-efficient multimodal framework, UniCast, that extends TSFMs to jointly leverage time series, vision, and text modalities for enhanced forecasting performance.
arXiv Detail & Related papers (2025-08-16T07:33:27Z) - FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [57.577843653775]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z) - Multimodal Conditioned Diffusive Time Series Forecasting [16.72476672866356]
We propose a multimodal conditioned diffusion model for time series forecasting (TSF). Timestamps and texts are combined to establish temporal and semantic correlations among different data points. Experiments on real-world benchmark datasets demonstrate that the proposed MCD-TSF model achieves state-of-the-art performance.
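Combining timestamps and texts as conditioning signals, as in this entry and in UniDiff above, is commonly done by letting series tokens cross-attend to a pooled set of condition embeddings. A minimal single-head numpy sketch, with randomly initialized projections standing in for learned weights (all shapes and names here are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(series_tokens, cond_tokens, d_k):
    """Series patches (queries) attend to condition tokens (keys/values).

    series_tokens: (n_patches, d) embeddings of the numerical series
    cond_tokens:   (n_cond, d) timestamp and text embeddings, concatenated
    """
    rng = np.random.default_rng(0)
    d = series_tokens.shape[1]
    # hypothetical projections; in a trained model these are learned
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_k)) / np.sqrt(d)
    q, k, v = series_tokens @ Wq, cond_tokens @ Wk, cond_tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_k))  # (n_patches, n_cond)
    return attn @ v                          # fused representation

# one fusion step: 8 series patches attend to 4 timestamp + 6 text tokens
rng = np.random.default_rng(1)
patches = rng.standard_normal((8, 16))
cond = np.concatenate([rng.standard_normal((4, 16)),   # timestamp tokens
                       rng.standard_normal((6, 16))])  # text tokens
fused = cross_attention(patches, cond, d_k=16)
```

Because the timestamp and text tokens sit in one key/value pool, a single attention pass weighs both modalities jointly rather than fusing them in separate stages.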
arXiv Detail & Related papers (2025-04-28T10:56:23Z) - LAST SToP For Modeling Asynchronous Time Series [19.401463051705377]
We present a novel prompt design for Large Language Models (LLMs) tailored to asynchronous time series. Our approach effectively utilizes the rich natural language of event descriptions, allowing LLMs to benefit from their broad world knowledge for reasoning across different domains and tasks. We further introduce Soft Prompting, a novel prompt-tuning mechanism that significantly improves model performance, outperforming existing fine-tuning methods such as QLoRA.
arXiv Detail & Related papers (2025-02-04T01:42:45Z) - TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model [11.281386703572842]
TimeDiT is a diffusion transformer model that combines temporal dependency learning with probabilistic sampling. TimeDiT employs a unified masking mechanism to harmonize the training and inference process across diverse tasks. Our systematic evaluation demonstrates TimeDiT's effectiveness in fundamental tasks, i.e., forecasting and imputation, through zero-shot/fine-tuning.
arXiv Detail & Related papers (2024-09-03T22:31:57Z) - SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion [59.96233305733875]
Time series forecasting plays a crucial role in various fields such as finance, traffic management, energy, and healthcare.
Several methods utilize mechanisms such as attention or mixers to capture channel correlations.
This paper presents an efficient MLP-based model, the Series-cOre Fused Time Series forecaster (SOFTS).
arXiv Detail & Related papers (2024-04-22T14:06:35Z) - Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.