TimeOmni-VL: Unified Models for Time Series Understanding and Generation
- URL: http://arxiv.org/abs/2602.17149v1
- Date: Thu, 19 Feb 2026 07:50:11 GMT
- Title: TimeOmni-VL: Unified Models for Time Series Understanding and Generation
- Authors: Tong Guan, Sheng Pan, Johan Barthelemy, Zhao Li, Yujun Cai, Cesare Alippi, Ming Jin, Shirui Pan
- Abstract summary: TimeOmni-VL is a vision-centric framework that unifies time series understanding and generation. It is the first to leverage time series understanding as an explicit control signal for high-fidelity generation. Experiments confirm that this unified approach significantly improves both semantic understanding and numerical precision.
- Score: 66.55423802406078
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent time series modeling faces a sharp divide between numerical generation and semantic understanding: research shows that generation models often rely on superficial pattern matching, while understanding-oriented models struggle with high-fidelity numerical output. Although unified multimodal models (UMMs) have bridged this gap in vision, their potential for time series remains untapped. We propose TimeOmni-VL, the first vision-centric framework that unifies time series understanding and generation through two key innovations: (1) Fidelity-preserving bidirectional mapping between time series and images (Bi-TSI), which advances Time Series-to-Image (TS2I) and Image-to-Time Series (I2TS) conversions to ensure near-lossless transformations. (2) Understanding-guided generation. We introduce TSUMM-Suite, a novel dataset consisting of six understanding tasks rooted in time series analytics, coupled with two generation tasks. With a calibrated Chain-of-Thought, TimeOmni-VL is the first to leverage time series understanding as an explicit control signal for high-fidelity generation. Experiments confirm that this unified approach significantly improves both semantic understanding and numerical precision, establishing a new frontier for multimodal time series modeling.
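The abstract does not spell out how the Bi-TSI mapping works, but the round-trip idea it names (a TS2I encoding followed by an I2TS decoding with near-lossless reconstruction) can be illustrated with a deliberately simple sketch: min-max normalize the series, quantize each value to an 8-bit pixel, reshape into a 2-D image, and invert the steps. The function names (`ts2i`, `i2ts`) and the quantization scheme are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def ts2i(series, width):
    """Toy TS2I: encode a 1-D series as an 8-bit grayscale image."""
    lo, hi = series.min(), series.max()
    norm = (series - lo) / (hi - lo)              # map values to [0, 1]
    pixels = np.round(norm * 255).astype(np.uint8)
    pad = (-len(pixels)) % width                  # pad to a full rectangle
    pixels = np.pad(pixels, (0, pad))
    return pixels.reshape(-1, width), (lo, hi, len(series))

def i2ts(image, meta):
    """Toy I2TS: decode the image back to the original series."""
    lo, hi, n = meta
    flat = image.reshape(-1)[:n].astype(np.float64) / 255.0
    return flat * (hi - lo) + lo

series = np.sin(np.linspace(0, 4 * np.pi, 96))
img, meta = ts2i(series, width=16)
recon = i2ts(img, meta)
# 8-bit quantization bounds the round-trip error by one gray level
print(np.max(np.abs(recon - series)) <= (series.max() - series.min()) / 255)  # True
```

Under this scheme "near-lossless" has a concrete meaning: the reconstruction error is bounded by half a quantization step, so fidelity is set entirely by the pixel bit depth.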
Related papers
- Temporal-Visual Semantic Alignment: A Unified Architecture for Transferring Spatial Priors from Vision Models to Zero-Shot Temporal Tasks [19.299293037292113]
TimeArtist is a temporal-visual conversion framework that pioneers semantic-level alignment between time series fluctuations and visual concepts. Our work establishes a new paradigm for cross-modal generation, bridging the gap between temporal dynamics and visual semantics.
arXiv Detail & Related papers (2025-11-25T02:35:48Z) - DiM-TS: Bridge the Gap between Selective State Space Models and Time Series for Generative Modeling [11.836475971106125]
Time series data plays a pivotal role in a wide variety of fields but faces challenges related to privacy concerns. We propose Lag Fusion Mamba and Permutation Scanning Mamba, which enhance the model's ability to discern significant patterns during the denoising process. We also introduce Diffusion Mamba for Time Series (DiM-TS), a high-quality time series generation model that better preserves temporal periodicity and inter-channel correlations.
arXiv Detail & Related papers (2025-11-23T06:48:03Z) - Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers [49.07665715422702]
We propose Time Vision Transformer (TiViT), a framework that converts time series into images. We show that TiViT achieves state-of-the-art performance on standard time series classification benchmarks. Our findings reveal a new direction for reusing vision representations in a non-visual domain.
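This summary does not describe TiViT's actual imaging scheme; as a hedged illustration of the generic "series as image" step such methods rely on, here is a toy rasterizer that lights one pixel per time step at a row proportional to the value, producing a binary image a vision backbone could consume. The function name and the `height` parameter are assumptions for the sketch.

```python
import numpy as np

def series_to_image(series, height=32):
    """Toy rasterizer: one column per time step, one lit pixel per column."""
    norm = (series - series.min()) / (series.max() - series.min())
    rows = np.round(norm * (height - 1)).astype(int)
    img = np.zeros((height, len(series)), dtype=np.uint8)
    img[height - 1 - rows, np.arange(len(series))] = 1  # row 0 is the top
    return img

img = series_to_image(np.sin(np.linspace(0, 2 * np.pi, 64)))
print(img.shape, int(img.sum()))  # (32, 64) 64 -- exactly one pixel per column
```

A real pipeline would rasterize at the backbone's input resolution and replicate the single channel to RGB before feeding a pretrained vision transformer.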
arXiv Detail & Related papers (2025-06-10T09:54:51Z) - Harnessing Vision Models for Time Series Analysis: A Survey [85.65718718797643]
This survey discusses the advantages of vision models over LLMs in time series analysis. It provides a comprehensive and in-depth overview of existing methods, organized in a detailed dual-view taxonomy. We also address the challenges in the pre- and post-processing steps involved in this framework.
arXiv Detail & Related papers (2025-02-13T00:42:11Z) - Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting [26.4608782425897]
Time-VLM is a novel framework that bridges temporal, visual, and textual modalities for enhanced forecasting. Our framework comprises three key components: (1) a Retrieval-Augmented Learner, which extracts enriched temporal features through memory bank interactions; (2) a Vision-Augmented Learner, which encodes time series as informative images; and (3) a Text-Augmented Learner, which generates contextual textual descriptions.
arXiv Detail & Related papers (2025-02-06T05:59:45Z) - General Time-series Model for Universal Knowledge Representation of Multivariate Time-Series data [61.163542597764796]
We show that time series with different time granularities (or corresponding frequency resolutions) exhibit distinct joint distributions in the frequency domain. A novel Fourier knowledge attention mechanism is proposed to enable learning time-aware representations from both the temporal and frequency domains. An autoregressive blank-infilling pre-training framework is incorporated into time series analysis for the first time, yielding a task-agnostic generative pre-training strategy.
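The granularity claim above can be made concrete with a small numpy check (purely illustrative; this is not the paper's mechanism): the same two-tone signal read at a 16x coarser granularity has a visibly different normalized spectrum, because the 60 Hz component exceeds the coarse Nyquist rate and aliases to a low-frequency bin.

```python
import numpy as np

t = np.linspace(0, 1, 1024, endpoint=False)        # 1024 Hz sampling, 1 second
fine = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
coarse = fine[::16]                                # 64 Hz: Nyquist is 32 Hz

def norm_spectrum(x):
    """Magnitude spectrum normalized to a distribution over frequency bins."""
    m = np.abs(np.fft.rfft(x))
    return m / m.sum()

f_fine, f_coarse = norm_spectrum(fine), norm_spectrum(coarse)
# The 60 Hz tone aliases to bin 64 - 60 = 4 in the coarse series,
# so the two spectra place their mass on different bins.
print(f_fine[4] < 1e-6, f_coarse[4] > 0.3)  # True True
```

This is exactly the kind of frequency-domain shift a granularity-aware representation would need to account for.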
arXiv Detail & Related papers (2025-02-05T15:20:04Z) - Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts [103.725112190618]
This paper introduces Moirai-MoE, using a single input/output projection layer while delegating the modeling of diverse time series patterns to the sparse mixture of experts.
Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios.
arXiv Detail & Related papers (2024-10-14T13:01:11Z) - TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders [55.00904795497786]
We propose TimeMAE, a novel self-supervised paradigm for learning transferrable time series representations based on transformer networks.
The TimeMAE learns enriched contextual representations of time series with a bidirectional encoding scheme.
To solve the discrepancy issue incurred by newly injected masked embeddings, we design a decoupled autoencoder architecture.
arXiv Detail & Related papers (2023-03-01T08:33:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.