Multi-modal Time Series Analysis: A Tutorial and Survey
- URL: http://arxiv.org/abs/2503.13709v1
- Date: Mon, 17 Mar 2025 20:30:02 GMT
- Title: Multi-modal Time Series Analysis: A Tutorial and Survey
- Authors: Yushan Jiang, Kanghui Ning, Zijie Pan, Xuyang Shen, Jingchao Ni, Wenchao Yu, Anderson Schneider, Haifeng Chen, Yuriy Nevmyvaka, Dongjin Song
- Abstract summary: Multi-modal time series analysis has emerged as a prominent research area in data mining. However, effective analysis of multi-modal time series is hindered by data heterogeneity, modality gap, misalignment, and inherent noise. Recent advancements in multi-modal time series methods have exploited the multi-modal context via cross-modal interactions.
- Score: 36.93906365779472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal time series analysis has recently emerged as a prominent research area in data mining, driven by the increasing availability of diverse data modalities, such as text, images, and structured tabular data from real-world sources. However, effective analysis of multi-modal time series is hindered by data heterogeneity, modality gap, misalignment, and inherent noise. Recent advancements in multi-modal time series methods have exploited the multi-modal context via cross-modal interactions based on deep learning methods, significantly enhancing various downstream tasks. In this tutorial and survey, we present a systematic and up-to-date overview of multi-modal time series datasets and methods. We first state the existing challenges of multi-modal time series analysis and our motivations, with a brief introduction of preliminaries. Then, we summarize the general pipeline and categorize existing methods through a unified cross-modal interaction framework encompassing fusion, alignment, and transference at different levels (i.e., input, intermediate, output), where key concepts and ideas are highlighted. We also discuss the real-world applications of multi-modal analysis for both standard and spatial time series, tailored to general and specific domains. Finally, we discuss future research directions to help practitioners explore and exploit multi-modal time series. The up-to-date resources are provided in the GitHub repository: https://github.com/UConn-DSIS/Multi-modal-Time-Series-Analysis
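To make the survey's taxonomy concrete, below is a minimal sketch of one of its categories, intermediate-level fusion via cross-modal attention, in which time series tokens query text tokens. The module, dimensions, and hyperparameters are illustrative assumptions, not code from the survey or its repository.

```python
# Minimal sketch (an assumption, not the survey's reference code): intermediate-level
# fusion via cross-modal attention, where time series tokens query text tokens.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Queries come from the time series; keys/values come from the text modality.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, ts_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        # ts_tokens: (batch, T, d_model); text_tokens: (batch, L, d_model)
        fused, _ = self.attn(ts_tokens, text_tokens, text_tokens)
        # The residual connection preserves the original temporal representation.
        return self.norm(ts_tokens + fused)

# Toy usage: 96 time series patch embeddings fused with 32 text-token embeddings.
ts, txt = torch.randn(8, 96, 64), torch.randn(8, 32, 64)
print(CrossModalFusion()(ts, txt).shape)  # torch.Size([8, 96, 64])
```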
Related papers
- MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering [21.064096256892686]
Existing multimodal time-series datasets fall short in evaluating cross-modal reasoning and complex question answering.
We introduce Multimodal Time Series Benchmark (MTBench), a benchmark to evaluate large language models (LLMs) on time series and text understanding.
We evaluate state-of-the-art LLMs on MTBench, analyzing their effectiveness in modeling the complex relationships between news narratives and temporal patterns.
arXiv Detail & Related papers (2025-03-21T05:04:53Z) - TempoGPT: Enhancing Time Series Reasoning via Quantizing Embedding [13.996105878417204]
We propose a multi-modal time series data construction approach and a multi-modal time series language model (TLM), TempoGPT. We construct multi-modal data for complex reasoning tasks by analyzing the variable-system relationships within a white-box system. Extensive experiments demonstrate that TempoGPT accurately perceives temporal information, logically infers conclusions, and achieves state-of-the-art results on the constructed complex time series reasoning tasks. (A toy sketch of the embedding-quantization idea appears after the paper list below.)
arXiv Detail & Related papers (2025-01-13T13:47:05Z) - A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z) - Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z) - Continual Multimodal Knowledge Graph Construction [62.77243705682985]
Current Multimodal Knowledge Graph Construction (MKGC) models struggle with the real-world dynamism of continuously emerging entities and relations.
This study introduces benchmarks aimed at fostering the development of the continual MKGC domain.
We introduce the MSPT framework, designed to surmount the shortcomings of existing MKGC approaches during multimedia data processing.
arXiv Detail & Related papers (2023-05-15T14:58:28Z) - Few-shot Multimodal Sentiment Analysis based on Multimodal Probabilistic Fusion Prompts [30.15646658460899]
Multimodal sentiment analysis has gained significant attention due to the proliferation of multimodal content on social media.
Existing studies in this area rely heavily on large-scale supervised data, which is time-consuming and labor-intensive to collect.
We propose a novel method called Multimodal Probabilistic Fusion Prompts (MultiPoint) that leverages diverse cues from different modalities for multimodal sentiment detection in the few-shot scenario.
arXiv Detail & Related papers (2022-11-12T08:10:35Z) - Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments [18.14974353615421]
We propose a novel method for multimodal representation learning in a noisy environment via the generalized product of experts technique.
In the proposed method, we train a separate network for each modality to assess the credibility of the information coming from that modality. (A minimal sketch of this credibility-weighted fusion appears after the paper list below.)
We attain state-of-the-art performance on two challenging benchmarks: multimodal 3D hand-pose estimation and multimodal surgical video segmentation.
arXiv Detail & Related papers (2022-11-07T14:27:38Z) - High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning [112.51498431119616]
This paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities.
A single model, HighMMT, scales up to 10 modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and 15 tasks from 5 research areas.
arXiv Detail & Related papers (2022-03-02T18:56:20Z) - Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose Channel-Exchanging-Network (CEN) which is self-adaptive, parameter-free, and more importantly, applicable for both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between sub-networks of different modalities. (A toy channel-exchanging sketch appears after the paper list below.)
For the application of dense image prediction, the validity of CEN is tested in four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z) - StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data [0.0]
StreaMulT is a Streaming Multimodal Transformer that relies on cross-modal attention and a memory bank to process arbitrarily long input sequences at training time and to run in a streaming fashion at inference. (A toy streaming-attention sketch appears after the paper list below.)
StreaMulT improves the state-of-the-art metrics on the CMU-MOSEI dataset for the Multimodal Sentiment Analysis task, while handling much longer inputs than other multimodal models.
arXiv Detail & Related papers (2021-10-15T11:32:17Z) - Time Series Analysis via Network Science: Concepts and Algorithms [62.997667081978825]
This review provides a comprehensive overview of existing mapping methods for transforming time series into networks.
We describe the main conceptual approaches, provide authoritative references and give insight into their advantages and limitations in a unified notation and language.
Although still very recent, this research area has much potential and with this survey we intend to pave the way for future research on the topic.
arXiv Detail & Related papers (2021-10-11T13:33:18Z) - Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and text as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities.
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
arXiv Detail & Related papers (2020-04-10T06:31:30Z)
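As noted in the TempoGPT entry above, here is a toy sketch of the embedding-quantization idea: continuous temporal embeddings are snapped to their nearest entries in a codebook so that a language model can consume them as discrete tokens. The codebook size, dimension, and L2 metric are assumptions for illustration, not TempoGPT's actual configuration.

```python
# Toy sketch (assumptions throughout): quantize continuous temporal embeddings
# into discrete tokens via nearest-neighbor lookup in a codebook, in the
# spirit of TempoGPT's "quantizing embedding" idea.
import torch

def quantize(embeddings: torch.Tensor, codebook: torch.Tensor):
    # embeddings: (n, d); codebook: (K, d)
    dists = torch.cdist(embeddings, codebook)   # (n, K) pairwise L2 distances
    token_ids = dists.argmin(dim=1)             # index of the nearest code per embedding
    return token_ids, codebook[token_ids]       # discrete ids + quantized vectors

codebook = torch.randn(256, 32)                 # K=256 codes of dim 32 (illustrative)
emb = torch.randn(96, 32)                       # 96 temporal embeddings
ids, quantized = quantize(emb, codebook)
print(ids.shape, quantized.shape)               # torch.Size([96]) torch.Size([96, 32])
```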
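For the Generalized Product-of-Experts entry, a minimal sketch of precision-weighted Gaussian PoE fusion: each modality-specific network emits a mean and a precision that acts as its credibility, and the experts are combined by the standard Gaussian product rule. The shapes, and the networks producing these estimates, are assumptions.

```python
# Minimal sketch of product-of-experts fusion for Gaussian experts: each modality
# predicts a mean and a precision (its credibility); the fused estimate is the
# precision-weighted average. Network architectures are omitted (assumptions).
import torch

def poe_fuse(means: torch.Tensor, precisions: torch.Tensor):
    # means, precisions: (n_modalities, batch, d); higher precision = more credible
    total_precision = precisions.sum(dim=0)
    fused_mean = (precisions * means).sum(dim=0) / total_precision
    fused_var = 1.0 / total_precision           # product of Gaussians shrinks variance
    return fused_mean, fused_var

# Toy usage: fuse 3 modality-specific estimates of a 16-dim representation.
means = torch.randn(3, 8, 16)
precisions = torch.rand(3, 8, 16) + 0.1         # strictly positive credibilities
mu, var = poe_fuse(means, precisions)
print(mu.shape, var.shape)                      # torch.Size([8, 16]) twice
```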
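For the Channel-Exchanging-Network entry, a toy sketch of the exchanging step: channels whose batch-norm scaling factor falls below a threshold are treated as uninformative and overwritten with the other modality's corresponding channels. The threshold value and tensor shapes are illustrative assumptions.

```python
# Toy sketch of channel exchanging between two modality sub-networks: channels
# whose batch-norm scale is below a threshold are deemed uninformative and are
# replaced by the other modality's channels (threshold value is an assumption).
import torch

def exchange_channels(feat_a, feat_b, gamma_a, gamma_b, threshold: float = 0.02):
    # feat_*: (batch, C, T); gamma_*: (C,) batch-norm scaling factor per channel
    out_a, out_b = feat_a.clone(), feat_b.clone()
    mask_a = gamma_a < threshold            # channels of A to replace with B's
    mask_b = gamma_b < threshold            # channels of B to replace with A's
    out_a[:, mask_a] = feat_b[:, mask_a]
    out_b[:, mask_b] = feat_a[:, mask_b]
    return out_a, out_b

a, b = torch.randn(4, 8, 32), torch.randn(4, 8, 32)
ga, gb = torch.rand(8) * 0.05, torch.rand(8) * 0.05
print([t.shape for t in exchange_channels(a, b, ga, gb)])
```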
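For the StreaMulT entry, a toy sketch of streaming attention with a bounded memory bank: each incoming chunk attends over the concatenation of stored summaries and the current chunk, and only the most recent context is retained. The bank policy and sizes here are assumptions, not the paper's exact design.

```python
# Toy streaming sketch (all specifics are assumptions): a fixed-size memory bank
# of past context is prepended to each incoming chunk, so attention can span
# arbitrarily long streams at constant memory cost.
import torch
import torch.nn as nn

class StreamingAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, bank_size: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.bank_size = bank_size
        self.memory = None  # (batch, <=bank_size, d_model), filled as chunks arrive

    def forward(self, chunk: torch.Tensor) -> torch.Tensor:
        # chunk: (batch, T, d_model); attend over memory + current chunk
        context = chunk if self.memory is None else torch.cat([self.memory, chunk], dim=1)
        out, _ = self.attn(chunk, context, context)
        # Keep only the most recent states so the memory stays bounded.
        self.memory = context[:, -self.bank_size:].detach()
        return out

layer = StreamingAttention()
for _ in range(5):                      # five consecutive chunks of a long stream
    y = layer(torch.randn(2, 24, 64))
print(y.shape)                          # torch.Size([2, 24, 64])
```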
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.