Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis
- URL: http://arxiv.org/abs/2406.08627v3
- Date: Sat, 09 Nov 2024 01:25:00 GMT
- Title: Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis
- Authors: Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Kamarthi, Aditya B. Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, B. Aditya Prakash
- Abstract summary: Time-MMD is the first multi-domain, multimodal time series dataset.
MM-TSFlib is the first multimodal time-series forecasting library.
- Score: 40.44013652777716
- License:
- Abstract: Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking the significance of information beyond numerical series. This oversight is due to the untapped potential of textual series data and the absence of a comprehensive, high-quality multimodal dataset. To overcome this obstacle, we introduce Time-MMD, the first multi-domain, multimodal time series dataset covering 9 primary data domains. Time-MMD ensures fine-grained modality alignment, eliminates data contamination, and provides high usability. Additionally, we develop MM-TSFlib, the first multimodal time-series forecasting (TSF) library, seamlessly pipelining multimodal TSF evaluations based on Time-MMD for in-depth analyses. Extensive experiments conducted on Time-MMD through MM-TSFlib demonstrate significant performance enhancements by extending unimodal TSF to multimodality, evidenced by over 15% mean squared error reduction in general, and up to 40% in domains with rich textual data. More importantly, our datasets and library revolutionize broader applications, impacts, research topics to advance TSA. The dataset and library are available at https://github.com/AdityaLab/Time-MMD and https://github.com/AdityaLab/MM-TSFlib.
Related papers
- Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z)
- Multi-Modal Forecaster: Jointly Predicting Time Series and Textual Data [23.10730301634422]
Current forecasting approaches are largely unimodal and ignore the rich textual data that often accompany the time series.
We develop TimeText Corpus (TTC), a carefully curated, time-aligned text and time dataset for multimodal forecasting.
Our dataset is composed of sequences of numbers and text aligned to timestamps, and includes data from two different domains: climate science and healthcare.
arXiv Detail & Related papers (2024-11-11T06:04:15Z)
- See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers [23.701716999879636]
Time series anomaly detection (TSAD) is becoming increasingly vital due to the rapid growth of time series data.
We introduce a pioneering framework called the Time Series Anomaly Multimodal Analyzer (TAMA) to enhance both the detection and interpretation of anomalies.
arXiv Detail & Related papers (2024-11-04T10:28:41Z)
- MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens [113.9621845919304]
We release MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date.
MINT-1T comprises one trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-source datasets.
Our experiments show that LMMs trained on MINT-1T rival the performance of models trained on the previous leading dataset, OBELICS.
arXiv Detail & Related papers (2024-06-17T07:21:36Z)
- PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
We propose a Parameter-Efficient Federated Anomaly Detection framework named PeFAD in response to increasing privacy concerns.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
- Advancing multivariate time series similarity assessment: an integrated computational approach [0.0]
We propose an integrated computational approach for assessing the similarity of multivariate time series data.
MTASA is built upon a hybrid methodology designed to optimize time series alignment, complemented by a multiprocessing engine.
Results from this study highlight MTASA's superiority, achieving approximately 1.5 times greater accuracy and twice the speed compared to existing state-of-the-art integrated frameworks.
arXiv Detail & Related papers (2024-03-16T23:52:25Z)
- Temporal Treasure Hunt: Content-based Time Series Retrieval System for Discovering Insights [34.1973242428317]
Time series data is ubiquitous across various domains such as finance, healthcare, and manufacturing.
The ability to perform Content-based Time Series Retrieval (CTSR) is crucial for identifying unknown time series examples.
We introduce a CTSR benchmark dataset that comprises time series data from a variety of domains.
arXiv Detail & Related papers (2023-11-05T04:12:13Z)
- Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data [50.84488941336865]
We propose a novel method called Fully-Connected Spatial-Temporal Graph Neural Network (FC-STGNN).
For graph construction, we design a decay graph to connect sensors across all timestamps based on their temporal distances.
For graph convolution, we devise FC graph convolution with a moving-pooling GNN layer to effectively capture the ST dependencies for learning effective representations.
arXiv Detail & Related papers (2023-09-11T08:44:07Z)
- Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete with and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
- MTS-CycleGAN: An Adversarial-based Deep Mapping Learning Network for Multivariate Time Series Domain Adaptation Applied to the Ironmaking Industry [0.0]
This research focuses on translating asset-specific historical data (source domain) into data corresponding to one reference asset (target domain).
We propose MTS-CycleGAN, an algorithm for Multivariate Time Series data based on CycleGAN.
Our contribution is the integration in the CycleGAN architecture of a Long Short-Term Memory (LSTM)-based AutoEncoder (AE) for the generator and a stacked LSTM-based discriminator.
arXiv Detail & Related papers (2020-07-15T07:33:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.