Measuring Time-Series Dataset Similarity using Wasserstein Distance
- URL: http://arxiv.org/abs/2507.22189v1
- Date: Tue, 29 Jul 2025 19:33:10 GMT
- Title: Measuring Time-Series Dataset Similarity using Wasserstein Distance
- Authors: Hongjie Chen, Akshay Mehra, Josh Kimball, Ryan A. Rossi
- Abstract summary: A time-series dataset similarity measure aids research in multiple ways, including model selection, finetuning, and visualization. We propose a distribution-based method to measure time-series dataset similarity by leveraging the Wasserstein distance.
- Score: 21.101558938915634
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The emergence of time-series foundation model research elevates the growing need to measure the (dis)similarity of time-series datasets. A time-series dataset similarity measure aids research in multiple ways, including model selection, finetuning, and visualization. In this paper, we propose a distribution-based method to measure time-series dataset similarity by leveraging the Wasserstein distance. We consider a time-series dataset an empirical instantiation of an underlying multivariate normal distribution (MVN). The similarity between two time-series datasets is thus computed as the Wasserstein distance between their corresponding MVNs. Comprehensive experiments and visualization show the effectiveness of our approach. Specifically, we show how the Wasserstein distance helps identify similar time-series datasets and facilitates inference performance estimation of foundation models in both out-of-distribution and transfer learning evaluation, with high correlations between our proposed measure and the inference loss (>0.60).
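The distance between two MVNs described in the abstract can be illustrated with the well-known closed form of the 2-Wasserstein distance between Gaussians, W2^2 = ||m_a - m_b||^2 + Tr(S_a + S_b - 2 (S_a^{1/2} S_b S_a^{1/2})^{1/2}). The following is a minimal sketch, not the authors' implementation: the function names (`fit_mvn`, `gaussian_w2`), the choice of the order-2 distance, and the assumption that each dataset has already been turned into an array of fixed-length vectors are all illustrative and do not come from the paper.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def fit_mvn(samples):
    """Estimate the mean and covariance of an (n_samples, dim) array.

    Assumption: each row is one fixed-length vector derived from a dataset;
    how the paper builds these vectors is not specified here.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    return mean, cov


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    mean_a = np.asarray(mean_a, dtype=float)
    mean_b = np.asarray(mean_b, dtype=float)
    cov_a = np.asarray(cov_a, dtype=float)
    cov_b = np.asarray(cov_b, dtype=float)
    root_a = _sqrtm_psd(cov_a)
    cross = _sqrtm_psd(root_a @ cov_b @ root_a)
    squared = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(squared, 0.0)))  # guard against small negative round-off
```

With identical covariances the distance reduces to the Euclidean distance between the means, so `gaussian_w2([0, 0], np.eye(2), [3, 4], np.eye(2))` evaluates to 5 up to floating-point error. In a dataset-comparison setting one would call `fit_mvn` on each dataset and pass the two fitted mean and covariance pairs to `gaussian_w2`.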
Related papers
- Bayesian temporal biclustering with applications to multi-subject neuroscience studies [6.515516311120015]
We propose a Bayesian model for temporal biclustering featuring nested partitions, where a time-invariant partition of subjects induces a time-varying partition of measurements.
Our approach allows for data-driven determination of the number of subject and measurement clusters as well as estimation of the number and location of changepoints in measurement partitions.
arXiv Detail & Related papers (2024-06-24T20:41:37Z)
- Robust Detection of Lead-Lag Relationships in Lagged Multi-Factor Models [61.10851158749843]
Key insights can be obtained by discovering lead-lag relationships inherent in the data.
We develop a clustering-driven methodology for robust detection of lead-lag relationships in lagged multi-factor models.
arXiv Detail & Related papers (2023-05-11T10:30:35Z)
- Exogenous Data in Forecasting: FARM -- A New Measure for Relevance Evaluation [62.997667081978825]
We introduce a new approach named FARM - Forward Relevance Aligned Metric.
Our forward method relies on an angular measure that compares changes in subsequent data points to align time-warped series.
As a first validation step, we present the application of our FARM approach to synthetic but representative signals.
arXiv Detail & Related papers (2023-04-21T15:22:33Z)
- Wasserstein multivariate auto-regressive models for modeling distributional time series [0.0]
This paper is focused on the statistical analysis of data consisting of a collection of multiple series of probability measures. By modeling these time-dependent probability measures as random objects in the Wasserstein space, we propose a new auto-regressive model. Results on the existence, uniqueness and stationarity of the solution of such a model are provided.
arXiv Detail & Related papers (2022-07-12T10:18:36Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Towards Similarity-Aware Time-Series Classification [51.2400839966489]
We study time-series classification (TSC), a fundamental task of time-series data mining.
We propose Similarity-Aware Time-Series Classification (SimTSC), a framework that models similarity information with graph neural networks (GNNs).
arXiv Detail & Related papers (2022-01-05T02:14:57Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
- Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z)
- Instance-wise Graph-based Framework for Multivariate Time Series Forecasting [69.38716332931986]
We propose a simple yet efficient instance-wise graph-based framework to utilize the inter-dependencies of different variables at different time stamps.
The key idea of our framework is aggregating information from the historical time series of different variables to the current time series that we need to forecast.
arXiv Detail & Related papers (2021-09-14T07:38:35Z)
- Free congruence: an exploration of expanded similarity measures for time series data [0.0]
Time series similarity measures are highly relevant in a wide range of emerging applications including training machine learning models, classification, and predictive modeling.
Standard similarity measures for time series most often involve point-to-point distance measures including Euclidean distance and Dynamic Time Warping.
arXiv Detail & Related papers (2021-01-17T23:34:55Z)
- Multivariate Time-series Anomaly Detection via Graph Attention Network [27.12694738711663]
Anomaly detection on multivariate time-series is of great importance in both data mining research and industrial applications.
One major limitation of existing approaches is that they do not capture the relationships between different time-series explicitly.
We propose a novel self-supervised framework for multivariate time-series anomaly detection to address this issue.
arXiv Detail & Related papers (2020-09-04T07:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.