A Language Model-Guided Framework for Mining Time Series with Distributional Shifts
- URL: http://arxiv.org/abs/2406.05249v1
- Date: Fri, 7 Jun 2024 20:21:07 GMT
- Title: A Language Model-Guided Framework for Mining Time Series with Distributional Shifts
- Authors: Haibei Zhu, Yousef El-Laham, Elizabeth Fons, Svitlana Vyetrenko
- Abstract summary: This paper presents an approach that utilizes large language models and data source interfaces to explore and collect time series datasets.
While obtained from external sources, the collected data share critical statistical properties with primary time series datasets.
It suggests that collected datasets can effectively supplement existing datasets, especially involving changes in data distribution.
- Score: 5.082311792764403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective utilization of time series data is often constrained by the scarcity of data quantity that reflects complex dynamics, especially under the condition of distributional shifts. Existing datasets may not encompass the full range of statistical properties required for robust and comprehensive analysis. And privacy concerns can further limit their accessibility in domains such as finance and healthcare. This paper presents an approach that utilizes large language models and data source interfaces to explore and collect time series datasets. While obtained from external sources, the collected data share critical statistical properties with primary time series datasets, making it possible to model and adapt to various scenarios. This method enlarges the data quantity when the original data is limited or lacks essential properties. It suggests that collected datasets can effectively supplement existing datasets, especially involving changes in data distribution. We demonstrate the effectiveness of the collected datasets through practical examples and show how time series forecasting foundation models fine-tuned on these datasets achieve comparable performance to those models without fine-tuning.
Related papers
- Review of Data-centric Time Series Analysis from Sample, Feature, and Period [37.33135447969283]
A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence.
The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality.
We systematically review different data-centric methods in time series analysis, covering a wide range of research topics.
arXiv Detail & Related papers (2024-04-24T00:34:44Z)
- Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks [66.87070857705994]
In low-resource settings, the amount of seed data samples to use for data augmentation is very small.
We propose a novel method that augments training data by incorporating a wealth of examples from other datasets.
This approach can ensure that the generated data is not only relevant but also more diverse than what could be achieved using the limited seed data alone.
arXiv Detail & Related papers (2024-02-21T02:45:46Z)
- Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- Development of a Neural Network-based Method for Improved Imputation of Missing Values in Time Series Data by Repurposing DataWig [1.8719295298860394]
Missing values in time series data occur often and present obstacles to successful analysis, thus they need to be filled with alternative values, a process called imputation.
Although various approaches have been attempted for robust imputation of time series data, even the most advanced methods still face challenges.
I developed tsDataWig (time-series DataWig) by modifying DataWig, a neural network-based method that possesses the capacity to process large datasets.
Unlike the original DataWig, tsDataWig can directly handle values of time variables and impute missing values in complex time
arXiv Detail & Related papers (2023-08-18T15:53:40Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- TimeVAE: A Variational Auto-Encoder for Multivariate Time Series Generation [6.824692201913679]
We propose a novel architecture for synthetically generating time-series data with the use of Variational Auto-Encoders (VAEs).
The proposed architecture has several distinct properties: interpretability, ability to encode domain knowledge, and reduced training times.
arXiv Detail & Related papers (2021-11-15T21:42:14Z)
- PIETS: Parallelised Irregularity Encoders for Forecasting with Heterogeneous Time-Series [5.911865723926626]
Heterogeneity and irregularity of multi-source data sets present a significant challenge to time-series analysis.
In this work, we design a novel architecture, PIETS, to model heterogeneous time-series.
We show that PIETS is able to effectively model heterogeneous temporal data and outperforms other state-of-the-art approaches in the prediction task.
arXiv Detail & Related papers (2021-09-30T20:01:19Z)
- Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete with and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
- On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets [0.0]
Data scarcity, mismatch between training and target population, group imbalance, and lack of documentation are important sources of bias.
This paper presents an overview of the currently publicly available COVID-19 chest X-ray datasets.
arXiv Detail & Related papers (2020-08-26T14:16:01Z)
- DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)