Related papers: A Language Model-Guided Framework for Mining Time Series with Distributional Shifts

URL: http://arxiv.org/abs/2406.05249v1
Date: Fri, 7 Jun 2024 20:21:07 GMT
Title: A Language Model-Guided Framework for Mining Time Series with Distributional Shifts
Authors: Haibei Zhu, Yousef El-Laham, Elizabeth Fons, Svitlana Vyetrenko
Abstract summary: This paper presents an approach that utilizes large language models and data source interfaces to explore and collect time series datasets. While obtained from external sources, the collected data share critical statistical properties with primary time series datasets. It suggests that collected datasets can effectively supplement existing datasets, especially involving changes in data distribution.
Score: 5.082311792764403
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Effective utilization of time series data is often constrained by the scarcity of data quantity that reflects complex dynamics, especially under the condition of distributional shifts. Existing datasets may not encompass the full range of statistical properties required for robust and comprehensive analysis. And privacy concerns can further limit their accessibility in domains such as finance and healthcare. This paper presents an approach that utilizes large language models and data source interfaces to explore and collect time series datasets. While obtained from external sources, the collected data share critical statistical properties with primary time series datasets, making it possible to model and adapt to various scenarios. This method enlarges the data quantity when the original data is limited or lacks essential properties. It suggests that collected datasets can effectively supplement existing datasets, especially involving changes in data distribution. We demonstrate the effectiveness of the collected datasets through practical examples and show how time series forecasting foundation models fine-tuned on these datasets achieve comparable performance to those models without fine-tuning.

Related papers

Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting. Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server. We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z)
Metadata Matters for Time Series: Informative Forecasting with Transformers [70.38241681764738]
We propose a Metadata-informed Time Series Transformer (MetaTST) for time series forecasting. To tackle the unstructured nature of metadata, MetaTST formalizes them into natural languages by pre-designed templates. A Transformer encoder is employed to communicate series and metadata tokens, which can extend series representations by metadata information.
arXiv Detail & Related papers (2024-10-04T11:37:55Z)
The Data Addition Dilemma [4.869513274920574]
In many machine learning for healthcare tasks, standard datasets are constructed by amassing data across many, often fundamentally dissimilar, sources. But when does adding more data help, and when does it hinder progress on desired model outcomes in real-world settings? We identify this situation as the Data Addition Dilemma, demonstrating that adding training data in this multi-source scaling context can at times result in reduced overall accuracy, uncertain fairness outcomes, and reduced worst-subgroup performance.
arXiv Detail & Related papers (2024-08-08T01:42:31Z)
Review of Data-centric Time Series Analysis from Sample, Feature, and Period [37.33135447969283]
A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality. We systematically review different data-centric methods in time series analysis, covering a wide range of research topics.
arXiv Detail & Related papers (2024-04-24T00:34:44Z)
EXPRTS: Exploring and Probing the Robustness of Time Series Forecasting Models [1.23187154417297]
We develop an interpretable and simple framework for generating time series. Our method combines time-series decompositions with analytic functions, and is able to generate time series with characteristics matching both in- and out-of-distribution data. We show how our framework can generate meaningful OOD time series that improve model robustness.
arXiv Detail & Related papers (2024-03-06T07:34:47Z)
Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain. We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size. Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
Development of a Neural Network-based Method for Improved Imputation of Missing Values in Time Series Data by Repurposing DataWig [1.8719295298860394]
Missing values in time series data occur often and present obstacles to successful analysis, thus they need to be filled with alternative values, a process called imputation. Although various approaches have been attempted for robust imputation of time series data, even the most advanced methods still face challenges. I developed tsDataWig (time-series DataWig) by modifying DataWig, a neural network-based method that possesses the capacity to process large datasets. Unlike the original DataWig, tsDataWig can directly handle values of time variables and impute missing values in complex time
arXiv Detail & Related papers (2023-08-18T15:53:40Z)
Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data. We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
TimeVAE: A Variational Auto-Encoder for Multivariate Time Series Generation [6.824692201913679]
We propose a novel architecture for synthetically generating time-series data with the use of Variational Auto-Encoders (VAEs). The proposed architecture has several distinct properties: interpretability, ability to encode domain knowledge, and reduced training times.
arXiv Detail & Related papers (2021-11-15T21:42:14Z)
PIETS: Parallelised Irregularity Encoders for Forecasting with Heterogeneous Time-Series [5.911865723926626]
Heterogeneity and irregularity of multi-source data sets present a significant challenge to time-series analysis. In this work, we design a novel architecture, PIETS, to model heterogeneous time-series. We show that PIETS is able to effectively model heterogeneous temporal data and outperforms other state-of-the-art approaches in the prediction task.
arXiv Detail & Related papers (2021-09-30T20:01:19Z)
Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data. Our results indicate that learning summary features from data can compete with and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a trained network. We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples. We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.