Learning summary features of time series for likelihood-free inference
- URL: http://arxiv.org/abs/2012.02807v1
- Date: Fri, 4 Dec 2020 19:21:37 GMT
- Title: Learning summary features of time series for likelihood-free inference
- Authors: Pedro L. C. Rodrigues, Alexandre Gramfort
- Abstract summary: We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete with and even outperform LFI methods based on hand-crafted values.
- Score: 93.08098361687722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been an increasing interest from the scientific community in using
likelihood-free inference (LFI) to determine which parameters of a given
simulator model could best describe a set of experimental data. Despite
exciting recent results and a wide range of possible applications, an important
bottleneck of LFI when applied to time series data is the necessity of defining
a set of summary features, often hand-tailored based on domain knowledge. In
this work, we present a data-driven strategy for automatically learning summary
features from univariate time series and apply it to signals generated from
autoregressive-moving-average (ARMA) models and the Van der Pol Oscillator. Our
results indicate that learning summary features from data can compete with and
even outperform LFI methods based on hand-crafted values such as autocorrelation
coefficients, even in the linear case.
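To make this concrete, the sketch below shows one common recipe for learning summary features (a regression-based strategy; the paper's exact architecture and training objective are not reproduced here): simulate ARMA(1, 1) series from a uniform prior, train a small 1D CNN to regress the simulator parameters, and reuse the network outputs as low-dimensional summaries for any LFI backend. All dimensions and hyperparameters are illustrative.

```python
# Minimal sketch: learn summary features for LFI by training a small 1D CNN
# to regress ARMA(1, 1) parameters from raw series (a regression-based
# summary-statistics strategy; not necessarily the authors' architecture).
import numpy as np
import torch
import torch.nn as nn

def simulate_arma11(phi, theta, n=256, rng=None):
    """x_t = phi * x_{t-1} + e_t + theta * e_{t-1}, with e_t ~ N(0, 1)."""
    rng = rng if rng is not None else np.random.default_rng()
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t] + theta * e[t - 1]
    return x

# Draw (phi, theta) from a uniform prior and build a training set of series.
rng = np.random.default_rng(0)
params = rng.uniform(-0.9, 0.9, size=(1000, 2))
series = np.stack([simulate_arma11(p, q, rng=rng) for p, q in params])

# Small 1D CNN; its outputs act as learned two-dimensional summary features.
net = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=9, padding=4), nn.ReLU(),
    nn.AvgPool1d(4),
    nn.Conv1d(8, 16, kernel_size=9, padding=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(16, 2),                  # regress (phi, theta)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
X = torch.tensor(series, dtype=torch.float32).unsqueeze(1)  # (N, 1, n)
y = torch.tensor(params, dtype=torch.float32)
for epoch in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), y)
    loss.backward()
    opt.step()
# net(x) now maps a raw series to features usable by any LFI backend.
```

In an LFI pipeline such as neural posterior estimation, the fitted network would serve as the embedding that replaces hand-crafted statistics like autocorrelation coefficients.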
Related papers
- Beyond Data Scarcity: A Frequency-Driven Framework for Zero-Shot Forecasting [15.431513584239047]
Time series forecasting is critical in numerous real-world applications.
Traditional forecasting techniques struggle when data is scarce or not available at all.
Recent advancements often leverage large-scale foundation models for such tasks.
arXiv Detail & Related papers (2024-11-24T07:44:39Z)
- Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z)
- SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio [17.811771707446926]
We show that learning based methods can, even based on synthetic data, significantly outperform GCC-PHAT on novel real world data.
We provide our trained model, SONNET, which runs in real time and works out of the box on novel data for many real-world applications.
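For context, the classical GCC-PHAT baseline that SONNET is compared against can be sketched in a few lines of NumPy (a generic textbook implementation, not the paper's code): cross-correlate the two signals in the frequency domain with phase-transform weighting and read the delay off the correlation peak.

```python
# Generic GCC-PHAT time delay estimator (textbook version, for illustration).
import numpy as np

def gcc_phat(sig, ref, fs=1.0, max_tau=None):
    n = len(sig) + len(ref)                  # zero-pad to avoid wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12                   # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Example: a signal delayed by 25 samples should yield tau = 25 / fs.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.roll(ref, 25)
print(gcc_phat(sig, ref, fs=1.0))            # ~25.0
```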
arXiv Detail & Related papers (2024-11-20T10:23:21Z)
- A Distribution-Aware Flow-Matching for Generating Unstructured Data for Few-Shot Reinforcement Learning [1.0709300917082865]
We introduce a distribution-aware flow matching, designed to generate synthetic unstructured data tailored for few-shot reinforcement learning (RL) on embedded processors.
We apply feature weighting through Random Forests to prioritize critical data aspects, thereby improving the precision of the generated synthetic data.
Our method provides stable convergence based on max Q-value while enhancing the frame rate by 30% in the initial timestamps.
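A hedged sketch of the Random-Forest feature-weighting step described above, with toy data and without the flow-matching generator itself (the data, dimensions, and scaling rule are placeholders):

```python
# Fit a forest, read off impurity-based importances, and rescale features so
# a downstream generator (omitted here) emphasizes the critical dimensions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=400)  # 2 useful dims

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
weights = forest.feature_importances_        # sums to 1; higher = critical
X_weighted = X * weights                     # emphasize important features
print(np.round(weights, 3))
```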
arXiv Detail & Related papers (2024-09-21T15:50:59Z)
- FFAD: A Novel Metric for Assessing Generated Time Series Data Utilizing Fourier Transform and Auto-encoder [9.103662085683304]
The Fréchet Inception Distance (FID) serves as the standard metric for evaluating generative models in image synthesis.
This work proposes a novel solution leveraging the Fourier transform and Auto-encoder, termed the Fréchet Fourier-transform Auto-encoder Distance (FFAD).
Through our experimental results, we showcase the potential of FFAD for effectively distinguishing samples from different classes.
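For reference, the Fréchet distance underlying both FID and FFAD compares Gaussians fitted to two feature sets in closed form: d^2 = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)). The sketch below computes it with random placeholder features standing in for the paper's Fourier-transform auto-encoder codes.

```python
# Closed-form Frechet distance between Gaussians fitted to two feature sets.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """d^2 = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2))."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):             # discard tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

# Placeholder features standing in for auto-encoder codes of FFT'd series.
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(500, 16))
fake_feats = rng.normal(0.3, 1.2, size=(500, 16))
print(frechet_distance(real_feats, fake_feats))  # larger = less similar
```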
arXiv Detail & Related papers (2024-03-11T10:26:04Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
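As we read it, IFD scores a sample by the ratio of the model's loss on the answer given the instruction to its loss on the answer alone (a higher ratio means the instruction helps less, i.e. a harder sample). The sketch below is our illustration of that ratio with a small off-the-shelf model; the model choice and normalization details are assumptions, not the paper's setup.

```python
# Illustrative IFD-style score: conditional vs. unconditional answer loss.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # illustrative model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_loss(prompt, answer):
    """Mean cross-entropy over the answer tokens, optionally given a prompt."""
    answer_ids = tok(answer, return_tensors="pt").input_ids
    if prompt:
        prompt_ids = tok(prompt, return_tensors="pt").input_ids
        ids = torch.cat([prompt_ids, answer_ids], dim=1)
        labels = ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100        # ignore prompt tokens
    else:
        ids = answer_ids
        labels = ids.clone()
    with torch.no_grad():
        return model(input_ids=ids, labels=labels).loss.item()

def ifd(instruction, answer):
    # Higher ratio: the instruction helps little, i.e. a "harder" sample.
    return answer_loss(instruction, answer) / answer_loss("", answer)

print(ifd("Translate to French: the cheese is on the table.",
          "Le fromage est sur la table."))
```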
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z)
- Towards Synthetic Multivariate Time Series Generation for Flare Forecasting [5.098461305284216]
One of the limiting factors in training data-driven, rare-event prediction algorithms is the scarcity of the events of interest.
In this study, we explore the usefulness of the conditional generative adversarial network (CGAN) as a means to perform data-informed oversampling.
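A compact, illustrative CGAN sketch of the oversampling idea (architecture, data, and dimensions are toy placeholders, not the study's model): condition both generator and discriminator on a class label, then sample the trained generator with the rare-event label.

```python
# Toy conditional GAN for oversampling a rare class (illustrative only).
import torch
import torch.nn as nn

Z, C, D = 8, 2, 16                               # noise dim, classes, features
G = nn.Sequential(nn.Linear(Z + C, 32), nn.ReLU(), nn.Linear(32, D))
Dis = nn.Sequential(nn.Linear(D + C, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(Dis.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
onehot = lambda y: torch.eye(C)[y]

# Toy data; class 1 plays the role of the rare event to oversample.
x_real = torch.randn(256, D) + 2.0
y_real = torch.randint(0, C, (256,))
ones, zeros = torch.ones(256, 1), torch.zeros(256, 1)

for step in range(200):
    yc = onehot(y_real)
    x_fake = G(torch.cat([torch.randn(256, Z), yc], dim=1))
    # Discriminator step: real vs. fake, both conditioned on the label.
    d_loss = (bce(Dis(torch.cat([x_real, yc], dim=1)), ones)
              + bce(Dis(torch.cat([x_fake.detach(), yc], dim=1)), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: try to fool the (conditioned) discriminator.
    g_loss = bce(Dis(torch.cat([x_fake, yc], dim=1)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, synthesize extra samples for the rare class (label 1).
rare_labels = onehot(torch.ones(100, dtype=torch.long))
rare_samples = G(torch.cat([torch.randn(100, Z), rare_labels], dim=1))
```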
arXiv Detail & Related papers (2021-05-16T22:23:23Z)
- Learned Factor Graphs for Inference from Stationary Time Sequences [107.63351413549992]
We propose a framework that combines model-based algorithms and data-driven ML tools for stationary time sequences.
Neural networks are developed to separately learn specific components of a factor graph describing the distribution of the time sequence.
We present an inference algorithm based on learned stationary factor graphs, which learns to implement the sum-product scheme from labeled data.
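To illustrate, here is a minimal sum-product pass on a chain factor graph for a stationary sequence, with the paper's learned neural factors replaced by a fixed hand-set factor table: a single factor f(s_prev, s, x_t) is reused at every step, and forward/backward messages yield per-step state marginals.

```python
# Sum-product (forward-backward) on a chain factor graph with a fixed factor.
import numpy as np

def sum_product_chain(factor, obs):
    """Per-step state marginals on a chain; factor[s_prev, s, x] >= 0."""
    n_states = factor.shape[0]
    T = len(obs)
    fwd = np.zeros((T, n_states))
    bwd = np.ones((T, n_states))
    # Uniform prior over the initial state, with the first emission folded in.
    fwd[0] = factor[:, :, obs[0]].sum(axis=0)
    fwd[0] /= fwd[0].sum()
    for t in range(1, T):                        # forward messages
        fwd[t] = fwd[t - 1] @ factor[:, :, obs[t]]
        fwd[t] /= fwd[t].sum()                   # normalize for stability
    for t in range(T - 2, -1, -1):               # backward messages
        bwd[t] = factor[:, :, obs[t + 1]] @ bwd[t + 1]
        bwd[t] /= bwd[t].sum()
    marg = fwd * bwd
    return marg / marg.sum(axis=1, keepdims=True)

# Toy stationary factor: f[s_prev, s, x] = p(s | s_prev) * p(x | s).
A = np.array([[0.9, 0.1], [0.1, 0.9]])          # sticky state transitions
B = np.array([[0.8, 0.2], [0.3, 0.7]])          # emission probabilities
f = A[:, :, None] * B[None, :, :]
print(sum_product_chain(f, [0, 0, 1, 1, 1]))    # marginals drift to state 1
```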
arXiv Detail & Related papers (2020-06-05T07:06:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.