An Empirical Evaluation of Time-Series Feature Sets
- URL: http://arxiv.org/abs/2110.10914v1
- Date: Thu, 21 Oct 2021 06:06:46 GMT
- Title: An Empirical Evaluation of Time-Series Feature Sets
- Authors: Trent Henderson, Ben D. Fulcher
- Abstract summary: Seven different feature sets are available for feature-based time-series analysis.
Here we compare these sets on computational speed, assess the redundancy of features contained in each, and evaluate the overlap and redundancy between them.
We find that feature sets vary across three orders of magnitude in their computation time per feature on a laptop for a 1000-sample series.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Solving time-series problems with features has been rising in popularity due
to the availability of software for feature extraction. Feature-based
time-series analysis can now be performed using many different feature sets,
including hctsa (7730 features: Matlab), feasts (42 features: R), tsfeatures
(63 features: R), Kats (40 features: Python), tsfresh (up to 1558 features:
Python), TSFEL (390 features: Python), and the C-coded catch22 (22 features:
Matlab, R, Python, and Julia). There is substantial overlap in the types of
methods included in these sets (e.g., properties of the autocorrelation
function and Fourier power spectrum), but they are yet to be systematically
compared. Here we compare these seven sets on computational speed, assess the
redundancy of features contained in each, and evaluate the overlap and
redundancy between them. We take an empirical approach to feature similarity
based on outputs across a diverse set of real-world and simulated time series.
We find that feature sets vary across three orders of magnitude in their
computation time per feature on a laptop for a 1000-sample series, from the
fastest sets catch22 and TSFEL (~0.1ms per feature) to tsfeatures (~3s per
feature). Using PCA to evaluate feature redundancy within each set, we find the
highest within-set redundancy for TSFEL and tsfresh. For example, in TSFEL, 90%
of the variance across 390 features can be captured with just four PCs.
Finally, we introduce a metric for quantifying overlap between pairs of feature
sets, which indicates substantial overlap. We find that the largest feature
set, hctsa, is the most comprehensive, and that tsfresh is the most
distinctive, due to its incorporation of many low-level Fourier coefficients.
Our results provide empirical understanding of the differences between existing
feature sets, information that can be used to better tailor feature sets to
their applications.
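To make the timing comparison concrete, here is a minimal sketch, assuming the pycatch22 package (the Python distribution of catch22), that measures per-feature computation time on a single 1000-sample series. It is an illustration of the benchmark's idea, not the paper's code, and exact numbers depend on hardware.

```python
# A minimal sketch, assuming the pycatch22 package, of the per-feature
# timing benchmark described in the abstract. Numbers are hardware-dependent.
import time
import numpy as np
import pycatch22  # pip install pycatch22

series = np.random.default_rng(0).normal(size=1000).tolist()

start = time.perf_counter()
result = pycatch22.catch22_all(series)  # returns {'names': [...], 'values': [...]}
elapsed = time.perf_counter() - start

per_feature_ms = 1000 * elapsed / len(result['values'])
print(f"{per_feature_ms:.4f} ms per feature over {len(result['values'])} features")
```

A fuller benchmark in the paper's spirit would average over many series, series lengths, and repeated runs.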
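The within-set redundancy analysis can be sketched with standard tools: z-score the series-by-feature matrix, fit PCA, and count the components needed to reach 90% explained variance. The low-rank toy data below is hypothetical, chosen only to mimic a highly redundant set like TSFEL; it is not the paper's data.

```python
# Sketch of the PCA-based redundancy analysis: how many principal
# components capture 90% of the variance across a feature matrix?
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def n_components_for_variance(feature_matrix, threshold=0.90):
    """Number of PCs needed to explain `threshold` of total variance."""
    X = StandardScaler().fit_transform(feature_matrix)
    cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical low-rank data: 1000 series x 390 features driven by
# only 5 latent signals, mimicking a highly redundant feature set.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
X = latent @ rng.normal(size=(5, 390)) + 0.05 * rng.normal(size=(1000, 390))
print(n_components_for_variance(X))  # ~5: few PCs signal heavy redundancy
```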
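The abstract does not define the between-set overlap metric, so the sketch below is a purely illustrative stand-in rather than the paper's method: for each feature in set A, take its best absolute Spearman correlation with any feature in set B (computed over a shared collection of time series), then average.

```python
# Hypothetical stand-in for a between-set overlap score (NOT the paper's
# metric): mean over set-A features of the best |Spearman rho| against set B.
import numpy as np
from scipy.stats import spearmanr

def directional_overlap(feats_a, feats_b):
    """feats_a: (n_series, p); feats_b: (n_series, q). Returns a value in [0, 1]."""
    best = []
    for i in range(feats_a.shape[1]):
        rhos = [abs(spearmanr(feats_a[:, i], feats_b[:, j]).correlation)
                for j in range(feats_b.shape[1])]
        best.append(max(rhos))
    return float(np.mean(best))

# Toy data: set B reproduces half of set A's features, plus independent noise.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 10))
B = np.hstack([A[:, :5] + 0.1 * rng.normal(size=(200, 5)),
               rng.normal(size=(200, 5))])
print(directional_overlap(A, B))  # ~0.6: high for the reproduced half
```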
Related papers
- Analyzing categorical time series with the R package ctsfeatures
The R package ctsfeatures offers users a set of useful tools for analyzing categorical time series.
The output of some functions can be employed to perform traditional machine learning tasks including clustering, classification and outlier detection.
arXiv Detail & Related papers (2023-04-24T16:16:56Z)
- FRANS: Automatic Feature Extraction for Time Series Forecasting
We develop an autonomous Feature Retrieving Autoregressive Network for Static features that does not require domain knowledge.
Our results show that our features lead to improvements in accuracy in most situations.
arXiv Detail & Related papers (2022-09-15T03:14:59Z)
- Feature-Based Time-Series Analysis in R using the theft Package
Many open-source software packages for computing sets of time-series features exist across multiple programming languages.
Here we introduce theft, an R software package that addresses this fragmentation by providing a standardised interface to many of these feature sets.
arXiv Detail & Related papers (2022-08-12T07:29:29Z)
- HyperTime: Implicit Neural Representation for Time Series
Implicit neural representations (INRs) have recently emerged as a powerful tool that provides an accurate and resolution-independent encoding of data.
In this paper, we analyze the representation of time series using INRs, comparing different activation functions in terms of reconstruction accuracy and training convergence speed.
We propose a hypernetwork architecture that leverages INRs to learn a compressed latent representation of an entire time series dataset.
arXiv Detail & Related papers (2022-08-11T14:05:51Z)
- Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting--Full Version
We propose a triangular, variable-specific attention to ensure high efficiency and accuracy.
We show that Triformer outperforms state-of-the-art methods w.r.t. both accuracy and efficiency.
arXiv Detail & Related papers (2022-04-28T20:41:49Z)
- Novel Features for Time Series Analysis: A Complex Networks Approach
Time series data are ubiquitous in domains such as climate, economics, and health care.
A recent conceptual approach relies on mapping time series to complex networks.
Network analysis can be used to characterize different types of time series.
arXiv Detail & Related papers (2021-10-11T13:46:28Z)
- Temporal Dependencies in Feature Importance for Time Series Predictions
We propose WinIT, a framework for evaluating feature importance in time series prediction settings.
We demonstrate how the solution improves the appropriate attribution of features within time steps.
WinIT achieves 2.47x better performance than FIT and other feature importance methods on a real-world clinical MIMIC-mortality task.
arXiv Detail & Related papers (2021-07-29T20:31:03Z)
- Learning Aggregation Functions
We introduce LAF (Learning Aggregation Functions), a learnable aggregator for sets of arbitrary cardinality.
We report experiments on semi-synthetic and real data showing that LAF outperforms state-of-the-art sum- (max-) decomposition architectures.
arXiv Detail & Related papers (2020-12-15T18:28:53Z)
- Infinite Feature Selection: A Graph-based Feature Filtering Approach
We propose a filtering feature selection framework that considers subsets of features as paths in a graph.
Letting the path length go to infinity makes it possible to constrain the computational complexity of the selection process.
We show that Inf-FS performs better in almost any situation, that is, whenever the number of features to keep is fixed a priori.
arXiv Detail & Related papers (2020-06-15T07:20:40Z)
- Supervised Feature Subset Selection and Feature Ranking for Multivariate Time Series without Feature Extraction
We introduce supervised feature ranking and feature subset selection algorithms for MTS classification.
Unlike most existing supervised/unsupervised feature selection algorithms for MTS, our techniques do not require a feature extraction step to generate a one-dimensional feature vector from the time series.
arXiv Detail & Related papers (2020-05-01T07:46:29Z)
- A Deep Structural Model for Analyzing Correlated Multivariate Time Series
We present a deep learning structural time series model which can handle correlated multivariate time series input.
The model explicitly learns/extracts the trend, seasonality, and event components.
We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of time series data sets.
arXiv Detail & Related papers (2020-01-02T18:48:29Z)