Back to Basics: A Sanity Check on Modern Time Series Classification Algorithms
- URL: http://arxiv.org/abs/2308.07886v1
- Date: Tue, 15 Aug 2023 17:23:18 GMT
- Title: Back to Basics: A Sanity Check on Modern Time Series Classification Algorithms
- Authors: Bhaskar Dhariyal, Thach Le Nguyen, Georgiana Ifrim
- Abstract summary: In the current fast-paced development of new classifiers, taking a step back and performing simple baseline checks is essential.
These checks are often overlooked, as researchers are focused on establishing new state-of-the-art results, developing scalable algorithms, and making models explainable.
- Score: 5.225544155289783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The state-of-the-art in time series classification has come a long way, from
the 1NN-DTW algorithm to the ROCKET family of classifiers. However, in the
current fast-paced development of new classifiers, taking a step back and
performing simple baseline checks is essential. These checks are often
overlooked, as researchers are focused on establishing new state-of-the-art
results, developing scalable algorithms, and making models explainable.
Nevertheless, there are many datasets that look like time series at first
glance, but classic algorithms such as tabular methods with no time ordering
may perform better on such problems. For example, for spectroscopy datasets,
tabular methods tend to significantly outperform recent time series methods. In
this study, we compare the performance of tabular models using classic machine
learning approaches (e.g., Ridge, LDA, RandomForest) with the ROCKET family of
classifiers (e.g., Rocket, MiniRocket, MultiRocket). Tabular models are simple
and very efficient, while the ROCKET family of classifiers are more complex and
have state-of-the-art accuracy and efficiency among recent time series
classifiers. We find that tabular models outperform the ROCKET family of
classifiers on approximately 19% of univariate and 28% of multivariate datasets
in the UCR/UEA benchmark and achieve accuracy within 10 percentage points on
about 50% of datasets. Our results suggest that it is important to consider
simple tabular models as baselines when developing time series classifiers.
These models are very fast, can be as effective as more complex methods and may
be easier to understand and deploy.
Related papers
- Chronos: Learning the Language of Time Series [79.38691251254173]
Chronos is a framework for pretrained probabilistic time series models.
We show that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks.
arXiv Detail & Related papers (2024-03-12T16:53:54Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Detach-ROCKET: Sequential feature selection for time series classification with random convolutional kernels [0.7499722271664144]
We introduce Sequential Feature Detachment (SFD) to identify and prune non-essential features in ROCKET-based models.
SFD can produce models with better test accuracy using only 10% of the original features.
We also present an end-to-end procedure for determining an optimal balance between the number of features and model accuracy.
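The abstract does not give the exact SFD procedure, but the core idea of pruning non-essential features from a linear model can be sketched as follows: repeatedly refit a ridge regression and detach the features with the smallest coefficient magnitudes. This is a simplified, hypothetical illustration (synthetic data, plain ridge regression), not the authors' implementation.

```python
import numpy as np

def prune_smallest_coefficients(X, y, keep_fraction=0.1, step=0.5, lam=1.0):
    """Iteratively drop the features whose ridge coefficients have the
    smallest magnitudes until only keep_fraction of them remain.
    Returns the indices of the retained features."""
    active = np.arange(X.shape[1])
    target = max(1, int(keep_fraction * X.shape[1]))
    while len(active) > target:
        Xa = X[:, active]
        w = np.linalg.solve(Xa.T @ Xa + lam * np.eye(len(active)), Xa.T @ y)
        n_keep = max(target, int(len(active) * step))
        keep = np.argsort(np.abs(w))[::-1][:n_keep]  # largest |w| survive
        active = active[np.sort(keep)]
    return active

rng = np.random.default_rng(1)
n, d = 200, 40
X = rng.normal(size=(n, d))
# Only the first 4 features actually drive the target.
y = X[:, :4] @ np.array([3.0, -2.0, 2.5, -3.0]) + 0.1 * rng.normal(size=n)
selected = prune_smallest_coefficients(X, y, keep_fraction=0.1)
```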
arXiv Detail & Related papers (2023-09-25T20:24:36Z)
- Composable Core-sets for Diversity Approximation on Multi-Dataset Streams [4.765131728094872]
Composable core-sets are core-sets with the property that subsets of the core set can be unioned together to obtain an approximation for the original data.
We introduce a core-set construction algorithm for constructing composable core-sets to summarize streamed data for use in active learning environments.
arXiv Detail & Related papers (2023-08-10T23:24:51Z)
- Class-Incremental Learning with Strong Pre-trained Models [97.84755144148535]
Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes).
We explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes.
Our proposed method is robust and generalizes to all analyzed CIL settings.
arXiv Detail & Related papers (2022-04-07T17:58:07Z)
- The FreshPRINCE: A Simple Transformation Based Pipeline Time Series Classifier [0.0]
We look at whether the complexity of the algorithms considered state-of-the-art is really necessary.
Often, the first approach suggested is a simple pipeline of summary statistics or other time series feature extraction approaches.
We test these approaches on the UCR time series dataset archive, looking to see if TSC literature has overlooked the effectiveness of these approaches.
arXiv Detail & Related papers (2022-01-28T11:23:58Z)
- Robust Augmentation for Multivariate Time Series Classification [20.38907456958682]
We show that the simple methods of cutout, cutmix, mixup, and window warp improve the robustness and overall performance.
We show that the InceptionTime network with augmentation improves accuracy by 1% to 45% on 18 different datasets.
arXiv Detail & Related papers (2022-01-27T18:57:49Z)
- Towards Similarity-Aware Time-Series Classification [51.2400839966489]
We study time-series classification (TSC), a fundamental task of time-series data mining.
We propose Similarity-Aware Time-Series Classification (SimTSC), a framework that models similarity information with graph neural networks (GNNs).
arXiv Detail & Related papers (2022-01-05T02:14:57Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- Interpretable Time Series Classification using Linear Models and Multi-resolution Multi-domain Symbolic Representations [6.6147550436077776]
We propose new time series classification algorithms to address gaps in current approaches.
Our approach is based on symbolic representations of time series, efficient sequence mining algorithms and linear classification models.
Our models are as accurate as deep learning models but more efficient in running time and memory; they can work with variable-length time series and can be interpreted by highlighting the discriminative symbolic features on the original time series.
arXiv Detail & Related papers (2020-05-31T15:32:08Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
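The Klivans–Meka algorithm itself is more involved, but the multiplicative-weights primitive it builds on is simple to state. Below is a generic Hedge-style update on synthetic expert losses; the data and learning rate are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hedge(losses, eta=0.5):
    """Multiplicative-weights (Hedge) update over a sequence of loss
    vectors, one per round. losses has shape (T, n_experts).
    Returns the final normalized weights and the algorithm's total
    expected loss."""
    T, n = losses.shape
    w = np.ones(n)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                 # current distribution over experts
        total += p @ losses[t]          # expected loss this round
        w *= np.exp(-eta * losses[t])   # exponentially penalize lossy experts
    return w / w.sum(), total

rng = np.random.default_rng(2)
T, n = 200, 5
losses = rng.uniform(0, 1, (T, n))
losses[:, 2] *= 0.1                     # expert 2 is consistently best
p_final, alg_loss = hedge(losses)
best_loss = losses.sum(axis=0).min()    # loss of the best fixed expert
```

The weight vector concentrates on the consistently best expert, and the algorithm's total loss tracks that expert's loss up to a regret term that grows sublinearly in the number of rounds, which is what makes such updates attractive for online settings.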
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.