Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey
- URL: http://arxiv.org/abs/2310.10060v7
- Date: Mon, 02 Jun 2025 15:00:25 GMT
- Title: Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey
- Authors: Zijun Gao, Haibao Liu, Lingbo Li
- Abstract summary: Data Augmentation (DA) has become a critical approach in Time Series Classification (TSC). The current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological challenges, and a dearth of accessible user-oriented tools. This study addresses these challenges through a comprehensive examination of DA methodologies within the TSC domain.
- Score: 4.030910640265943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data Augmentation (DA) has become a critical approach in Time Series Classification (TSC), primarily for its capacity to expand training datasets, enhance model robustness, introduce diversity, and reduce overfitting. However, the current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible and user-oriented tools. This study addresses these challenges through a comprehensive examination of DA methodologies within the TSC domain. Our research began with an extensive literature review spanning a decade, revealing significant gaps in existing surveys and necessitating a detailed analysis of over 100 scholarly articles to identify more than 60 distinct DA techniques. This rigorous review led to the development of a novel taxonomy tailored to the specific needs of DA in TSC, categorizing techniques into five primary categories: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. This taxonomy is intended to guide researchers in selecting appropriate methods with greater clarity. In response to the lack of comprehensive evaluations of foundational DA techniques, we conducted a thorough empirical study, testing nearly 20 DA strategies across 15 diverse datasets representing all types within the UCR time-series repository. Using ResNet and LSTM architectures, we employed a multifaceted evaluation approach, including metrics such as Accuracy, Method Ranking, and Residual Analysis, resulting in a benchmark accuracy of 84.98 ± 16.41% with ResNet and 82.41 ± 18.71% with LSTM. Our investigation underscored the inconsistent efficacy of DA techniques: for instance, methods like RGWs and Random Permutation significantly improved model performance, whereas others, like EMD, were less effective.
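As a rough illustration of one technique named in the abstract, the sketch below shows a common form of random permutation: the series is cut into contiguous segments, which are then shuffled. It is not taken from the paper. The function name `random_permutation`, the `n_segments` parameter, and the near-equal-length segmentation are assumptions made for this example, and the paper's own variant may differ.

```python
import numpy as np

def random_permutation(series, n_segments=4, rng=None):
    """Shuffle contiguous segments of a series along the time axis.

    Hypothetical helper for illustration only; the segment count and the
    near-equal split are assumptions, not settings from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    series = np.asarray(series)
    # Split along axis 0 (time) into near-equal contiguous segments.
    segments = np.array_split(series, n_segments)
    # Draw a random order for the segments and stitch them back together.
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order])

# Example: augment a toy univariate series of length 12.
x = np.arange(12.0)
x_aug = random_permutation(x, n_segments=4, rng=np.random.default_rng(0))
```

The augmented series keeps the same values and length as the input; only the order of the segments changes. This suits tasks where the class label does not depend on the global ordering of events.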
Related papers
- Review of Inference-Time Scaling Strategies: Reasoning, Search and RAG [13.772025442106544]
Performance gains of LLMs have historically been driven by scaling up model size and training data. The rapidly diminishing availability of high-quality training data is introducing a fundamental bottleneck. This review systematically surveys the diverse techniques contributing to this new era of inference-time scaling.
arXiv Detail & Related papers (2025-10-12T20:09:07Z) - Modern Deep Learning Approaches for Cricket Shot Classification: A Comprehensive Baseline Study [0.0]
This paper presents the first comprehensive baseline study comparing seven different deep learning approaches for cricket shot classification. We implement and evaluate traditional CNN-LSTM architectures, attention-based models, vision transformers, transfer learning approaches, and modern EfficientNet-GRU combinations. Our modern SOTA approach, combining EfficientNet-B0 with a GRU-based temporal model, achieves 92.25% accuracy.
arXiv Detail & Related papers (2025-10-10T09:32:29Z) - Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning [53.85659415230589]
This paper systematically reviews widely adopted reinforcement learning techniques. We present clear guidelines for selecting RL techniques tailored to specific setups. We also reveal that a minimalist combination of two techniques can unlock the learning capability of critic-free policies.
arXiv Detail & Related papers (2025-08-11T17:39:45Z) - Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing. Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest. This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z) - Rebalancing the Scales: A Systematic Mapping Study of Generative Adversarial Networks (GANs) in Addressing Data Imbalance [0.16385815610837165]
Generative Adversarial Networks (GANs) showed immense potential as a data preprocessing technique that generates good-quality synthetic data.
This study employs a systematic mapping methodology to analyze 3041 papers on GAN-based sampling techniques for imbalanced data sourced from four digital libraries.
Through comprehensive quantitative analysis, this research introduces three categorization mappings as application domains, GAN techniques, and GAN variants used to handle the imbalanced nature of the data.
arXiv Detail & Related papers (2025-02-23T11:03:29Z) - Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge [55.252714550918824]
The AortaSeg24 MICCAI Challenge introduced the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones.
This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms.
arXiv Detail & Related papers (2025-02-07T21:09:05Z) - Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data [3.9523536371670045]
Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields.
Existing literature on causal discovery algorithms is fragmented, with inconsistent methodologies.
Comprehensive evaluations are also lacking: data characteristics are often not jointly analyzed when benchmarking algorithms.
arXiv Detail & Related papers (2024-07-17T23:47:05Z) - Data Augmentation for Multivariate Time Series Classification: An Experimental Study [1.5390962520179197]
Despite the limited size of these datasets, we achieved classification accuracy improvements in 10 out of 13 datasets using the Rocket and InceptionTime models.
This highlights the essential role of sufficient data in training effective models, paralleling the advancements seen in computer vision.
arXiv Detail & Related papers (2024-06-10T17:58:02Z) - Test-Time Domain Generalization for Face Anti-Spoofing [60.94384914275116]
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks.
We introduce a novel Test-Time Domain Generalization framework for FAS, which leverages the testing data to boost the model's generalizability.
Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space.
arXiv Detail & Related papers (2024-03-28T11:50:23Z) - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z) - Finding Foundation Models for Time Series Classification with a PreText Task [7.197233473373693]
This paper introduces pre-trained domain foundation models for Time Series Classification.
A key aspect of our methodology is a novel pretext task that spans multiple datasets.
Our experiments on the UCR archive demonstrate that this pre-training strategy significantly outperforms the conventional training approach without pre-training.
arXiv Detail & Related papers (2023-11-24T15:03:55Z) - Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning [50.809769498312434]
We propose a novel dataset pruning method termed Temporal Dual-Depth Scoring (TDDS).
Our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
arXiv Detail & Related papers (2023-11-22T03:45:30Z) - Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - Rethinking Distribution Shifts: Empirical Analysis and Inductive Modeling for Tabular Data [30.518020409197767]
We build an empirical testbed comprising natural shifts across 5 datasets and 60,000 method configurations.
We find $Y|X$-shifts are most prevalent on our testbed, in stark contrast to the heavy focus on $X$ (co)-shifts in the ML literature.
arXiv Detail & Related papers (2023-07-11T14:25:10Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Training Strategies for Improved Lip-reading [61.661446956793604]
We investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies.
A combination of all the methods results in a classification accuracy of 93.4%, which is an absolute improvement of 4.6% over the current state-of-the-art performance.
An error analysis of the various training strategies reveals that the performance improves by increasing the classification accuracy of hard-to-recognise words.
arXiv Detail & Related papers (2022-09-03T09:38:11Z) - Data Augmentation techniques in time series domain: A survey and taxonomy [0.20971479389679332]
Deep neural networks used to work with time series heavily depend on the size and consistency of the datasets used in training.
This work systematically reviews the current state-of-the-art in the area to provide an overview of all available algorithms.
The ultimate aim of this study is to provide a summary of the evolution and performance of areas that produce better results to guide future researchers in this field.
arXiv Detail & Related papers (2022-06-25T17:09:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.