Adaptive Weighting Scheme for Automatic Time-Series Data Augmentation
- URL: http://arxiv.org/abs/2102.08310v1
- Date: Tue, 16 Feb 2021 17:50:51 GMT
- Title: Adaptive Weighting Scheme for Automatic Time-Series Data Augmentation
- Authors: Elizabeth Fons, Paula Dawson, Xiao-jun Zeng, John Keane, Alexandros
Iosifidis
- Abstract summary: We present two sample-adaptive automatic weighting schemes for data augmentation.
We validate our proposed methods on a large, noisy financial dataset and on time-series datasets from the UCR archive.
- On the financial dataset, we show that the methods combined with a trading strategy improve annualized returns by over 50%; on the time-series data we outperform state-of-the-art models on over half of the datasets and achieve comparable accuracy on the rest.
- Score: 79.47771259100674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation methods have been shown to be a fundamental technique to
improve generalization in tasks such as image, text and audio classification.
Recently, automated augmentation methods have led to further improvements on
image classification and object detection leading to state-of-the-art
performances. Nevertheless, little work has been done on time-series data, an
area that could greatly benefit from automated data augmentation given the
usually limited size of the datasets. We present two sample-adaptive automatic
weighting schemes for data augmentation: the first learns to weight the
contribution of the augmented samples to the loss, and the second method
selects a subset of transformations based on the ranking of the predicted
training loss. We validate our proposed methods on a large, noisy financial
dataset and on time-series datasets from the UCR archive. On the financial
dataset, we show that the methods in combination with a trading strategy lead
to improvements in annualized returns of over 50%, and on the time-series
data we outperform state-of-the-art models on over half of the datasets, and
achieve similar performance in accuracy on the others.
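As a rough illustration of the two schemes described in the abstract, the sketch below (Python/PyTorch; every name, including `model`, `jitter`, `scaling`, `aug_logits`, and `select_transforms`, is hypothetical and not taken from the paper) attaches a learnable weight to the loss of each augmented view and, separately, ranks candidate transformations by the loss the current model predicts on their output. It is a minimal reading of the abstract, not the authors' implementation; in particular, the paper's weights are sample-adaptive rather than fixed per transform, and the ranking direction is an assumption.

```python
# Minimal sketch only -- NOT the authors' implementation. Assumes a PyTorch
# classifier `model` taking batches `x` of shape (batch, length) with labels `y`.
import torch
import torch.nn.functional as F

# Example time-series transformations (generic choices from this literature).
def jitter(x, sigma=0.03):
    """Add Gaussian noise to every time step."""
    return x + sigma * torch.randn_like(x)

def scaling(x, sigma=0.1):
    """Multiply each series by a random factor drawn around 1."""
    return x * (1.0 + sigma * torch.randn(x.size(0), 1, device=x.device))

TRANSFORMS = [jitter, scaling]

# Scheme 1 (sketch): learn how much each augmented view contributes to the loss.
# One learnable logit per transform here; the paper's weights are sample-adaptive.
aug_logits = torch.zeros(len(TRANSFORMS), requires_grad=True)

def weighted_augmentation_loss(model, x, y):
    base_loss = F.cross_entropy(model(x), y)
    aug_losses = torch.stack(
        [F.cross_entropy(model(t(x)), y) for t in TRANSFORMS]
    )
    weights = torch.softmax(aug_logits, dim=0)
    return base_loss + (weights * aug_losses).sum()

# Scheme 2 (sketch): keep only a subset of transforms, ranked by predicted loss.
@torch.no_grad()
def select_transforms(model, x, y, k=1):
    """Rank candidate transforms by the loss the current model assigns to their
    output and return the k lowest-loss ones (the ranking direction is an
    assumption, not stated in the abstract)."""
    losses = [F.cross_entropy(model(t(x)), y).item() for t in TRANSFORMS]
    order = sorted(range(len(TRANSFORMS)), key=lambda i: losses[i])
    return [TRANSFORMS[i] for i in order[:k]]
```

In practice `aug_logits` would be updated alongside the model parameters (e.g., added to the optimizer or trained in an inner/outer loop); how the weights are parameterized and which end of the loss ranking is kept are design choices the abstract does not pin down.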
Related papers
- Towards Data-Efficient Pretraining for Atomic Property Prediction [51.660835328611626]
We show that pretraining on a task-relevant dataset can match or surpass large-scale pretraining.
We introduce the Chemical Similarity Index (CSI), a novel metric inspired by computer vision's Fréchet Inception Distance.
arXiv Detail & Related papers (2025-02-16T11:46:23Z) - Optimizing Pretraining Data Mixtures with LLM-Estimated Utility [52.08428597962423]
Large Language Models improve with increasing amounts of high-quality training data.
We find token-count heuristics outperform manual and learned mixes, indicating that simple approaches accounting for dataset size and diversity are surprisingly effective.
We propose two complementary approaches: UtiliMax, which extends token-based heuristics by incorporating utility estimates from reduced-scale ablations, achieving up to a 10.6x speedup over manual baselines; and Model Estimated Data Utility (MEDU), which leverages LLMs to estimate data utility from small samples, matching ablation-based performance while reducing computational requirements by roughly 200x.
arXiv Detail & Related papers (2025-01-20T21:10:22Z) - CiTrus: Squeezing Extra Performance out of Low-data Bio-signal Transfer Learning [0.36832029288386137]
Transfer learning for bio-signals has recently become an important technique to improve prediction performance on downstream tasks with small bio-signal datasets.
We propose a new convolution-transformer hybrid model architecture with masked auto-encoding for low-data bio-signal transfer learning.
Our findings indicate that the convolution-only part of our hybrid model can achieve state-of-the-art performance on some low-data downstream tasks.
arXiv Detail & Related papers (2024-12-16T12:15:16Z) - Data Augmentation for Traffic Classification [54.92823760790628]
Data Augmentation (DA) is a technique widely adopted in Computer Vision (CV) and Natural Language Processing (NLP) tasks.
Yet DA has struggled to gain traction in networking contexts, particularly in Traffic Classification (TC) tasks.
arXiv Detail & Related papers (2024-01-19T15:25:09Z) - Financial Time Series Data Augmentation with Generative Adversarial
Networks and Extended Intertemporal Return Plots [2.365537081046599]
We apply state-of-the-art image-based generative models for the task of data augmentation.
We introduce the extended intertemporal return plot (XIRP), a new image representation for time series.
Our approach proves to be effective in reducing the return forecast error by 7% on 79% of the financial data sets.
arXiv Detail & Related papers (2022-05-18T13:39:27Z) - Weakly Supervised Change Detection Using Guided Anisotropic Diffusion [97.43170678509478]
We propose original ideas that help us to leverage weakly labeled datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z) - Evaluating data augmentation for financial time series classification [85.38479579398525]
We evaluate several augmentation methods applied to stock datasets using two state-of-the-art deep learning models.
For a relatively small dataset, augmentation methods achieve up to a 400% improvement in risk-adjusted return performance.
For a larger stock dataset, augmentation methods achieve up to a 40% improvement.
arXiv Detail & Related papers (2020-10-28T17:53:57Z) - Improving the Performance of Fine-Grain Image Classifiers via Generative
Data Augmentation [0.5161531917413706]
We develop Data Augmentation from Proficient Pre-Training of Robust Generative Adversarial Networks (DAPPER GAN).
DAPPER GAN is an ML analytics support tool that automatically generates novel views of training images.
We experimentally evaluate this technique on the Stanford Cars dataset, demonstrating improved vehicle make and model classification accuracy.
arXiv Detail & Related papers (2020-08-12T15:29:11Z) - Complex Wavelet SSIM based Image Data Augmentation [0.0]
We look at the MNIST handwritten digit dataset, an image dataset used for digit recognition.
We take a detailed look at one of the most popular augmentation techniques used for this dataset, elastic deformation.
We propose to use a similarity measure called Complex Wavelet Structural Similarity Index Measure (CWSSIM) to selectively filter out the irrelevant data.
arXiv Detail & Related papers (2020-07-11T21:11:46Z)
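The filtering idea in the CWSSIM entry above can be sketched generically: score each augmented candidate against its source and discard those that drift too far. The snippet below stays in Python and substitutes a plain Pearson-correlation placeholder for CWSSIM (the real measure relies on a complex wavelet transform); `similarity`, `filter_augmented`, the 0.6 threshold, and the keep-high-similarity rule are all illustrative assumptions rather than the paper's method.

```python
import numpy as np

def similarity(a, b):
    """Placeholder for CWSSIM: Pearson correlation between two flattened samples."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def filter_augmented(original, candidates, threshold=0.6):
    """Keep augmented candidates that stay sufficiently similar to the original."""
    return [c for c in candidates if similarity(original, c) >= threshold]

# Hypothetical usage: drop elastic-deformation outputs that drifted too far.
# kept = filter_augmented(image, [elastic_deform(image) for _ in range(10)])
```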
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.