Adaptive Weighting Scheme for Automatic Time-Series Data Augmentation
- URL: http://arxiv.org/abs/2102.08310v1
- Date: Tue, 16 Feb 2021 17:50:51 GMT
- Title: Adaptive Weighting Scheme for Automatic Time-Series Data Augmentation
- Authors: Elizabeth Fons, Paula Dawson, Xiao-jun Zeng, John Keane, Alexandros
Iosifidis
- Abstract summary: We present two sample-adaptive automatic weighting schemes for data augmentation.
We validate our proposed methods on a large, noisy financial dataset and on time-series datasets from the UCR archive.
- On the financial dataset, we show that the methods combined with a trading strategy improve annualized returns by over 50%; on the time-series data we outperform state-of-the-art models on over half of the datasets and achieve comparable accuracy on the rest.
- Score: 79.47771259100674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation methods have been shown to be a fundamental technique to
improve generalization in tasks such as image, text and audio classification.
Recently, automated augmentation methods have led to further improvements on
image classification and object detection leading to state-of-the-art
performances. Nevertheless, little work has been done on time-series data, an
area that could greatly benefit from automated data augmentation given the
usually limited size of the datasets. We present two sample-adaptive automatic
weighting schemes for data augmentation: the first learns to weight the
contribution of the augmented samples to the loss, and the second method
selects a subset of transformations based on the ranking of the predicted
training loss. We validate our proposed methods on a large, noisy financial
dataset and on time-series datasets from the UCR archive. On the financial
dataset, we show that the methods in combination with a trading strategy lead
to improvements in annualized returns of over 50%, and on the time-series
data we outperform state-of-the-art models on over half of the datasets, and
achieve similar performance in accuracy on the others.
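As a rough illustration of the two schemes described in the abstract, the sketch below (Python/PyTorch; every name, including `model`, `jitter`, `scaling`, `aug_logits`, and `select_transforms`, is hypothetical and not taken from the paper) attaches a learnable weight to the loss of each augmented view and, separately, ranks candidate transformations by the loss the current model predicts on their output. It is a minimal reading of the abstract, not the authors' implementation; in particular, the paper's weights are sample-adaptive rather than fixed per transform, and the ranking direction is an assumption.

```python
# Minimal sketch only -- NOT the authors' implementation. Assumes a PyTorch
# classifier `model` taking batches `x` of shape (batch, length) with labels `y`.
import torch
import torch.nn.functional as F

# Example time-series transformations (generic choices from this literature).
def jitter(x, sigma=0.03):
    """Add Gaussian noise to every time step."""
    return x + sigma * torch.randn_like(x)

def scaling(x, sigma=0.1):
    """Multiply each series by a random factor drawn around 1."""
    return x * (1.0 + sigma * torch.randn(x.size(0), 1, device=x.device))

TRANSFORMS = [jitter, scaling]

# Scheme 1 (sketch): learn how much each augmented view contributes to the loss.
# One learnable logit per transform here; the paper's weights are sample-adaptive.
aug_logits = torch.zeros(len(TRANSFORMS), requires_grad=True)

def weighted_augmentation_loss(model, x, y):
    base_loss = F.cross_entropy(model(x), y)
    aug_losses = torch.stack(
        [F.cross_entropy(model(t(x)), y) for t in TRANSFORMS]
    )
    weights = torch.softmax(aug_logits, dim=0)
    return base_loss + (weights * aug_losses).sum()

# Scheme 2 (sketch): keep only a subset of transforms, ranked by predicted loss.
@torch.no_grad()
def select_transforms(model, x, y, k=1):
    """Rank candidate transforms by the loss the current model assigns to their
    output and return the k lowest-loss ones (the ranking direction is an
    assumption, not stated in the abstract)."""
    losses = [F.cross_entropy(model(t(x)), y).item() for t in TRANSFORMS]
    order = sorted(range(len(TRANSFORMS)), key=lambda i: losses[i])
    return [TRANSFORMS[i] for i in order[:k]]
```

In practice `aug_logits` would be updated alongside the model parameters (e.g., added to the optimizer or trained in an inner/outer loop); how the weights are parameterized and which end of the loss ranking is kept are design choices the abstract does not pin down.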
Related papers
- Towards Data-Efficient Pretraining for Atomic Property Prediction [51.660835328611626]
We show that pretraining on a task-relevant dataset can match or surpass large-scale pretraining.
We introduce the Chemical Similarity Index (CSI), a novel metric inspired by computer vision's Fréchet Inception Distance.
arXiv Detail & Related papers (2025-02-16T11:46:23Z) - Optimizing Pretraining Data Mixtures with LLM-Estimated Utility [52.08428597962423]
Large Language Models improve with increasing amounts of high-quality training data.
We find token-count heuristics outperform manual and learned mixes, indicating that simple approaches accounting for dataset size and diversity are surprisingly effective.
We propose two complementary approaches: UtiliMax, which extends token-based heuristics by incorporating utility estimates from reduced-scale ablations, achieving up to a 10.6x speedup over manual baselines; and Model Estimated Data Utility (MEDU), which leverages LLMs to estimate data utility from small samples, matching ablation-based performance while reducing computational requirements by roughly 200x.
arXiv Detail & Related papers (2025-01-20T21:10:22Z) - CiTrus: Squeezing Extra Performance out of Low-data Bio-signal Transfer Learning [0.36832029288386137]
Transfer learning for bio-signals has recently become an important technique to improve prediction performance on downstream tasks with small bio-signal datasets.
We propose a new convolution-transformer hybrid model architecture with masked auto-encoding for low-data bio-signal transfer learning.
Our findings indicate that the convolution-only part of our hybrid model can achieve state-of-the-art performance on some low-data downstream tasks.
arXiv Detail & Related papers (2024-12-16T12:15:16Z) - Data Augmentation for Traffic Classification [54.92823760790628]
Data Augmentation (DA) is a technique widely adopted in Computer Vision (CV) and Natural Language Processing (NLP) tasks.
Yet DA has struggled to gain traction in networking contexts, particularly in Traffic Classification (TC) tasks.
arXiv Detail & Related papers (2024-01-19T15:25:09Z) - Financial Time Series Data Augmentation with Generative Adversarial
Networks and Extended Intertemporal Return Plots [2.365537081046599]
We apply state-of-the-art image-based generative models for the task of data augmentation.
We introduce the extended intertemporal return plot (XIRP), a new image representation for time series.
Our approach proves to be effective in reducing the return forecast error by 7% on 79% of the financial data sets.
arXiv Detail & Related papers (2022-05-18T13:39:27Z) - Weakly Supervised Change Detection Using Guided Anisotropic Diffusion [97.43170678509478]
We propose original ideas that help us to leverage weakly labeled datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z) - Evaluating data augmentation for financial time series classification [85.38479579398525]
We evaluate several augmentation methods applied to stock datasets using two state-of-the-art deep learning models.
For a relatively small dataset, augmentation methods achieve up to a 400% improvement in risk-adjusted return performance.
For a larger stock dataset, augmentation methods achieve up to a 40% improvement.
arXiv Detail & Related papers (2020-10-28T17:53:57Z) - Improving the Performance of Fine-Grain Image Classifiers via Generative
Data Augmentation [0.5161531917413706]
We develop Data Augmentation from Proficient Pre-Training of Robust Generative Adversarial Networks (DAPPER GAN).
DAPPER GAN is an ML analytics support tool that automatically generates novel views of training images.
We experimentally evaluate this technique on the Stanford Cars dataset, demonstrating improved vehicle make and model classification accuracy.
arXiv Detail & Related papers (2020-08-12T15:29:11Z) - Complex Wavelet SSIM based Image Data Augmentation [0.0]
We look at the MNIST handwritten digit dataset, an image dataset used for digit recognition.
We take a detailed look at one of the most popular augmentation techniques used for this dataset, elastic deformation.
We propose to use a similarity measure called Complex Wavelet Structural Similarity Index Measure (CWSSIM) to selectively filter out the irrelevant data.
arXiv Detail & Related papers (2020-07-11T21:11:46Z)
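The filtering idea in the CWSSIM entry above can be sketched generically: score each augmented candidate against its source and discard those that drift too far. The snippet below stays in Python and substitutes a plain Pearson-correlation placeholder for CWSSIM (the real measure relies on a complex wavelet transform); `similarity`, `filter_augmented`, the 0.6 threshold, and the keep-high-similarity rule are all illustrative assumptions rather than the paper's method.

```python
import numpy as np

def similarity(a, b):
    """Placeholder for CWSSIM: Pearson correlation between two flattened samples."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def filter_augmented(original, candidates, threshold=0.6):
    """Keep augmented candidates that stay sufficiently similar to the original."""
    return [c for c in candidates if similarity(original, c) >= threshold]

# Hypothetical usage: drop elastic-deformation outputs that drifted too far.
# kept = filter_augmented(image, [elastic_deform(image) for _ in range(10)])
```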
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.