A Dataset Fusion Algorithm for Generalised Anomaly Detection in
Homogeneous Periodic Time Series Datasets
- URL: http://arxiv.org/abs/2305.08197v1
- Date: Sun, 14 May 2023 16:24:09 GMT
- Title: A Dataset Fusion Algorithm for Generalised Anomaly Detection in
Homogeneous Periodic Time Series Datasets
- Authors: Ayman Elhalwagy and Tatiana Kalganova
- Abstract summary: "Dataset Fusion" is an algorithm for fusing periodic signals from multiple homogeneous datasets into a single dataset.
The proposed approach significantly outperforms conventional training approaches with an Average F1 score of 0.879.
Results show that using only 6.25% of the training data, translating to a 93.7% reduction in computational power, results in a mere 4.04% decrease in performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The generalisation of Neural Networks (NNs) to multiple datasets is often
overlooked in the literature because NNs are typically optimised for specific
data sources. This becomes especially challenging in time-series-based
multi-dataset models, owing to the difficulty of fusing sequential data from
different sensors and collection specifications. In a commercial environment,
however, generalisation
can effectively utilise available data and computational power, which is
essential in the context of Green AI, the sustainable development of AI models.
This paper introduces "Dataset Fusion," a novel dataset composition algorithm
for fusing periodic signals from multiple homogeneous datasets into a single
dataset while retaining unique features for generalised anomaly detection. The
proposed approach, tested on a case study of 3-phase current data from 2
different homogeneous Induction Motor (IM) fault datasets using an unsupervised
LSTMCaps NN, significantly outperforms conventional training approaches with an
Average F1 score of 0.879 and effectively generalises across all datasets. The
proposed approach was also tested with varying percentages of the training
data, in line with the principles of Green AI. Results show that using only
6.25% of the training data, translating to a 93.7% reduction in computational
power, results in a mere 4.04% decrease in performance, demonstrating the
advantages of the proposed approach in terms of both performance and
computational efficiency. Moreover, the algorithm's effectiveness under
non-ideal conditions highlights its potential for practical use in real-world
applications.
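The abstract does not spell out the fusion procedure itself, but its core idea (segmenting periodic signals from several homogeneous sources into per-cycle windows, normalising each source, and combining the windows into one training set) can be sketched as below. The function name, the per-dataset standardisation, and the shuffling step are all illustrative assumptions, not the authors' published algorithm:

```python
import numpy as np

def fuse_periodic_datasets(datasets, period):
    """Hypothetical sketch of fusing homogeneous periodic signals.

    datasets: list of 1-D arrays sampled at the same rate.
    period: samples per cycle (assumed identical across datasets).
    """
    windows = []
    for signal in datasets:
        # Per-dataset normalisation (an assumption) so amplitude
        # differences between sources do not dominate the fused data.
        signal = (signal - signal.mean()) / (signal.std() + 1e-8)
        # Segment into whole single-period windows, dropping the remainder.
        n = len(signal) // period
        windows.append(signal[: n * period].reshape(n, period))
    fused = np.concatenate(windows, axis=0)
    # Interleave windows from the different sources so a model sees
    # all datasets throughout training.
    rng = np.random.default_rng(0)
    rng.shuffle(fused, axis=0)
    return fused

# Two toy "homogeneous" sinusoidal current signals with different scales.
t = np.arange(1000)
a = 2.0 * np.sin(2 * np.pi * t / 50)
b = 5.0 * np.sin(2 * np.pi * t / 50) + 1.0
fused = fuse_periodic_datasets([a, b], period=50)
print(fused.shape)  # (40, 50): 20 single-period windows per source
```

A downstream unsupervised model (such as the LSTMCaps NN named in the abstract) would then be trained once on the fused windows rather than once per dataset.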
Related papers
- Approaching Metaheuristic Deep Learning Combos for Automated Data Mining [0.5419570023862531]
This work proposes a means of combining meta-heuristic methods with conventional classifiers and neural networks in order to perform automated data mining.
Experiments on the MNIST dataset for handwritten digit recognition were performed.
It was empirically observed that using a ground truth labeled dataset's validation accuracy is inadequate for correcting labels of other previously unseen data instances.
arXiv Detail & Related papers (2024-10-16T10:28:22Z) - Analysis and Optimization of Wireless Federated Learning with Data
Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - Personalized Decentralized Multi-Task Learning Over Dynamic
Communication Graphs [59.96266198512243]
We propose a decentralized and federated learning algorithm for tasks that are positively and negatively correlated.
Our algorithm uses gradients to calculate the correlations among tasks automatically, and dynamically adjusts the communication graph to connect mutually beneficial tasks and isolate those that may negatively impact each other.
We conduct experiments on a synthetic Gaussian dataset and a large-scale celebrity attributes (CelebA) dataset.
arXiv Detail & Related papers (2022-12-21T18:58:24Z) - DRFLM: Distributionally Robust Federated Learning with Inter-client
Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z) - Generating Data to Mitigate Spurious Correlations in Natural Language
Inference Datasets [27.562256973255728]
Natural language processing models often exploit spurious correlations between task-independent features and labels in datasets to perform well only within the distributions they are trained on.
We propose to tackle this problem by generating a debiased version of a dataset, which can then be used to train a debiased, off-the-shelf model.
Our approach consists of 1) a method for training data generators to generate high-quality, label-consistent data samples; and 2) a filtering mechanism for removing data points that contribute to spurious correlations.
arXiv Detail & Related papers (2022-03-24T09:08:05Z) - Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via
Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z) - Towards Synthetic Multivariate Time Series Generation for Flare
Forecasting [5.098461305284216]
One of the limiting factors in training data-driven, rare-event prediction algorithms is the scarcity of the events of interest.
In this study, we explore the usefulness of the conditional generative adversarial network (CGAN) as a means to perform data-informed oversampling.
arXiv Detail & Related papers (2021-05-16T22:23:23Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - Efficient Construction of Nonlinear Models over Normalized Data [21.531781003420573]
We show how it is possible to decompose in a systematic way both for binary joins and for multi-way joins to construct mixture models.
We present algorithms that can conduct the training of the network in a factorized way and offer performance advantages.
arXiv Detail & Related papers (2020-11-23T19:20:03Z) - TadGAN: Time Series Anomaly Detection Using Generative Adversarial
Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.