Detection of Anomalies in a Time Series Data using InfluxDB and Python
- URL: http://arxiv.org/abs/2012.08439v1
- Date: Tue, 15 Dec 2020 17:27:39 GMT
- Title: Detection of Anomalies in a Time Series Data using InfluxDB and Python
- Authors: Tochukwu John Anih, Chika Amadi Bede, and Chima Festus Umeokpala
- Abstract summary: This paper demonstrates data cleaning and preparation for time-series data.
It further proposes cost-sensitive machine learning algorithms as a solution to detect anomalous data points in time-series data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Analysis of water and environmental data is an important aspect of many
intelligent water and environmental system applications, where inference from
such analysis plays a significant role in decision making. Quite often, the
data collected through sensitive sensors can be anomalous for
different reasons, such as system breakdowns, malfunctioning sensor
detectors, and more. Regardless of their root causes, such data severely affect
the results of the subsequent analysis. This paper demonstrates data cleaning
and preparation for time-series data and further proposes cost-sensitive
machine learning algorithms as a solution to detect anomalous data points in
time-series data. Three models (Logistic Regression, Random Forest, and
Support Vector Machines) have been modified to support cost-sensitive
learning, which penalizes misclassified samples and thereby minimizes the total
misclassification cost. Our results showed that Random Forest outperformed the
rest of the models at predicting the positive class (i.e., anomalies). Applying
predictive model improvement techniques such as data oversampling seems to provide
little or no improvement to the Random Forest model. Interestingly, with
recursive feature elimination we achieved better model performance while
reducing the dimensions of the data. Finally, with InfluxDB and Kapacitor the
data was ingested and streamed to generate new data points to further evaluate
the model performance on unseen data. This will allow for early recognition of
undesirable changes in drinking water quality and will enable water
supply companies to rectify such changes in a timely manner.
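The abstract does not say how the three models were made cost-sensitive. As a rough sketch only, one common way to do this in scikit-learn is the `class_weight` parameter, which scales the penalty for misclassifying each class. The synthetic dataset, the 10:1 cost ratio, and the choice of five retained features below are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for sensor readings
# (assumption: roughly 5% of the samples are anomalies, labelled 1).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hypothetical cost ratio: a missed anomaly is treated as 10x worse
# than a false alarm. The paper's actual costs are not given here.
costs = {0: 1, 1: 10}

models = {
    "logistic_regression": LogisticRegression(class_weight=costs, max_iter=1000),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight=costs, random_state=0
    ),
    "svm": SVC(class_weight=costs),
}

# Recall on the positive class measures how many anomalies are caught.
recalls = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    recalls[name] = recall_score(y_test, model.predict(X_test))

# Recursive feature elimination wrapped around the cost-sensitive forest;
# keeping 5 of the 10 features is an arbitrary illustrative choice.
selector = RFE(
    RandomForestClassifier(n_estimators=100, class_weight=costs, random_state=0),
    n_features_to_select=5,
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)

print(recalls)
print("features kept by RFE:", selected)
```

Comparing `recalls` across the three models mirrors the kind of comparison the abstract reports, although the numbers from synthetic data say nothing about the paper's results.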
Related papers
- DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets.
Our framework, DUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining.
Specifically, given the evaluated data utilities of some data subsets, DUPRE fits a Gaussian process (GP) regression model to predict the utility of every other data subset.
arXiv Detail & Related papers (2025-02-22T08:53:39Z)
- CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning [0.0]
Data preprocessing is a critical yet frequently neglected aspect of machine learning.
CleanSurvival is a reinforcement-learning-based solution for optimizing preprocessing pipelines.
It can handle continuous and categorical variables, using Q-learning to select which combination of data imputation, outlier detection and feature extraction techniques achieves optimal performance.
arXiv Detail & Related papers (2025-02-06T10:33:37Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- An Automated Machine Learning Approach for Detecting Anomalous Peak Patterns in Time Series Data from a Research Watershed in the Northeastern United States Critical Zone [3.1747517745997014]
This paper presents an automated machine learning framework designed to assist hydrologists in detecting anomalies in time series data generated by sensors in a research watershed in the northeastern United States critical zone.
The framework specifically focuses on identifying peak-pattern anomalies, which may arise from sensor malfunctions or natural phenomena.
arXiv Detail & Related papers (2023-09-14T19:07:50Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- A Bayesian Generative Adversarial Network (GAN) to Generate Synthetic Time-Series Data, Application in Combined Sewer Flow Prediction [3.3139597764446607]
In machine learning, generative models are a class of methods capable of learning data distribution to generate artificial data.
In this study, we developed a GAN model to generate synthetic time series to balance our limited recorded time series data.
The aim is to predict the flow using precipitation data and examine the impact of data augmentation using synthetic data in model performance.
arXiv Detail & Related papers (2023-01-31T16:12:26Z)
- DynImp: Dynamic Imputation for Wearable Sensing Data Through Sensory and Temporal Relatedness [78.98998551326812]
We argue that traditional methods have rarely made use of both the time-series dynamics of the data and the relatedness of the features from different sensors.
We propose a model, termed DynImp, that handles missingness at different time points with nearest neighbors along the feature axis.
We show that the method can exploit the multi-modality features from related sensors and also learn from historical time-series dynamics to reconstruct the data under extreme missingness.
arXiv Detail & Related papers (2022-09-26T21:59:14Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
- Preprocessing and Modeling of Radial Fan Data for Health State Prediction [0.0]
In vital machinery, a trend to exaggerated sensors may be noticed, both in quality and in quantity.
This paper focuses on the reduction of this data through downsampling and binning.
arXiv Detail & Related papers (2021-09-08T07:37:18Z)
- Time Series Anomaly Detection with label-free Model Selection [0.6303112417588329]
We propose LaF-AD, a novel anomaly detection algorithm with label-free model selection for unlabeled time-series data.
Our algorithm is easily parallelizable, more robust for ill-conditioned and seasonal data, and highly scalable for a large number of anomaly models.
arXiv Detail & Related papers (2021-06-11T00:21:06Z)
- Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.