Lightweight Automated Feature Monitoring for Data Streams
- URL: http://arxiv.org/abs/2207.08640v2
- Date: Tue, 19 Jul 2022 11:01:17 GMT
- Title: Lightweight Automated Feature Monitoring for Data Streams
- Authors: João Conde, Ricardo Moreira, João Torres, Pedro Cardoso, Hugo R.C.
Ferreira, Marco O.P. Sampaio, João Tiago Ascensão, Pedro Bizarro
- Abstract summary: We propose a flexible system, Feature Monitoring (FM), that detects data drifts in such data sets.
It monitors all features used by the system and provides an interpretable feature ranking whenever an alarm occurs.
Experiments illustrate that FM eliminates the need to add custom signals to detect specific types of problems, and that monitoring the available space of features is often enough.
- Score: 1.4658400971135652
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Monitoring the behavior of automated real-time stream processing systems has
become one of the most relevant problems in real-world applications. Such
systems have grown in complexity, relying heavily on high-dimensional input
data and data-hungry Machine Learning (ML) algorithms. We propose a flexible
system, Feature Monitoring (FM), that detects data drifts in such data sets
with a small, constant memory footprint and a small computational cost in
streaming applications. The method is based on a multivariate statistical test
and is data-driven by design (full reference distributions are estimated from
the data). It monitors all features used by the system and provides an
interpretable feature ranking whenever an alarm occurs (to aid in root cause
analysis). The system's low computational and memory cost results from the use
of Exponential Moving Histograms. In our experimental study, we analyze how the
system's behavior varies with its parameters and, more importantly, show
examples where it detects problems that are not directly related to a single
feature. This illustrates that FM eliminates the need to add custom signals to
detect specific types of problems, and that monitoring the available space of
features is often enough.
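The abstract names Exponential Moving Histograms as the source of FM's constant memory footprint and mentions a per-feature ranking on alarm. A minimal Python sketch of those two ideas follows. It is not the authors' implementation: the class and function names, the per-event `half_life` parameter, the fixed bin edges, and the use of Jensen-Shannon divergence as the drift score are all assumptions made for illustration, and the multivariate statistical test that raises the alarm is not reproduced here.

```python
import math
from bisect import bisect_right


class ExponentialMovingHistogram:
    """Fixed-size histogram whose bin counts decay exponentially per event.

    Hypothetical helper for illustration; memory is O(number of bins)
    regardless of how many events have been seen.
    """

    def __init__(self, edges, half_life):
        self.edges = list(edges)                     # sorted interior bin edges
        self.counts = [0.0] * (len(self.edges) + 1)  # one more bin than edges
        self.decay = 0.5 ** (1.0 / half_life)        # weight halves every half_life events

    def update(self, value):
        # Age every bin, then add the new event to the bin it falls into.
        self.counts = [c * self.decay for c in self.counts]
        self.counts[bisect_right(self.edges, value)] += 1.0

    def probabilities(self, smoothing=1e-9):
        # Normalize the decayed counts; smoothing avoids empty-bin zeros.
        total = sum(self.counts) + smoothing * len(self.counts)
        return [(c + smoothing) / total for c in self.counts]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (assumed drift score, in [0, 1])."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_features(reference, current):
    """Return (feature, divergence) pairs, most-drifted feature first.

    reference: dict of feature name -> reference bin probabilities
    current:   dict of feature name -> ExponentialMovingHistogram
    """
    scores = {
        name: jensen_shannon(reference[name], current[name].probabilities())
        for name in current
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch each feature keeps one histogram that is updated per event, and `rank_features` would be called when an alarm fires to list the features whose recent distribution differs most from the reference.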
Related papers
- The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology [10.81691411087626]
In some fields of AI, machine learning and statistics, the validation of new methods and algorithms is often hindered by the scarcity of suitable real-world datasets.
We have constructed two devices that allow us to quickly and inexpensively produce large datasets from non-trivial but well-understood physical systems.
arXiv Detail & Related papers (2024-04-17T13:00:52Z)
- A Multi-Level, Multi-Scale Visual Analytics Approach to Assessment of Multifidelity HPC Systems [17.246865176910045]
Hardware system events and behaviors are crucial to improving the robustness and reliability of these systems.
In this work, we aim to build a holistic analytical system that helps make sense of such massive data.
This end-to-end log analysis system, coupled with visual analytics support, allows users to glean and promptly extract supercomputer usage and error patterns.
arXiv Detail & Related papers (2023-06-15T19:23:50Z)
- A hybrid feature learning approach based on convolutional kernels for ATM fault prediction using event-log data [5.859431341476405]
We present a predictive model based on convolutional kernels (MiniROCKET and HYDRA) to extract features from event-log data.
The proposed methodology is applied to a significant real-world collected dataset.
The model was integrated into a container-based decision support system to support operators in the timely maintenance of ATMs.
arXiv Detail & Related papers (2023-05-17T08:55:53Z)
- Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z)
- Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention [29.654681594903114]
We propose Hades, the first end-to-end semi-supervised approach to identify system anomalies based on heterogeneous data.
Our approach employs a hierarchical architecture to learn a global representation of the system status by fusing log semantics and metric patterns.
We evaluate Hades extensively on large-scale simulated data and datasets from Huawei Cloud.
arXiv Detail & Related papers (2023-02-14T09:02:11Z)
- PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
- A Robust and Explainable Data-Driven Anomaly Detection Approach For Power Electronics [56.86150790999639]
We present two anomaly detection and classification approaches, namely the Matrix Profile algorithm and anomaly transformer.
The Matrix Profile algorithm is shown to be well suited as a generalizable approach for detecting real-time anomalies in streaming time-series data.
A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy.
arXiv Detail & Related papers (2022-09-23T06:09:35Z)
- Using sequential drift detection to test the API economy [4.056434158960926]
The API economy refers to the widespread integration of APIs (application programming interfaces).
It is desirable to monitor the usage patterns and identify when the system is used in a way that was never used before.
In this work, we analyze both histograms and call graphs of API usage to determine whether the usage patterns of the system have shifted.
arXiv Detail & Related papers (2021-11-09T13:24:19Z)
- TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied to IT system operation and maintenance.
One direction aims at the recognition of re-occurring anomaly types to enable remediation automation.
We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
- PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)