KPIs-Based Clustering and Visualization of HPC jobs: a Feature Reduction
Approach
- URL: http://arxiv.org/abs/2312.06534v1
- Date: Mon, 11 Dec 2023 17:13:54 GMT
- Title: KPIs-Based Clustering and Visualization of HPC jobs: a Feature Reduction
Approach
- Authors: Mohamed Soliman Halawa and Rebeca P. Díaz-Redondo and Ana
Fernández-Vilas
- Abstract summary: HPC systems need to be constantly monitored to ensure their stability.
The monitoring systems collect a tremendous amount of data about different parameters or Key Performance Indicators (KPIs), such as resource usage, IO waiting time, etc.
A proper analysis of this data, usually stored as time series, can provide insight in choosing the right management strategies as well as the early detection of issues.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-Performance Computing (HPC) systems need to be constantly monitored to
ensure their stability. The monitoring systems collect a tremendous amount of
data about different parameters or Key Performance Indicators (KPIs), such as
resource usage, IO waiting time, etc. A proper analysis of this data, usually
stored as time series, can provide insight in choosing the right management
strategies as well as the early detection of issues. In this paper, we
introduce a methodology to cluster HPC jobs according to their KPI indicators.
Our approach reduces the inherent high dimensionality of the collected data by
applying two techniques to the time series: literature-based and variance-based
feature extraction. We also define a procedure to visualize the obtained
clusters by combining the two previous approaches and the Principal Component
Analysis (PCA). Finally, we have validated our contributions on a real data set
to conclude that those KPIs related to CPU usage provide the best cohesion and
separation for clustering analysis and the good results of our visualization
methodology.
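The abstract's feature-reduction idea can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation: the summary statistics in `summarize`, the reading of "variance-based" as keeping the highest-variance features, the use of the silhouette coefficient as the cohesion/separation measure, and the toy CPU-usage values and cluster labels are all hypothetical.

```python
from math import dist
from statistics import mean, pvariance


def summarize(series):
    """Collapse one KPI time series into a few summary features (assumed set)."""
    return [mean(series), pvariance(series), min(series), max(series)]


def top_variance_features(rows, k):
    """Return the indices of the k feature columns with the highest variance across jobs."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def silhouette(points, labels):
    """Mean silhouette coefficient: compares cohesion (a) with separation (b)."""
    scores = []
    for i, p in enumerate(points):
        # Group distances from point i to every other point by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster present
            continue
        a = mean(own)                                   # cohesion
        b = min(mean(d) for d in by_cluster.values())   # separation
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(scores)


# Hypothetical CPU-usage time series (percent) for four jobs.
jobs = {
    "job_a": [90, 95, 92, 94],
    "job_b": [88, 91, 93, 90],
    "job_c": [10, 12, 9, 11],
    "job_d": [8, 11, 10, 9],
}

features = [summarize(series) for series in jobs.values()]
keep = top_variance_features(features, k=2)
reduced = [[row[j] for j in keep] for row in features]

labels = [0, 0, 1, 1]  # assumed cluster assignment, e.g. from any clustering algorithm
score = silhouette(reduced, labels)
print(keep, round(score, 3))
```

A score near 1 indicates tight, well-separated clusters. The reduced feature matrix could then be passed to PCA for a two-dimensional plot, as the abstract describes.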
Related papers
- Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers [0.0]
Key Performance Indicators (KPIs) generate a huge number of monitoring tasks that give data about CPU usage, memory usage, network traffic, or other sensors that monitor hardware.
The main contribution in this paper is to identify which metric/s (KPIs) is/are the most appropriate to identify/classify different types of jobs according to their behavior in the HPC system.
We have concluded that (i) those metrics (KPIs) related to the Network (interface) traffic monitoring provide the best cohesion and separation to cluster HPC jobs, and (ii) hierarchical clustering algorithms are the most suitable for this task.
arXiv Detail & Related papers (2023-12-11T17:31:46Z)
- QBSD: Quartile-Based Seasonality Decomposition for Cost-Effective RAN KPI Forecasting [0.18416014644193066]
We introduce QBSD, a live single-step forecasting approach tailored to optimize the trade-off between accuracy and computational complexity.
QBSD has shown significant success with our real network RAN datasets of over several thousand cells.
Results demonstrate that the proposed method excels in runtime efficiency compared to the leading algorithms available.
arXiv Detail & Related papers (2023-06-09T15:59:27Z)
- Towards High-Performance Exploratory Data Analysis (EDA) Via Stable Equilibrium Point [5.825190876052149]
We introduce a stable equilibrium point (SEP) - based framework for improving the efficiency and solution quality of EDA.
A unique property of the proposed method is that the SEPs directly encode the clustering properties of data sets.
arXiv Detail & Related papers (2023-06-07T13:31:57Z)
- Federated Stochastic Gradient Descent Begets Self-Induced Momentum [151.4322255230084]
Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems.
We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process.
arXiv Detail & Related papers (2022-02-17T02:01:37Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Cross-Gradient Aggregation for Decentralized Learning from Non-IID data [34.23789472226752]
Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server.
We propose Cross-Gradient Aggregation (CGA), a novel decentralized learning algorithm.
We show superior learning performance of CGA over existing state-of-the-art decentralized learning algorithms.
arXiv Detail & Related papers (2021-03-02T21:58:12Z)
- Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC Monitoring Data [1.802439717192088]
We propose a novel method, called Correlation-wise Smoothing (CS), to extract descriptive signatures from time-series monitoring data.
Our method exploits correlations between data dimensions to form groups and produces image-like signatures that can be easily manipulated, visualized and compared.
We evaluate the CS method on HPC-ODA, a collection of datasets that we release with this work, and show that it leads to the same performance as most state-of-the-art methods.
arXiv Detail & Related papers (2020-10-13T05:22:47Z)
- Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level.
This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
- Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction [58.98112070128482]
We propose a lightweight solution for series prediction based on historic observations.
It consists of a heterogeneous ensemble method composed of two models - a neural network and a mean predictor.
It achieves an overall $R^2$ score of 0.10 on the available FedCSIS 2020 challenge dataset.
arXiv Detail & Related papers (2020-07-07T15:44:16Z)