Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC
Monitoring Data
- URL: http://arxiv.org/abs/2010.06186v2
- Date: Fri, 19 Feb 2021 07:08:19 GMT
- Title: Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC
Monitoring Data
- Authors: Alessio Netti, Daniele Tafani, Michael Ott and Martin Schulz
- Abstract summary: We propose a novel method, called Correlation-wise Smoothing (CS), to extract descriptive signatures from time-series monitoring data.
Our method exploits correlations between data dimensions to form groups and produces image-like signatures that can be easily manipulated, visualized and compared.
We evaluate the CS method on HPC-ODA, a collection of datasets that we release with this work, and show that it leads to the same performance as most state-of-the-art methods.
- Score: 1.802439717192088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern High-Performance Computing (HPC) and data center operators rely more
and more on data analytics techniques to improve the efficiency and reliability
of their operations. They employ models that ingest time-series monitoring
sensor data and transform it into actionable knowledge for system tuning: a
process known as Operational Data Analytics (ODA). However, monitoring data has
a high dimensionality, is hardware-dependent and difficult to interpret. This,
coupled with the strict requirements of ODA, makes most traditional data mining
methods impractical and in turn renders this type of data cumbersome to
process. Most current ODA solutions use ad-hoc processing methods that are not
generic, are sensitive to the sensors' features and are not fit for
visualization.
In this paper we propose a novel method, called Correlation-wise Smoothing
(CS), to extract descriptive signatures from time-series monitoring data in a
generic and lightweight way. Our CS method exploits correlations between data
dimensions to form groups and produces image-like signatures that can be easily
manipulated, visualized and compared. We evaluate the CS method on HPC-ODA, a
collection of datasets that we release with this work, and show that it leads
to the same performance as most state-of-the-art methods while producing
signatures that are up to ten times smaller and up to ten times faster, and
gaining visualizability, portability across systems and clear scaling
properties.
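The grouping idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the min-max normalization, the greedy ordering heuristic, the block averaging, and all function names (`pearson`, `min_max`, `correlation_order`, `signature`) are assumptions chosen only to show how correlated sensors might be placed next to each other and averaged into a compact, image-like signature.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def min_max(series):
    """Rescale one sensor's readings to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]


def correlation_order(sensors):
    """Greedy ordering (an assumed heuristic): start from the sensor with the
    highest summed correlation, then repeatedly append the remaining sensor
    most correlated with the last one placed."""
    n = len(sensors)
    corr = [[pearson(sensors[i], sensors[j]) for j in range(n)] for i in range(n)]
    start = max(range(n), key=lambda i: sum(corr[i]))
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        nxt = max(sorted(remaining), key=lambda j: corr[order[-1]][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def signature(window, order, n_blocks):
    """Average the ordered sensors of one time window into n_blocks values.
    Assumes n_blocks <= number of sensors so that no block is empty."""
    means = [sum(window[i]) / len(window[i]) for i in order]
    n = len(means)
    blocks = []
    for b in range(n_blocks):
        chunk = means[b * n // n_blocks:(b + 1) * n // n_blocks]
        blocks.append(sum(chunk) / len(chunk))
    return blocks


# Toy data (hypothetical): sensors 0 and 2 rise together, sensors 1 and 3 fall together.
raw = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [8, 7, 6, 5, 4, 3, 2, 1],
    [10, 20, 30, 40, 50, 60, 70, 80],
    [80, 70, 60, 50, 40, 30, 20, 10],
]
normalized = [min_max(s) for s in raw]
order = correlation_order(normalized)

# One signature per window of 4 time steps; stacked rows form the "image".
window_len = 4
image = []
for start in range(0, len(raw[0]), window_len):
    window = [s[start:start + window_len] for s in normalized]
    image.append(signature(window, order, n_blocks=2))

for row in image:
    print(row)
```

In this toy setup the four sensors collapse into two values per window, and each row of `image` can be compared or plotted directly, which is the kind of compact, visualizable output the abstract refers to.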
Related papers
- KPIs-Based Clustering and Visualization of HPC jobs: a Feature Reduction
Approach [0.0]
HPC systems need to be constantly monitored to ensure their stability.
The monitoring systems collect a tremendous amount of data about different parameters or Key Performance Indicators (KPIs), such as resource usage, IO waiting time, etc.
A proper analysis of this data, usually stored as time series, can provide insight in choosing the right management strategies as well as the early detection of issues.
arXiv Detail & Related papers (2023-12-11T17:13:54Z)
- ALMERIA: Boosting pairwise molecular contrasts with scalable methods [0.0]
ALMERIA is a tool for estimating compound similarities and activity prediction based on pairwise molecular contrasts.
It has been implemented using scalable software and methods to exploit large volumes of data.
Experiments show state-of-the-art performance for molecular activity prediction.
arXiv Detail & Related papers (2023-04-28T16:27:06Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- GAN-Supervised Dense Visual Alignment [95.37027391102684]
We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end.
Inspired by the classic Congealing method, our GANgealing algorithm trains a Spatial Transformer to map random samples from a GAN trained on unaligned data to a common, jointly-learned target mode.
arXiv Detail & Related papers (2021-12-09T18:59:58Z)
- Data Fusion with Latent Map Gaussian Processes [0.0]
Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design.
We introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion.
arXiv Detail & Related papers (2021-12-04T00:54:19Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Online Descriptor Enhancement via Self-Labelling Triplets for Visual
Data Association [28.03285334702022]
We propose a self-supervised method for incrementally refining visual descriptors to improve performance in the task of object-level visual data association.
Our method optimizes deep descriptor generators online by continuously training a widely available image classification network pre-trained with domain-independent data.
We show that our approach surpasses other visual data-association methods applied to a tracking-by-detection task, and show that it provides better performance-gains when compared to other methods that attempt to adapt to observed information.
arXiv Detail & Related papers (2020-11-06T17:42:04Z)
- Supervised Visualization for Data Exploration [9.742277703732187]
We describe a novel supervised visualization technique based on random forest proximities and diffusion-based dimensionality reduction.
Our approach is robust to noise and parameter tuning, thus making it simple to use while producing reliable visualizations for data exploration.
arXiv Detail & Related papers (2020-06-15T19:10:17Z)
- NCVis: Noise Contrastive Approach for Scalable Visualization [79.44177623781043]
NCVis is a high-performance dimensionality reduction method built on a sound statistical basis of noise contrastive estimation.
We show that NCVis outperforms state-of-the-art techniques in terms of speed while preserving the representation quality of other methods.
arXiv Detail & Related papers (2020-01-30T15:43:50Z)