Catch Me If You Can: Using Power Analysis to Identify HPC Activity
- URL: http://arxiv.org/abs/2005.03135v1
- Date: Wed, 6 May 2020 20:57:41 GMT
- Title: Catch Me If You Can: Using Power Analysis to Identify HPC Activity
- Authors: Bogdan Copos and Sean Peisert
- Abstract summary: We show how electrical power consumption data from an HPC platform can be used to identify what programs are executed.
We test our approach on an HPC rack at Lawrence Berkeley National Laboratory using a variety of scientific benchmarks.
- Score: 0.35534933448684125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monitoring users on large computing platforms such as high performance
computing (HPC) and cloud computing systems is non-trivial. Utilities such as
process viewers provide limited insight into what users are running, due to
granularity limitations, and other sources of data, such as system call tracing,
can impose significant operational overhead. However, despite technical and
procedural measures, instances of users abusing valuable HPC resources for
personal gains have been documented in the past \cite{hpcbitmine}, and systems
that are open to large numbers of loosely-verified users from around the world
are at risk of abuse. In this paper, we show how electrical power consumption
data from an HPC platform can be used to identify what programs are executed.
The intuition is that during execution, programs exhibit various patterns of
CPU and memory activity. These patterns are reflected in the power consumption
of the system and can be used to identify programs running. We test our
approach on an HPC rack at Lawrence Berkeley National Laboratory using a
variety of scientific benchmarks. Among other interesting observations, our
results show that by monitoring the power consumption of an HPC rack, it is
possible to identify if particular programs are running with precision up to
and recall of 95% even in noisy scenarios.
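The intuition in the abstract (programs leave distinguishable patterns in power consumption) can be illustrated with a minimal sketch. This is not the authors' method: the three workload names, the trace shapes, the summary features, and the nearest-centroid classifier are all assumptions made for illustration, and the traces are synthetic rather than measured.

```python
import math
import random
from statistics import mean, pstdev


def synth_trace(kind, rng, n=200):
    """Return a hypothetical rack power trace in watts (invented shapes)."""
    trace = []
    for t in range(n):
        if kind == "compute_bound":    # steady high draw
            base = 900.0
        elif kind == "memory_bound":   # lower draw with a slow oscillation
            base = 700.0 + 60.0 * math.sin(2 * math.pi * t / 50)
        else:                          # "bursty": alternating high/low phases
            base = 850.0 if (t // 20) % 2 == 0 else 600.0
        trace.append(base + rng.gauss(0.0, 15.0))  # additive measurement noise
    return trace


def features(trace):
    """Summarize a trace: mean level, spread, and mean absolute step change."""
    steps = [abs(b - a) for a, b in zip(trace, trace[1:])]
    return (mean(trace), pstdev(trace), mean(steps))


def precision_recall(y_true, y_pred, target):
    """Precision and recall for detecting one target program."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == target and p == target)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != target and p == target)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def run_demo(seed=0, target="bursty"):
    """Fit per-program centroids on synthetic traces, then score new traces."""
    rng = random.Random(seed)
    kinds = ["compute_bound", "memory_bound", "bursty"]

    # "Training": average the feature vectors of 10 reference runs per program.
    centroids = {}
    for kind in kinds:
        feats = [features(synth_trace(kind, rng)) for _ in range(10)]
        centroids[kind] = tuple(mean(col) for col in zip(*feats))

    # "Testing": label each new trace with the nearest centroid.
    y_true, y_pred = [], []
    for kind in kinds:
        for _ in range(20):
            f = features(synth_trace(kind, rng))
            y_true.append(kind)
            y_pred.append(min(centroids, key=lambda k: math.dist(f, centroids[k])))
    return precision_recall(y_true, y_pred, target)


if __name__ == "__main__":
    p, r = run_demo()
    print(f"precision={p:.2f} recall={r:.2f}")
```

The synthetic workloads here are deliberately easy to separate, so the scores say nothing about real racks; the point is only how precision and recall for "is program X running" are computed from labeled traces.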
Related papers
- Ensemble Method for System Failure Detection Using Large-Scale Telemetry Data [0.0]
This research paper presents an in-depth analysis of extensive system telemetry data, proposing an ensemble methodology for detecting system failures.
The proposed ensemble technique integrates a diverse set of algorithms, including Long Short-Term Memory (LSTM) networks, isolation forests, one-class support vector machines (OCSVM), and local outlier factors (LOF).
Experimental evaluations demonstrate the remarkable efficacy of our models, achieving a notable detection rate in identifying system failures.
arXiv Detail & Related papers (2024-06-07T06:35:17Z)
- Time-Series Forecasting and Sequence Learning Using Memristor-based Reservoir System [2.6473021051027534]
We develop a memristor-based echo state network accelerator that features efficient temporal data processing and in-situ online learning.
The proposed design is benchmarked using various datasets involving real-world tasks, such as forecasting the load energy consumption and weather conditions.
The system demonstrates reasonable robustness to device failure below 10%, which may occur due to stuck-at faults.
arXiv Detail & Related papers (2024-05-22T05:07:56Z)
- Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach [58.911515417156174]
We propose a new definition of Age of Information (AoI) and, based on the redefined AoI, we formulate an online AoI problem for MEC systems.
We introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics.
We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness.
arXiv Detail & Related papers (2023-12-01T01:30:49Z)
- WattScope: Non-intrusive Application-level Power Disaggregation in Datacenters [0.6086160084025234]
WattScope is a system for non-intrusively estimating the power consumption of individual applications.
WattScope adapts and extends a machine learning-based technique for disaggregating building power.
arXiv Detail & Related papers (2023-09-22T04:13:46Z)
- A Reinforcement Learning Approach for Performance-aware Reduction in Power Consumption of Data Center Compute Nodes [0.46040036610482665]
We use Reinforcement Learning to design a power capping policy on cloud compute nodes.
We show how a trained agent running on actual hardware can take actions by balancing power consumption and application performance.
arXiv Detail & Related papers (2023-08-15T23:25:52Z)
- Computationally Budgeted Continual Learning: What Does Matter? [128.0827987414154]
Continual Learning (CL) aims to sequentially train models on streams of incoming data that vary in distribution by preserving previous knowledge while adapting to new data.
Current CL literature focuses on restricted access to previously seen data, while imposing no constraints on the computational budget for training.
We revisit this problem with a large-scale benchmark and analyze the performance of traditional CL approaches in a compute-constrained setting.
arXiv Detail & Related papers (2023-03-20T14:50:27Z)
- PCBDet: An Efficient Deep Neural Network Object Detection Architecture for Automatic PCB Component Detection on the Edge [48.7576911714538]
PCBDet is an attention condenser network design that provides state-of-the-art inference throughput.
It achieves superior PCB component detection performance compared to other state-of-the-art efficient architecture designs.
arXiv Detail & Related papers (2023-01-23T04:34:25Z)
- Understanding the Energy Consumption of HPC Scale Artificial Intelligence [0.0]
This paper contributes towards better understanding the energy consumption trade-offs of HPC scale Artificial Intelligence (AI) and more specifically Deep Learning (DL) algorithms.
We developed benchmark-tracker, a benchmark tool to evaluate the speed and energy consumption of DL algorithms in HPC environments.
arXiv Detail & Related papers (2022-11-14T08:51:17Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
- Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments.
It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z)