Alioth: A Machine Learning Based Interference-Aware Performance Monitor
for Multi-Tenancy Applications in Public Cloud
- URL: http://arxiv.org/abs/2307.08949v1
- Date: Tue, 18 Jul 2023 03:34:33 GMT
- Title: Alioth: A Machine Learning Based Interference-Aware Performance Monitor
for Multi-Tenancy Applications in Public Cloud
- Authors: Tianyao Shi, Yingxuan Yang, Yunlong Cheng, Xiaofeng Gao, Zhen Fang,
Yongqiang Yang
- Abstract summary: Multi-tenancy in public clouds may lead to co-location interference on shared resources, which possibly results in performance degradation.
We propose a novel machine learning framework, Alioth, to monitor the performance degradation of cloud applications.
Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-tenancy in public clouds may lead to co-location interference on shared
resources, which possibly results in performance degradation of cloud
applications. Cloud providers want to know when such events happen and how
serious the degradation is, to perform interference-aware migrations and
alleviate the problem. However, virtual machines (VMs) in
Infrastructure-as-a-Service public clouds are black boxes to providers, so
application-level performance information cannot be acquired. This makes
performance monitoring intensely challenging as cloud providers can only rely
on low-level metrics such as CPU usage and hardware counters.
We propose a novel machine learning framework, Alioth, to monitor the
performance degradation of cloud applications. To feed the data-hungry models,
we first elaborate interference generators and conduct comprehensive
co-location experiments on a testbed to build Alioth-dataset which reflects the
complexity and dynamicity in real-world scenarios. Then we construct Alioth by
(1) augmenting features via recovering low-level metrics under no interference
using denoising auto-encoders, (2) devising a transfer learning model based on
domain adaptation neural network to make models generalize on test cases unseen
in offline training, and (3) developing a SHAP explainer to automate feature
selection and enhance model interpretability. Experiments show that Alioth
achieves an average mean absolute error of 5.29% offline and 10.8% when testing
on applications unseen in the training stage, outperforming the baseline
methods. Alioth is also robust in signaling quality-of-service violation under
dynamicity. Finally, we demonstrate a possible application of Alioth's
interpretability, providing insights to benefit the decision-making of cloud
operators. The dataset and code of Alioth have been released on GitHub.
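The monitoring loop the abstract describes can be sketched end-to-end: augment low-level metrics with their recovered no-interference values, regress the augmented features to a degradation estimate, and flag a quality-of-service violation above a threshold. The sketch below is illustrative only; `denoiser` and `regressor` are toy stand-ins for the paper's denoising auto-encoder and domain-adaptation network, and all names and numbers are hypothetical.

```python
# Illustrative sketch of the Alioth-style monitoring pipeline.
# The denoiser and regressor are toy stand-ins, not the authors' models.

def augment_features(observed, denoiser):
    """(1) Feature augmentation: pair each low-level metric observed under
    interference with its recovered no-interference value."""
    recovered = [denoiser(x) for x in observed]
    return observed + recovered

def predict_degradation(features, regressor):
    """(2) Regression: map augmented features to estimated degradation,
    expressed as a fraction of no-interference performance."""
    return regressor(features)

def signal_qos_violation(features, regressor, threshold=0.05):
    """(3) Flag a quality-of-service violation when estimated degradation
    exceeds an operator-chosen threshold."""
    return predict_degradation(features, regressor) > threshold

# Toy stand-ins for illustration only.
observed = [0.92, 0.65]                 # e.g. CPU usage, cache-hit rate
denoiser = lambda x: min(1.0, x * 1.1)  # pretend-recovered baseline metric
regressor = lambda f: sum(f[len(f)//2:]) / sum(f[:len(f)//2]) - 1.0

features = augment_features(observed, denoiser)
estimate = predict_degradation(features, regressor)
```

With these toy inputs the estimated degradation is roughly 9%, so the 5% threshold trips; in the real system the regressor would be trained on the Alioth-dataset rather than hand-written.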
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing shifts data analysis to the edge, but existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model of Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experiment results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z)
- Benchmarking Function Hook Latency in Cloud-Native Environments [0.5188841610098435]
Cloud-native applications are often instrumented or altered at runtime by dynamically patching or hooking them, which introduces significant performance overhead.
We present recommendations to mitigate these risks and demonstrate how an improper experimental setup can negatively impact latency measurements.
arXiv Detail & Related papers (2023-10-19T12:54:32Z)
- Nebula: Self-Attention for Dynamic Malware Analysis [14.710331873072146]
We introduce Nebula, a versatile, self-attention Transformer-based neural architecture that generalizes across different behavioral representations and formats.
We perform experiments on both malware detection and classification tasks, using three datasets acquired from different dynamic analysis platforms.
We show that self-supervised pre-training matches the performance of fully supervised models with only 20% of the training data.
arXiv Detail & Related papers (2023-09-19T09:24:36Z)
- Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-world [20.547119604004774]
We propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device.
We also propose an Uncertainty-based Visual Prompt Adapted (U-VPA) teacher-student model to transfer the generalization capability of the large model on the cloud to the device model.
Our proposed U-VPA teacher-student framework outperforms previous state-of-the-art test time adaptation and device-cloud collaboration methods.
arXiv Detail & Related papers (2022-12-02T05:02:36Z)
- MetaNetwork: A Task-agnostic Network Parameters Generation Framework for Improving Device Model Generalization [65.02542875281233]
We propose a novel task-agnostic framework, named MetaNetwork, for generating adaptive device model parameters from cloud without on-device training.
The MetaGenerator is designed to learn a mapping function from samples to model parameters, and it can generate and deliver the adaptive parameters to the device based on samples uploaded from the device to the cloud.
The MetaStabilizer aims to reduce the oscillation of the MetaGenerator, accelerate the convergence and improve the model performance during both training and inference.
arXiv Detail & Related papers (2022-09-12T13:26:26Z)
- ESAI: Efficient Split Artificial Intelligence via Early Exiting Using Neural Architecture Search [6.316693022958222]
Deep neural networks have been outperforming conventional machine learning algorithms in many computer vision-related tasks.
Most devices rely on cloud computing, in which powerful deep learning models analyze the data on the server.
In this paper, a new framework for deployment on IoT devices is proposed that can take advantage of both the cloud and on-device models.
arXiv Detail & Related papers (2021-06-21T04:47:53Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in noisy environments.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments.
It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.