Prism: Revealing Hidden Functional Clusters from Massive Instances in
Cloud Systems
- URL: http://arxiv.org/abs/2308.07638v1
- Date: Tue, 15 Aug 2023 08:34:54 GMT
- Title: Prism: Revealing Hidden Functional Clusters from Massive Instances in
Cloud Systems
- Authors: Jinyang Liu, Zhihan Jiang, Jiazhen Gu, Junjie Huang, Zhuangbin Chen,
Cong Feng, Zengyin Yang, Yongqiang Yang and Michael R. Lyu
- Abstract summary: We propose to infer functional clusters of instances, i.e., groups of instances having similar functionalities.
We first conduct a pilot study on a large-scale cloud system, Huawei Cloud, demonstrating that instances having similar functionalities share similar communication and resource usage patterns.
Motivated by these findings, we formulate the identification of functional clusters as a clustering problem and propose a non-intrusive solution called Prism.
- Score: 32.18320298895805
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Ensuring the reliability of cloud systems is critical for both cloud vendors
and customers. Cloud systems often rely on virtualization techniques to create
instances of hardware resources, such as virtual machines. However,
virtualization hinders the observability of cloud systems, making it
challenging to diagnose platform-level issues. To improve system observability,
we propose to infer functional clusters of instances, i.e., groups of instances
having similar functionalities. We first conduct a pilot study on a large-scale
cloud system, i.e., Huawei Cloud, demonstrating that instances having similar
functionalities share similar communication and resource usage patterns.
Motivated by these findings, we formulate the identification of functional
clusters as a clustering problem and propose a non-intrusive solution called
Prism. Prism adopts a coarse-to-fine clustering strategy. It first partitions
instances into coarse-grained chunks based on communication patterns. Within
each chunk, Prism further groups instances with similar resource usage patterns
to produce fine-grained functional clusters. Such a design reduces noises in
the data and allows Prism to process massive instances efficiently. We evaluate
Prism on two datasets collected from the real-world production environment of
Huawei Cloud. Our experiments show that Prism achieves a v-measure of ~0.95,
surpassing existing state-of-the-art solutions. Additionally, we illustrate the
integration of Prism within monitoring systems for enhanced cloud reliability
through two real-world use cases.
Related papers
- Point Cloud Compression with Implicit Neural Representations: A Unified Framework [54.119415852585306]
We present a pioneering point cloud compression framework capable of handling both geometry and attribute components.
Our framework utilizes two coordinate-based neural networks to implicitly represent a voxelized point cloud.
Our method exhibits high universality when contrasted with existing learning-based techniques.
arXiv Detail & Related papers (2024-05-19T09:19:40Z) - Alioth: A Machine Learning Based Interference-Aware Performance Monitor
for Multi-Tenancy Applications in Public Cloud [15.942285615596566]
Multi-tenancy in public clouds may lead to co-location interference on shared resources, which possibly results in performance degradation.
We propose a novel machine learning framework, Alioth, to monitor the performance degradation of cloud applications.
Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage.
arXiv Detail & Related papers (2023-07-18T03:34:33Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - Cloud-Device Collaborative Adaptation to Continual Changing Environments
in the Real-world [20.547119604004774]
We propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device.
We also propose an Uncertainty-based Visual Prompt Adapted (U-VPA) teacher-student model to transfer the generalization capability of the large model on the cloud to the device model.
Our proposed U-VPA teacher-student framework outperforms previous state-of-the-art test time adaptation and device-cloud collaboration methods.
arXiv Detail & Related papers (2022-12-02T05:02:36Z) - Reproducible and Portable Big Data Analytics in the Cloud [4.948702463455218]
There are two main difficulties in reproducing big data applications in the cloud.
The first is how to automate end-to-end execution of big data analytics in the cloud.
The second is an application developed for one cloud, such as AWS or Azure, is difficult to reproduce in another cloud.
arXiv Detail & Related papers (2021-12-17T20:52:03Z) - Smoothed Multi-View Subspace Clustering [14.77544837600836]
We propose a novel multi-view clustering method named smoothed multi-view subspace clustering (SMVSC)
It employs a novel technique, i.e., graph filtering, to obtain a smooth representation for each view.
Experiments on benchmark datasets validate the superiority of our approach.
arXiv Detail & Related papers (2021-06-18T02:24:19Z) - A Deep Learning Object Detection Method for an Efficient Clusters
Initialization [6.365889364810239]
Clustering has been used in numerous applications such as banking customers profiling, document retrieval, image segmentation, and e-commerce recommendation engines.
Existing clustering techniques present significant limitations, from which is the dependability of their stability on the initialization parameters.
This paper proposes a solution that can provide near-optimal clustering parameters with low computational and resources overhead.
arXiv Detail & Related papers (2021-04-28T08:34:25Z) - Device-Cloud Collaborative Learning for Recommendation [50.01289274123047]
We propose a novel MetaPatch learning approach on the device side to efficiently achieve "thousands of people with thousands of models" given a centralized cloud model.
With billions of updated personalized device models, we propose a "model-over-models" distillation algorithm, namely MoMoDistill, to update the centralized cloud model.
arXiv Detail & Related papers (2021-04-14T05:06:59Z) - A Privacy-Preserving Distributed Architecture for
Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It is able to preserve the user sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z) - ClusterVO: Clustering Moving Instances and Estimating Visual Odometry
for Self and Surroundings [54.33327082243022]
ClusterVO is a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects.
Unlike previous solutions relying on batch input or imposing priors on scene structure or dynamic object models, ClusterVO is online, general and thus can be used in various scenarios including indoor scene understanding and autonomous driving.
arXiv Detail & Related papers (2020-03-29T09:06:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.