Sinan: Data-Driven, QoS-Aware Cluster Management for Microservices
- URL: http://arxiv.org/abs/2105.13424v1
- Date: Thu, 27 May 2021 19:57:51 GMT
- Title: Sinan: Data-Driven, QoS-Aware Cluster Management for Microservices
- Authors: Yanqi Zhang, Weizhe Hua, Zhuangzhuang Zhou, Edward Suh, Christina Delimitrou
- Abstract summary: Sinan is a data-driven cluster manager for interactive cloud microservices that is online and QoS-aware.
- Score: 3.6923632650826477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cloud applications are increasingly shifting from large monolithic services
to large numbers of loosely coupled, specialized microservices. Despite their
advantages in terms of facilitating development, deployment, modularity, and
isolation, microservices complicate resource management, as dependencies
between them introduce backpressure effects and cascading QoS violations.
We present Sinan, a data-driven cluster manager for interactive cloud
microservices that is online and QoS-aware. Sinan leverages a set of scalable
and validated machine learning models to determine the performance impact of
dependencies between microservices, and allocate appropriate resources per tier
in a way that preserves the end-to-end tail latency target. We evaluate Sinan
both on dedicated local clusters and large-scale deployments on Google Compute
Engine (GCE) across representative end-to-end applications built with
microservices, such as social networks and hotel reservation sites. We show
that Sinan always meets QoS while also keeping cluster utilization high,
in contrast to prior work which leads to unpredictable performance or
sacrifices resource efficiency. Furthermore, the techniques in Sinan are
explainable, meaning that cloud operators can gain insights from the ML models
on how to better deploy and design their applications to reduce unpredictable
performance.
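The abstract describes Sinan's control idea only at a high level: learned models estimate how per-tier resource allocations affect end-to-end tail latency, and the manager allocates resources per tier so that the tail latency target is preserved. The Python sketch below illustrates that idea under explicit assumptions and is not the paper's algorithm. The latency predictor is a toy analytical stand-in for Sinan's ML models, and the names `allocate`, `make_toy_predictor`, and `percentile`, the tier names, the 0.5-core step, and the 0.9 safety margin are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, Dict, List

# Hypothetical types: tier name -> CPU cores allocated to that microservice tier.
Allocation = Dict[str, float]
Predictor = Callable[[Allocation], float]  # returns predicted end-to-end p99 (ms)


def percentile(samples: List[float], q: float) -> float:
    """Nearest-rank percentile; q=99 gives the p99 tail latency."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def make_toy_predictor(work_ms: Dict[str, float]) -> Predictor:
    """Toy stand-in for a learned latency model (NOT Sinan's models):
    each tier contributes work_ms[tier] / cores, and the tiers on the
    request path add up, so starving any one tier inflates the total."""
    return lambda alloc: sum(work_ms[t] / alloc[t] for t in alloc)


def allocate(alloc: Allocation, predict: Predictor, qos_ms: float,
             step: float = 0.5, min_cores: float = 0.5,
             margin: float = 0.9, max_steps: int = 1000) -> Allocation:
    """Greedy per-tier allocation: keep predicted tail latency at or below
    margin * qos_ms, then reclaim step-sized chunks of unneeded cores."""
    target = margin * qos_ms  # hypothetical headroom for prediction error
    alloc = dict(alloc)

    # Scale up: add cores to whichever tier most reduces predicted latency.
    steps = 0
    while predict(alloc) > target:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("QoS target unreachable under this predictor")
        best = min(alloc, key=lambda t: predict({**alloc, t: alloc[t] + step}))
        alloc[best] += step

    # Scale down: take cores back from any tier while the prediction stays safe.
    changed = True
    while changed:
        changed = False
        for tier in sorted(alloc):
            if alloc[tier] - step < min_cores:
                continue
            trial = {**alloc, tier: alloc[tier] - step}
            if predict(trial) <= target:
                alloc, changed = trial, True
    return alloc


if __name__ == "__main__":
    # Hypothetical three-tier request path; values are work in ms per core.
    predict = make_toy_predictor({"nginx": 2.0, "compose-post": 6.0, "mongodb": 4.0})
    start = {"nginx": 1.0, "compose-post": 1.0, "mongodb": 1.0}
    print(allocate(start, predict, qos_ms=10.0))
    # {'nginx': 1.0, 'compose-post': 1.5, 'mongodb': 1.5}
    print(percentile([3.0, 4.0, 5.0, 40.0], 99))  # 40.0
```

Greedy search is used here only to keep the example short. The point it illustrates is that decisions are driven by predicted end-to-end tail latency rather than by per-tier utilization thresholds, so the bottleneck tier receives cores first and over-provisioned tiers are shrunk only while the whole request path is still predicted to meet QoS.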
Related papers
- MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision [76.42361936804313]
We introduce MAS-ZERO, the first self-evolved, inference-time framework for automatic MAS design.
MAS-ZERO employs meta-level design to iteratively generate, evaluate, and refine MAS configurations tailored to each problem instance.
Date: 2025-05-21T00:56:09Z
- Design and Evaluation of a Microservices Cloud Framework for Online Travel Platforms [1.03590082373586]
This paper analyses and integrates a unique Microservices Cloud Framework designed to support Online Travel Platforms (MCF-OTP).
MCF-OTP's main goal is to increase the performance, flexibility, and maintainability of online travel platforms via cloud computing and microservice technologies.
Date: 2025-05-20T15:36:55Z
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing [48.30406812516552]
We introduce PLM, a Peripheral Language Model, developed through a co-design process that jointly optimizes model architecture and edge system constraints.
PLM employs a Multi-head Latent Attention mechanism and the squared ReLU activation function to encourage sparsity, thereby reducing peak memory footprint.
Evaluation results demonstrate that PLM outperforms existing small language models trained on publicly available data.
Date: 2025-03-15T15:11:17Z
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.
MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.
Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
Date: 2025-02-18T15:45:01Z
- Microservice Deployment in Space Computing Power Networks via Robust Reinforcement Learning [43.96374556275842]
It is important to provide reliable real-time remote sensing inference services to meet the low-latency requirements.
This paper presents a remote sensing artificial intelligence applications deployment framework designed for Low Earth Orbit satellite constellations.
Date: 2025-01-08T16:55:04Z
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
Date: 2024-10-24T19:48:51Z
- SeBS-Flow: Benchmarking Serverless Cloud Function Workflows [51.4200085836966]
We propose the first serverless workflow benchmarking suite SeBS-Flow.
SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns.
We conduct comprehensive evaluations on three major cloud platforms, assessing performance, cost, scalability, and runtime deviations.
Date: 2024-10-04T14:52:18Z
- Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources [1.1470070927586018]
We develop a model that captures the relationship between end-to-end latency, requests at the front-end level, and resource utilization.
We then use the developed model to predict the end-to-end latency.
We demonstrate the merit of a microservice-based application and provide a roadmap to deployment.
Date: 2024-09-04T22:03:07Z
- DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency.
Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices.
Date: 2023-09-02T08:22:21Z
- Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud [15.942285615596566]
Multi-tenancy in public clouds may lead to co-location interference on shared resources, which possibly results in performance degradation.
We propose a novel machine learning framework, Alioth, to monitor the performance degradation of cloud applications.
Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage.
Date: 2023-07-18T03:34:33Z
- Predicting Resource Consumption of Kubernetes Container Systems using Resource Models [3.138731415322007]
This paper considers how to derive resource models for cloud systems empirically.
We do so based on models of deployed services in a formal language with explicit adherence to CPU and memory resources.
We report on leveraging data collected empirically from small deployments to simulate the execution of higher intensity scenarios on larger deployments.
Date: 2023-05-12T17:59:01Z
- Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud [0.38073142980732994]
We benchmark five modern stream processing frameworks regarding their scalability using a systematic method.
All benchmarked frameworks exhibit approximately linear scalability as long as sufficient cloud resources are provisioned.
There is no clear superior framework; the ranking of the frameworks depends on the use case.
Date: 2023-03-20T13:22:03Z
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
Date: 2022-10-14T18:00:07Z
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
Date: 2021-12-22T14:45:37Z
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
Date: 2021-05-10T08:02:27Z
- A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It preserves users' sensitive data while providing cloud-based machine learning and deep learning services.
Date: 2020-03-30T15:12:03Z
This list is automatically generated from the titles and abstracts of the papers in this site.