Ditto: Elastic Confidential VMs with Secure and Dynamic CPU Scaling
- URL: http://arxiv.org/abs/2409.15542v1
- Date: Mon, 23 Sep 2024 20:52:10 GMT
- Title: Ditto: Elastic Confidential VMs with Secure and Dynamic CPU Scaling
- Authors: Shixuan Zhao, Mengyuan Li, Mengjia Yan, Zhiqiang Lin
- Abstract summary: "Elastic CVM" and the Worker vCPU design not only optimize cloud resource utilization but also pave the way for more flexible and cost-effective confidential computing environments.
- Score: 35.971391128345125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Confidential Virtual Machines (CVMs) are a type of VM-based Trusted Execution Environments (TEEs) designed to enhance the security of cloud-based VMs, safeguarding them even from malicious hypervisors. Although CVMs have been widely adopted by major cloud service providers, current CVM designs face significant challenges in runtime resource management due to their fixed capacities and lack of transparency. These limitations hamper efficient cloud resource management, leading to increased operational costs and reduced agility in responding to fluctuating workloads. This paper introduces a dynamic CPU resource management approach, featuring the novel concept of "Elastic CVM". This approach allows for hypervisor-assisted runtime adjustment of CPU resources using a specialized vCPU type, termed Worker vCPU. This new approach enhances CPU resource adaptability and operational efficiency without compromising security. Additionally, we introduce a Worker vCPU Abstraction Layer to simplify Worker vCPU deployment and management. To demonstrate the effectiveness of our approach, we have designed and implemented a serverless computing prototype platform, called Ditto. We show that Ditto significantly improves performance and efficiency through finer-grain resource management. The concept of "Elastic CVM" and the Worker vCPU design not only optimize cloud resource utilization but also pave the way for more flexible and cost-effective confidential computing environments.
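The abstract's core idea, a CVM whose CPU capacity is adjusted at runtime by activating or parking pre-provisioned Worker vCPUs, can be sketched in a few lines. This is a minimal illustrative simulation of the concept only; all names below (`ElasticCVM`, `WorkerVCPU`, `scale_to`) are assumptions for exposition and do not reflect Ditto's actual API or the paper's implementation.

```python
# Sketch of the "Elastic CVM" concept: a CVM boots with a fixed set of
# base vCPUs plus pre-provisioned Worker vCPUs that the hypervisor can
# activate or park at runtime. All class/method names are hypothetical.

class WorkerVCPU:
    def __init__(self, vcpu_id: int):
        self.vcpu_id = vcpu_id
        self.active = False  # parked until "hot-plugged" by the hypervisor

class ElasticCVM:
    def __init__(self, base_vcpus: int, max_workers: int):
        self.base_vcpus = base_vcpus
        self.workers = [WorkerVCPU(base_vcpus + i) for i in range(max_workers)]

    def scale_to(self, target_vcpus: int) -> int:
        """Activate or park Worker vCPUs to approach the target capacity."""
        needed = max(0, target_vcpus - self.base_vcpus)
        for i, worker in enumerate(self.workers):
            worker.active = i < needed
        return self.capacity()

    def capacity(self) -> int:
        return self.base_vcpus + sum(w.active for w in self.workers)

cvm = ElasticCVM(base_vcpus=2, max_workers=6)
print(cvm.scale_to(5))  # burst: 2 base + 3 workers -> 5
print(cvm.scale_to(2))  # idle: all workers parked -> 2
```

Note that capacity is capped by the pre-provisioned worker pool (here, 2 + 6 = 8), mirroring how the abstract describes scaling within a fixed envelope rather than unbounded hot-plugging.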
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR demonstrates significant reductions in the LLM's computational cost (by 5.2-6.5x) and GPU memory (by 2-6x) without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
- Cost-Aware Dynamic Cloud Workflow Scheduling using Self-Attention and Evolutionary Reinforcement Learning [7.653021685451039]
This paper proposes a novel self-attention policy network for cloud workflow scheduling.
The trained network, SPN-CWS, can effectively process all candidate instances simultaneously to identify the most suitable VM instance for every workflow task.
Our method can noticeably outperform several state-of-the-art algorithms on multiple benchmark CDMWS problems.
arXiv Detail & Related papers (2024-09-27T04:45:06Z)
- vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z)
- Bridge the Future: High-Performance Networks in Confidential VMs without Trusted I/O devices [9.554247218443939]
Trusted I/O (TIO) is an appealing solution to improve I/O performance for confidential VMs (CVMs).
This paper emphasizes that not all types of I/O can derive substantial benefits from TIO, particularly network I/O.
We present FOlio, a software solution crafted from a secure and efficient Data Plane Development Kit (DPDK) extension.
arXiv Detail & Related papers (2024-03-05T23:06:34Z)
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system that unlocks the vast untapped potential of consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and the variability introduced by peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- A smart resource management mechanism with trust access control for cloud computing environment [3.3504365823045044]
This article suggests a conceptual framework for a workload management paradigm in cloud settings that is both safe and performance-efficient.
A resource management unit is used in this paradigm for energy-efficient virtual machine allocation.
A secure virtual machine management unit controls the resource management unit and is designed to report unlawful access or intercommunication.
arXiv Detail & Related papers (2022-12-10T15:00:58Z)
- Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning [40.09527159285327]
We build the first end-to-end and general-purpose system, called Walle, for device-cloud collaborative machine learning (ML).
Walle consists of a deployment platform, distributing ML tasks to billion-scale devices in time; a data pipeline, efficiently preparing task input; and a compute container, providing a cross-platform and high-performance execution environment.
We evaluate Walle in practical e-commerce application scenarios to demonstrate its effectiveness, efficiency, and scalability.
arXiv Detail & Related papers (2022-05-30T03:43:35Z)
- Combination of Convolutional Neural Network and Gated Recurrent Unit for Energy Aware Resource Allocation [0.0]
Cloud computing service models have experienced rapid growth and inefficient resource usage is one of the greatest causes of high energy consumption in cloud data centers.
Resource allocation in cloud data centers aiming to reduce energy consumption has been conducted using live migration of Virtual Machines (VMs) and their consolidation onto a small number of Physical Machines (PMs).
To address this issue, user requests can be classified by their request pattern into latency-sensitive or latency-insensitive classes, after which a suitable VM can be selected for migration.
arXiv Detail & Related papers (2021-06-23T05:57:51Z)
- Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures [56.69373580921888]
We focus on Recommender Systems which account for most of the AI cycles in cloud computing centers.
By enabling them to run on the latest CPU hardware and software tailored for HPC, we are able to achieve a more than two-orders-of-magnitude improvement in performance.
arXiv Detail & Related papers (2020-05-10T14:40:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.