Related papers: CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload

CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload

URL: http://arxiv.org/abs/2509.03394v1
Date: Wed, 03 Sep 2025 15:15:44 GMT
Title: CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload
Authors: Amirhossein Shahbazinia, Darong Huang, Luis Costero, David Atienza,
Abstract summary: Cloud platforms are increasingly relied upon to host diverse, resource-intensive workloads.<n>Existing management techniques, including VM scheduling and resource provisioning, require accurate performance prediction.<n>We propose CloudFormer, a dual-branch Transformer-based model designed to predict VM performance degradation in black-box environments.
Score: 9.00677252346245
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cloud platforms are increasingly relied upon to host diverse, resource-intensive workloads due to their scalability, flexibility, and cost-efficiency. In multi-tenant cloud environments, virtual machines are consolidated on shared physical servers to improve resource utilization. While virtualization guarantees resource partitioning for CPU, memory, and storage, it cannot ensure performance isolation. Competition for shared resources such as last-level cache, memory bandwidth, and network interfaces often leads to severe performance degradation. Existing management techniques, including VM scheduling and resource provisioning, require accurate performance prediction to mitigate interference. However, this remains challenging in public clouds due to the black-box nature of VMs and the highly dynamic nature of workloads. To address these limitations, we propose CloudFormer, a dual-branch Transformer-based model designed to predict VM performance degradation in black-box environments. CloudFormer jointly models temporal dynamics and system-level interactions, leveraging 206 system metrics at one-second resolution across both static and dynamic scenarios. This design enables the model to capture transient interference effects and adapt to varying workload conditions without scenario-specific tuning. Complementing the methodology, we provide a fine-grained dataset that significantly expands the temporal resolution and metric diversity compared to existing benchmarks. Experimental results demonstrate that CloudFormer consistently outperforms state-of-the-art baselines across multiple evaluation metrics, achieving robust generalization across diverse and previously unseen workloads. Notably, CloudFormer attains a mean absolute error (MAE) of just 7.8%, representing a substantial improvement in predictive accuracy and outperforming existing methods at least by 28%.

Related papers

Adaptive Dual-Weighting Framework for Federated Learning via Out-of-Distribution Detection [53.45696787935487]
Federated Learning (FL) enables collaborative model training across large-scale distributed service nodes.<n>In real-world service-oriented deployments, data generated by heterogeneous users, devices, and application scenarios are inherently non-IID.<n>We propose FLood, a novel FL framework inspired by out-of-distribution (OOD) detection.
arXiv Detail & Related papers (2026-02-01T05:54:59Z)
Big Data Workload Profiling for Energy-Aware Cloud Resource Management [0.0]
This paper presents a workload aware and energy efficient scheduling framework.<n>It profiles utilization, memory demand, and storage IO behavior to guide virtual machine placement decisions.<n>Results demonstrate consistent energy savings of 15 to 20 percent compared to a baseline scheduler.
arXiv Detail & Related papers (2026-01-17T06:50:51Z)
AI-Driven Cloud Resource Optimization for Multi-Cluster Environments [0.0]
This paper presents an AI-driven framework for adaptive resource optimization in multi-cluster cloud systems.<n>The proposed approach integrates predictive learning, policy-aware decision-making, and continuous feedback.<n>A prototype implementation demonstrates improved resource efficiency, faster stabilization during workload fluctuations, and reduced performance variability.
arXiv Detail & Related papers (2025-12-31T15:15:46Z)
AVERY: Adaptive VLM Split Computing through Embodied Self-Awareness for Efficient Disaster Response Systems [6.294240680169978]
Unmanned Aerial Vehicles (UAVs) in disaster response require complex, queryable intelligence that on-board CNNs cannot provide.<n>We present AVERY, a framework that enables VLM deployment through adaptive split computing.
arXiv Detail & Related papers (2025-11-22T18:42:04Z)
The Curious Case of In-Training Compression of State Space Models [49.819321766705514]
State Space Models (SSMs) tackle long sequence modeling tasks efficiently, offer both parallelizable training and fast inference.<n>Key design challenge is striking the right balance between maximizing expressivity and limiting this computational burden.<n>Our approach, textscCompreSSM, applies to Linear Time-Invariant SSMs such as Linear Recurrent Units, but is also extendable to selective models.
arXiv Detail & Related papers (2025-10-03T09:02:33Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution [1.8029479474051309]
We design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary. Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain. Experimental results on the action classification task show that our framework not only requires less computational overhead, but also improves accuracy compared to using a large model alone.
arXiv Detail & Related papers (2024-10-16T02:06:27Z)
SeBS-Flow: Benchmarking Serverless Cloud Function Workflows [51.4200085836966]
We propose the first serverless workflow benchmarking suite SeBS-Flow.<n>SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns.<n>We conduct comprehensive evaluations on three major cloud platforms, assessing performance, cost, scalability, and runtime deviations.
arXiv Detail & Related papers (2024-10-04T14:52:18Z)
Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation. In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
Cloud-Device Collaborative Learning for Multimodal Large Language Models [24.65882336700547]
We introduce a Cloud-Device Collaborative Continual Adaptation framework to enhance the performance of compressed, device-deployed MLLMs. Our framework is structured into three key components: a device-to-cloud uplink for efficient data transmission, cloud-based knowledge adaptation, and an optimized cloud-to-device downlink for model deployment.
arXiv Detail & Related papers (2023-12-26T18:46:14Z)
Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud [15.942285615596566]
Multi-tenancy in public clouds may lead to co-location interference on shared resources, which possibly results in performance degradation. We propose a novel machine learning framework, Alioth, to monitor the performance degradation of cloud applications. Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage.
arXiv Detail & Related papers (2023-07-18T03:34:33Z)
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models [64.49254199311137]
We propose a novel Instance-aware Dynamic Prompt Tuning (IDPT) strategy for pre-trained point cloud models. The essence of IDPT is to develop a dynamic prompt generation module to perceive semantic prior features of each point cloud instance. In experiments, IDPT outperforms full fine-tuning in most tasks with a mere 7% of the trainable parameters.
arXiv Detail & Related papers (2023-04-14T16:03:09Z)
Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-world [20.547119604004774]
We propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device. We also propose an Uncertainty-based Visual Prompt Adapted (U-VPA) teacher-student model to transfer the generalization capability of the large model on the cloud to the device model. Our proposed U-VPA teacher-student framework outperforms previous state-of-the-art test time adaptation and device-cloud collaboration methods.
arXiv Detail & Related papers (2022-12-02T05:02:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.