Related papers: Optimal Resource Allocation for Serverless Queries

Optimal Resource Allocation for Serverless Queries

URL: http://arxiv.org/abs/2107.08594v1
Date: Mon, 19 Jul 2021 02:55:48 GMT
Title: Optimal Resource Allocation for Serverless Queries
Authors: Anish Pimpley, Shuo Li, Anubha Srivastava, Vishal Rohra, Yi Zhu, Soundararajan Srinivasan, Alekh Jindal, Hiren Patel, Shi Qiao, Rathijit Sen
Abstract summary: Prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource allocation and run-time. We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and past observed queries.
Score: 8.59568779761598
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Optimizing resource allocation for analytical workloads is vital for reducing costs of cloud-data services. At the same time, it is incredibly hard for users to allocate resources per query in serverless processing systems, and they frequently misallocate by orders of magnitude. Unfortunately, prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource allocation and run-time. Additionally, these methods fail to predict allocation for queries that have not been observed in the past. In this paper, we tackle both these problems. We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and past observed queries. We introduce the notion of a performance characteristic curve (PCC) as a parameterized representation that can compactly capture the relationship between resources and performance. To tackle training data sparsity, we introduce a novel data augmentation technique to efficiently synthesize the entire PCC using a single run of the query. Lastly, we demonstrate the advantages of a constrained loss function coupled with GNNs, over traditional ML methods, for capturing the domain specific behavior through an extensive experimental evaluation over SCOPE big data workloads at Microsoft.

Related papers

The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks [56.37880529653111]
The demand for large computation model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications.<n>In this paper, we investigate the LAIM-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment.
arXiv Detail & Related papers (2025-05-14T08:18:55Z)
Network Resource Optimization for ML-Based UAV Condition Monitoring with Vibration Analysis [54.550658461477106]
Condition Monitoring (CM) uses Machine Learning (ML) models to identify abnormal and adverse conditions. This work explores the optimization of network resources for ML-based UAV CM frameworks. By leveraging dimensionality reduction techniques, there is a 99.9% reduction in network resource consumption.
arXiv Detail & Related papers (2025-02-21T14:36:12Z)
Topology-aware Preemptive Scheduling for Co-located LLM Workloads [7.240168647854797]
We develop a fine-grained topology-aware method for scheduling of hybrid workloads. This method significantly increases the efficiency of preemption and improves overall scheduled performance for LLM workloads by $55%$.
arXiv Detail & Related papers (2024-11-18T13:26:09Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which ecoded by embedding vectors, resulting in excessively large embedding tables. Despite the prosperity of lightweight embedding-based RSs, a wide diversity is seen in evaluation protocols. This study investigates various LERS' performance, efficiency, and cross-task transferability via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z)
Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance. Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs) MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation. We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE) Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
Outage Performance and Novel Loss Function for an ML-Assisted Resource Allocation: An Exact Analytical Framework [2.1397655110395752]
We introduce a novel loss function to minimize the outage probability of an ML-based resource allocation system. An ML binary classification predictor assists in selecting a resource satisfying the established outage criterium.
arXiv Detail & Related papers (2023-05-16T18:23:52Z)
Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem. We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z)
Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics [0.06524460254566904]
We present Perona, a novel approach to robust infrastructure fingerprinting for usage in big data analytics. Perona employs common sets and configurations of benchmarking tools for target resources, so that resulting benchmark metrics are directly comparable and ranking is enabled. We evaluate our approach both on data gathered from our own experiments as well as within related works for resource configuration optimization.
arXiv Detail & Related papers (2022-11-15T15:48:09Z)
On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds [0.0]
We propose a collaborative approach for sharing anonymized workload execution traces among users. We mining them for general patterns, and exploiting clusters of historical workloads for future optimizations.
arXiv Detail & Related papers (2021-11-16T20:11:36Z)
A Predictive Autoscaler for Elastic Batch Jobs [8.354712625979776]
Large batch jobs such as Deep Learning, HPC and Spark require far more computational resources and higher cost than conventional online service. We propose a predictive autoscaler to provide an elastic interface for the customers and overprovision instances.
arXiv Detail & Related papers (2020-10-10T17:35:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.