Related papers: Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions

Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions

URL: http://arxiv.org/abs/2411.07444v1
Date: Tue, 12 Nov 2024 00:03:11 GMT
Title: Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions
Authors: Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya,
Abstract summary: We present MemFigLess, a serverless solution that estimates the memory requirement of a serverless function with input-awareness. MemFigLess is able to capture the input-aware resource relationships and allocate upto 82% less resources and save up to 87% run-time costs.
Score: 18.36339203254509
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In today's Function-as-a-Service offerings, a programmer is usually responsible for configuring function memory for its successful execution, which allocates proportional function resources such as CPU and network. However, right-sizing the function memory force developers to speculate performance and make ad-hoc configuration decisions. Recent research has highlighted that a function's input characteristics, such as input size, type and number of inputs, significantly impact its resource demand, run-time performance and costs with fluctuating workloads. This correlation further makes memory configuration a non-trivial task. On that account, an input-aware function memory allocator not only improves developer productivity by completely hiding resource-related decisions but also drives an opportunity to reduce resource wastage and offer a finer-grained cost-optimised pricing scheme. Therefore, we present MemFigLess, a serverless solution that estimates the memory requirement of a serverless function with input-awareness. The framework executes function profiling in an offline stage and trains a multi-output Random Forest Regression model on the collected metrics to invoke input-aware optimal configurations. We evaluate our work with the state-of-the-art approaches on AWS Lambda service to find that MemFigLess is able to capture the input-aware resource relationships and allocate upto 82% less resources and save up to 87% run-time costs.

Related papers

Cost-Optimal Grouped-Query Attention for Long-Context LLMs [64.90662568387683]
Building effective Transformer-based large language models (LLMs) has recently become a research focus. We compare models with different parameter sizes, context lengths, and attention head configurations in terms of model performance, computational cost, and memory cost. Our studies show that, when processing sufficiently long sequences, a larger model with fewer attention heads can achieve a lower loss while incurring lower computational and memory costs.
arXiv Detail & Related papers (2025-03-12T17:50:42Z)
CSR:Achieving 1 Bit Key-Value Cache via Sparse Representation [63.65323577445951]
We propose a novel approach called Cache Sparse Representation (CSR) CSR transforms the dense Key-Value cache tensor into sparse indexes and weights, offering a more memory-efficient representation during LLM inference. Our experiments demonstrate CSR achieves performance comparable to state-of-the-art KV cache quantization algorithms.
arXiv Detail & Related papers (2024-12-16T13:01:53Z)
Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance. Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
SPES: Towards Optimizing Performance-Resource Trade-Off for Serverless Functions [31.01399126339857]
Serverless computing is gaining traction due to its efficiency and ability to harness on-demand cloud resources. Existing solutions tend to use over-simplistic strategies for function pre-loading/unloading without full invocation pattern exploitation. We propose SPES, the first differentiated scheduler for runtime cold start mitigation by optimizing serverless function provision.
arXiv Detail & Related papers (2024-03-26T10:28:41Z)
Shabari: Delayed Decision-Making for Faster and Efficient Serverless Functions [0.30693357740321775]
We introduce Shabari, a resource management framework for serverless systems. Shabari makes decisions as late as possible to right-size each invocation to meet functions' performance objectives. For a range of serverless functions and inputs, Shabari reduces SLO violations by 11-73%.
arXiv Detail & Related papers (2024-01-16T22:20:36Z)
Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models [53.859446823312126]
SoTA open source models of varying sizes from 7B - 65B, on average, improve 8.2% from their baseline performance. Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show a 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z)
Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation [52.11279360934703]
Current prevailing Video Object (VOS) methods usually perform dense matching between the current and reference frames after extracting features. We propose a unified VOS framework, coined as JointFormer, for joint modeling of the three elements of feature, correspondence, and a compressed memory.
arXiv Detail & Related papers (2023-08-25T17:30:08Z)
On-demand Cold Start Frequency Reduction with Off-Policy Reinforcement Learning in Serverless Computing [18.36339203254509]
The presented work focuses on reducing the frequent, on-demand cold starts on the platform by using Reinforcement Learning(RL) The proposed approach uses model-free Q-learning that consider function metrics such as CPU utilization, existing function instances, and response failure rate, to proactively initialize functions, in advance. The evaluation results demonstrate a favourable performance of the RL-based agent when compared to Kubeless' default policy and a function keep-alive policy.
arXiv Detail & Related papers (2023-08-15T03:01:41Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers. Mesa uses exact activations during forward pass while storing a low-precision version of activations to reduce memory consumption during training. Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can reduce half of the memory footprints during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
Harvesting Idle Resources in Serverless Computing via Reinforcement Learning [7.346628578439277]
FRM maximizes resource efficiency by dynamically harvesting idle resources from functions over-supplied to functions under-supplied. FRM monitors each function's resource utilization in real-time, detects over-provisioning and under-provisioning, and applies deep reinforcement learning to harvest idle resources safely. We have implemented and deployed a FRM prototype in a 13-node Apache OpenWhisk cluster.
arXiv Detail & Related papers (2021-08-28T23:02:56Z)
Optimal Resource Allocation for Serverless Queries [8.59568779761598]
Prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource allocation and run-time. We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and past observed queries.
arXiv Detail & Related papers (2021-07-19T02:55:48Z)
Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, agent aims to select sensory actions that reduce uncertainty about one or more hidden variables. Partially observable Markov decision processes (POMDPs) provide a natural model for such problems. As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.