Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks
- URL: http://arxiv.org/abs/2409.05303v1
- Date: Mon, 9 Sep 2024 03:17:28 GMT
- Title: Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks
- Authors: Yuxin Liang, Peng Yang, Yuanyuan He, Feng Lyu,
- Abstract summary: The scarcity of available resources on the edge pose significant challenges in deploying generative AI models.
We present a collaborative edge-cloud framework aiming to properly manage generative AI model deployment on the edge.
- Score: 15.958822667638405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The surging development of Artificial Intelligence-Generated Content (AIGC) marks a transformative era of the content creation and production. Edge servers promise attractive benefits, e.g., reduced service delay and backhaul traffic load, for hosting AIGC services compared to cloud-based solutions. However, the scarcity of available resources on the edge pose significant challenges in deploying generative AI models. In this paper, by characterizing the resource and delay demands of typical generative AI models, we find that the consumption of storage and GPU memory, as well as the model switching delay represented by I/O delay during the preloading phase, are significant and vary across models. These multidimensional coupling factors render it difficult to make efficient edge model deployment decisions. Hence, we present a collaborative edge-cloud framework aiming to properly manage generative AI model deployment on the edge. Specifically, we formulate edge model deployment problem considering heterogeneous features of models as an optimization problem, and propose a model-level decision selection algorithm to solve it. It enables pooled resource sharing and optimizes the trade-off between resource consumption and delay in edge generative AI model deployment. Simulation results validate the efficacy of the proposed algorithm compared with baselines, demonstrating its potential to reduce overall costs by providing feature-aware model deployment decisions.
Related papers
- Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies [14.115655986504411]
5G and edge computing hardware has brought about a significant shift in artificial intelligence.
deploying state-of-the-art AI models on resource-constrained edge devices faces significant challenges.
This paper presents an optimization triad for efficient and reliable edge AI deployment.
arXiv Detail & Related papers (2025-01-04T06:17:48Z) - Accelerating AIGC Services with Latent Action Diffusion Scheduling in Edge Networks [27.961536719427205]
Current AIGC models primarily focus on content quality within a centralized framework, resulting in a high service delay and negative user experiences.
We propose LAD-TS, a novel Latent Action Diffusion-based Task Scheduling method that orchestrates multiple edge servers for expedited AIGC services.
We also develop DEdgeAI, a prototype edge system with a refined AIGC model deployment to implement and evaluate our LAD-TS method.
arXiv Detail & Related papers (2024-12-24T06:40:13Z) - Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models [93.76814568163353]
We propose a novel bilevel optimization framework for pruned diffusion models.
This framework consolidates the fine-tuning and unlearning processes into a unified phase.
It is compatible with various pruning and concept unlearning methods.
arXiv Detail & Related papers (2024-12-19T19:13:18Z) - Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services [55.0337199834612]
Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services.
These services require executing GenAI models with billions of parameters, posing significant obstacles to resource-limited wireless edge.
We introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics.
arXiv Detail & Related papers (2024-11-03T07:01:13Z) - DiffSG: A Generative Solver for Network Optimization with Diffusion Model [75.27274046562806]
Diffusion generative models can consider a broader range of solutions and exhibit stronger generalization by learning parameters.
We propose a new framework, which leverages intrinsic distribution learning of diffusion generative models to learn high-quality solutions.
arXiv Detail & Related papers (2024-08-13T07:56:21Z) - Multi-Agent RL-Based Industrial AIGC Service Offloading over Wireless Edge Networks [19.518346220904732]
We propose a generative model-driven industrial AIGC collaborative edge learning framework.
This framework aims to facilitate efficient few-shot learning by leveraging realistic sample synthesis and edge-based optimization capabilities.
arXiv Detail & Related papers (2024-05-05T15:31:47Z) - Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z) - Offloading and Quality Control for AI Generated Content Services in 6G Mobile Edge Computing Networks [18.723955271182007]
This paper proposes a joint optimization algorithm for offloading decisions, computation time, and diffusion steps of the diffusion models in the reverse diffusion stage.
Experimental results conclusively demonstrate that the proposed algorithm achieves superior joint optimization performance compared to the baselines.
arXiv Detail & Related papers (2023-12-11T08:36:27Z) - Enabling AI-Generated Content (AIGC) Services in Wireless Edge Networks [68.00382171900975]
In wireless edge networks, the transmission of incorrectly generated content may unnecessarily consume network resources.
We present the AIGC-as-a-service concept and discuss the challenges in deploying A at the edge networks.
We propose a deep reinforcement learning-enabled algorithm for optimal ASP selection.
arXiv Detail & Related papers (2023-01-09T09:30:23Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned
Edge Learning Over Broadband Channels [69.18343801164741]
partitioned edge learning (PARTEL) implements parameter-server training, a well known distributed learning method, in wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.