Related papers: Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks

URL: http://arxiv.org/abs/2409.05303v1
Date: Mon, 9 Sep 2024 03:17:28 GMT
Title: Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks
Authors: Yuxin Liang, Peng Yang, Yuanyuan He, Feng Lyu
Abstract summary: The scarcity of available resources on the edge poses significant challenges in deploying generative AI models. We present a collaborative edge-cloud framework aiming to properly manage generative AI model deployment on the edge.
Score: 15.958822667638405
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The surging development of Artificial Intelligence-Generated Content (AIGC) marks a transformative era of content creation and production. Edge servers promise attractive benefits, e.g., reduced service delay and backhaul traffic load, for hosting AIGC services compared to cloud-based solutions. However, the scarcity of available resources on the edge poses significant challenges in deploying generative AI models. In this paper, by characterizing the resource and delay demands of typical generative AI models, we find that the consumption of storage and GPU memory, as well as the model switching delay represented by I/O delay during the preloading phase, are significant and vary across models. These multidimensional coupling factors render it difficult to make efficient edge model deployment decisions. Hence, we present a collaborative edge-cloud framework aiming to properly manage generative AI model deployment on the edge. Specifically, we formulate the edge model deployment problem, considering heterogeneous features of models, as an optimization problem, and propose a model-level decision selection algorithm to solve it. It enables pooled resource sharing and optimizes the trade-off between resource consumption and delay in edge generative AI model deployment. Simulation results validate the efficacy of the proposed algorithm compared with baselines, demonstrating its potential to reduce overall costs by providing feature-aware model deployment decisions.
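
Illustrative sketch (not the paper's algorithm): the abstract names three per-model factors, namely storage, GPU memory, and switching (I/O preloading) delay. A toy greedy heuristic below shows how such factors might be traded off under edge capacity limits. The class Model, the function choose_deployment, all field names, and the three placement labels ("gpu", "disk", "cloud") are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Model:
    # All fields are assumed for illustration; none come from the paper.
    name: str
    storage_gb: float     # disk footprint on the edge server
    gpu_mem_gb: float     # GPU memory needed while the model stays loaded
    load_delay_s: float   # I/O delay to load from edge disk into GPU memory
    fetch_delay_s: float  # delay per request if the cloud serves it instead
    request_rate: float   # expected requests per time slot


def choose_deployment(models, storage_cap_gb, gpu_cap_gb):
    """Greedy per-model placement: 'gpu' (preloaded), 'disk', or 'cloud'."""
    decisions = {m.name: "cloud" for m in models}

    # Step 1: fill edge storage, preferring models that save the most
    # cloud delay per GB of storage consumed.
    stored = []
    for m in sorted(models,
                    key=lambda m: m.request_rate * m.fetch_delay_s / m.storage_gb,
                    reverse=True):
        if m.storage_gb <= storage_cap_gb:
            storage_cap_gb -= m.storage_gb
            decisions[m.name] = "disk"
            stored.append(m)

    # Step 2: among stored models, keep preloaded in GPU memory those that
    # save the most switching delay per GB of GPU memory consumed.
    for m in sorted(stored,
                    key=lambda m: m.request_rate * m.load_delay_s / m.gpu_mem_gb,
                    reverse=True):
        if m.gpu_mem_gb <= gpu_cap_gb:
            gpu_cap_gb -= m.gpu_mem_gb
            decisions[m.name] = "gpu"

    return decisions


if __name__ == "__main__":
    demo = [
        Model("image-gen", 5, 8, 10, 30, 4),
        Model("chat-llm", 14, 16, 25, 60, 1),
        Model("speech", 1, 2, 2, 5, 2),
    ]
    print(choose_deployment(demo, storage_cap_gb=10, gpu_cap_gb=8))
```

A greedy ratio rule is used here only because it is short and easy to follow; the paper formulates an optimization problem and solves it with its own model-level decision selection algorithm, which this sketch does not reproduce.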

Related papers

Privacy-Aware Joint DNN Model Deployment and Partition Optimization for Delay-Efficient Collaborative Edge Inference [14.408050197587654]
Edge inference (EI) is a key solution to address the growing challenges of delayed response times, limited scalability, and privacy concerns in cloud-based Deep Neural Network (DNN) inference. This paper proposes a novel framework for privacy-aware joint DNN model deployment and partition optimization to minimize long-term average inference delay under resource and privacy constraints.
arXiv Detail & Related papers (2025-02-22T05:27:24Z)
Network Resource Optimization for ML-Based UAV Condition Monitoring with Vibration Analysis [54.550658461477106]
Condition Monitoring (CM) uses Machine Learning (ML) models to identify abnormal and adverse conditions. This work explores the optimization of network resources for ML-based UAV CM frameworks. Leveraging dimensionality reduction techniques yields a 99.9% reduction in network resource consumption.
arXiv Detail & Related papers (2025-02-21T14:36:12Z)
Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies [14.115655986504411]
5G and edge computing hardware have brought about a significant shift in artificial intelligence. Deploying state-of-the-art AI models on resource-constrained edge devices faces significant challenges. This paper presents an optimization triad for efficient and reliable edge AI deployment.
arXiv Detail & Related papers (2025-01-04T06:17:48Z)
Accelerating AIGC Services with Latent Action Diffusion Scheduling in Edge Networks [27.961536719427205]
Current AIGC models primarily focus on content quality within a centralized framework, resulting in high service delay and negative user experiences. We propose LAD-TS, a novel Latent Action Diffusion-based Task Scheduling method that orchestrates multiple edge servers for expedited AIGC services. We also develop DEdgeAI, a prototype edge system with a refined AIGC model deployment to implement and evaluate our LAD-TS method.
arXiv Detail & Related papers (2024-12-24T06:40:13Z)
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models [93.76814568163353]
We propose a novel bilevel optimization framework for pruned diffusion models. This framework consolidates the fine-tuning and unlearning processes into a unified phase. It is compatible with various pruning and concept unlearning methods.
arXiv Detail & Related papers (2024-12-19T19:13:18Z)
Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services [55.0337199834612]
Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services. These services require executing GenAI models with billions of parameters, posing significant obstacles to the resource-limited wireless edge. We introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics.
arXiv Detail & Related papers (2024-11-03T07:01:13Z)
Profiling AI Models: Towards Efficient Computation Offloading in Heterogeneous Edge AI Systems [0.2357055571094446]
We propose a research roadmap focused on profiling AI models, capturing data about model types and underlying hardware to predict resource utilisation and task completion time. Experiments with over 3,000 runs show promise in optimising resource allocation and enhancing Edge AI performance.
arXiv Detail & Related papers (2024-10-30T16:07:14Z)
Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution [1.8029479474051309]
We design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary. Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain. Experimental results on the action classification task show that our framework not only requires less computational overhead, but also improves accuracy compared to using a large model alone.
arXiv Detail & Related papers (2024-10-16T02:06:27Z)
DiffSG: A Generative Solver for Network Optimization with Diffusion Model [75.27274046562806]
Diffusion generative models can consider a broader range of solutions and exhibit stronger generalization by learning parameters. We propose a new framework, which leverages intrinsic distribution learning of diffusion generative models to learn high-quality solutions.
arXiv Detail & Related papers (2024-08-13T07:56:21Z)
Multi-Agent RL-Based Industrial AIGC Service Offloading over Wireless Edge Networks [19.518346220904732]
We propose a generative model-driven industrial AIGC collaborative edge learning framework. This framework aims to facilitate efficient few-shot learning by leveraging realistic sample synthesis and edge-based optimization capabilities.
arXiv Detail & Related papers (2024-05-05T15:31:47Z)
Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation. In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
Offloading and Quality Control for AI Generated Content Services in 6G Mobile Edge Computing Networks [18.723955271182007]
This paper proposes a joint optimization algorithm for offloading decisions, computation time, and diffusion steps of the diffusion models in the reverse diffusion stage. Experimental results conclusively demonstrate that the proposed algorithm achieves superior joint optimization performance compared to the baselines.
arXiv Detail & Related papers (2023-12-11T08:36:27Z)
Enabling AI-Generated Content (AIGC) Services in Wireless Edge Networks [68.00382171900975]
In wireless edge networks, the transmission of incorrectly generated content may unnecessarily consume network resources. We present the AIGC-as-a-service concept and discuss the challenges in deploying AIGC-as-a-service at the edge networks. We propose a deep reinforcement learning-enabled algorithm for optimal ASP selection.
arXiv Detail & Related papers (2023-01-09T09:30:23Z)
Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications. We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS). Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)
Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network. We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.