Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services
- URL: http://arxiv.org/abs/2411.01458v1
- Date: Sun, 03 Nov 2024 07:01:13 GMT
- Title: Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services
- Authors: Zhang Liu, Hongyang Du, Xiangwang Hou, Lianfen Huang, Seyyedali Hosseinalipour, Dusit Niyato, Khaled Ben Letaief
- Abstract summary: Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services.
These services require executing GenAI models with billions of parameters, posing significant obstacles to resource-limited wireless edge.
We introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics.
- Score: 55.0337199834612
- License:
- Abstract: Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services. In this paper, we address challenges of edge-enabled AIGC service provisioning, which remain underexplored in the literature. These services require executing GenAI models with billions of parameters, posing significant obstacles to resource-limited wireless edge. We subsequently introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics. We obtain mathematical relationships of these metrics with the computational resources required by GenAI models via experimentation. Afterward, we decompose the formulation into a model caching subproblem on a long-timescale and a resource allocation subproblem on a short-timescale. Since the variables to be solved are discrete and continuous, respectively, we leverage a double deep Q-network (DDQN) algorithm to solve the former subproblem and propose a diffusion-based deep deterministic policy gradient (D3PG) algorithm to solve the latter. The proposed D3PG algorithm makes an innovative use of diffusion models as the actor network to determine optimal resource allocation decisions. Consequently, we integrate these two learning methods within the overarching two-timescale deep reinforcement learning (T2DRL) algorithm, the performance of which is studied through comparative numerical simulations.
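The two-timescale decomposition described in the abstract can be illustrated with a minimal control-loop sketch. This is not the authors' T2DRL implementation: every function and parameter name below is hypothetical, a random choice stands in for the DDQN caching agent, and an equal split stands in for the D3PG allocation agent. It only shows the structure, where the discrete caching decision is made once per long-timescale frame and the continuous resource allocation is made in every short-timescale slot.

```python
import random


def choose_cache(num_models, capacity, rng):
    """Long-timescale placeholder (stands in for a DDQN agent).

    Returns a discrete decision: the set of model ids kept in the edge cache.
    """
    return frozenset(rng.sample(range(num_models), capacity))


def allocate(requests, cached, total_compute):
    """Short-timescale placeholder (stands in for a D3PG agent).

    Returns a continuous decision: the compute share per served user.
    Only users whose requested model is cached are served at the edge.
    """
    served = [user for user, model in enumerate(requests) if model in cached]
    if not served:
        return {}
    share = total_compute / len(served)
    return {user: share for user in served}


def run_two_timescale(num_frames=3, slots_per_frame=4, num_models=5,
                      capacity=2, users=3, total_compute=1.0, seed=0):
    """Run the nested loop: one caching decision per frame, one allocation per slot."""
    rng = random.Random(seed)
    log = []
    for frame in range(num_frames):
        # Long timescale: the cache is fixed for the whole frame.
        cached = choose_cache(num_models, capacity, rng)
        for slot in range(slots_per_frame):
            # Short timescale: new requests arrive and compute is re-split.
            requests = [rng.randrange(num_models) for _ in range(users)]
            alloc = allocate(requests, cached, total_compute)
            log.append((frame, slot, cached, requests, alloc))
    return log


if __name__ == "__main__":
    for frame, slot, cached, requests, alloc in run_two_timescale():
        print(frame, slot, sorted(cached), requests, alloc)
```

In a learning-based version, each placeholder would be replaced by a trained policy, with the frame-level agent rewarded on outcomes accumulated over its slots.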
Related papers
- Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning [55.641299901038316]
AI-generated content can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content for resource-constrained users.
Such a paradigm faces two significant challenges: 1) raw prompts often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources.
We develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation.
arXiv Detail & Related papers (2025-02-17T03:05:20Z) - Accelerating AIGC Services with Latent Action Diffusion Scheduling in Edge Networks [27.961536719427205]
Current AIGC models primarily focus on content quality within a centralized framework, resulting in a high service delay and negative user experiences.
We propose LAD-TS, a novel Latent Action Diffusion-based Task Scheduling method that orchestrates multiple edge servers for expedited AIGC services.
We also develop DEdgeAI, a prototype edge system with a refined AIGC model deployment to implement and evaluate our LAD-TS method.
arXiv Detail & Related papers (2024-12-24T06:40:13Z) - Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can sufficiently exploit the transmitting and computing resources to promote the data rate.
Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z) - Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks [15.958822667638405]
The scarcity of available resources on the edge poses significant challenges in deploying generative AI models.
We present a collaborative edge-cloud framework aiming to properly manage generative AI model deployment on the edge.
arXiv Detail & Related papers (2024-09-09T03:17:28Z) - DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z) - Offloading and Quality Control for AI Generated Content Services in 6G Mobile Edge Computing Networks [18.723955271182007]
This paper proposes a joint optimization algorithm for offloading decisions, computation time, and diffusion steps of the diffusion models in the reverse diffusion stage.
Experimental results conclusively demonstrate that the proposed algorithm achieves superior joint optimization performance compared to the baselines.
arXiv Detail & Related papers (2023-12-11T08:36:27Z) - Enabling AI-Generated Content (AIGC) Services in Wireless Edge Networks [68.00382171900975]
In wireless edge networks, the transmission of incorrectly generated content may unnecessarily consume network resources.
We present the AIGC-as-a-service concept and discuss the challenges in deploying it at the edge networks.
We propose a deep reinforcement learning-enabled algorithm for optimal ASP selection.
arXiv Detail & Related papers (2023-01-09T09:30:23Z) - Wireless Resource Management in Intelligent Semantic Communication Networks [15.613654766345702]
We address the user association (UA) and bandwidth allocation (BA) problems in an ISC-enabled heterogeneous network (ISC-HetNet).
We propose a two-stage solution, including a programming method to obtain an objective, and an algorithm in the second stage to reach the optimality of UA and BA.
arXiv Detail & Related papers (2022-02-15T18:28:28Z) - Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)