Deploying Foundation Model Powered Agent Services: A Survey
- URL: http://arxiv.org/abs/2412.13437v1
- Date: Wed, 18 Dec 2024 02:15:31 GMT
- Title: Deploying Foundation Model Powered Agent Services: A Survey
- Authors: Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, Wenhui Zhu, Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen
- Abstract summary: Foundation model (FM) powered agent services are regarded as a promising solution to develop intelligent and personalized applications.
This paper proposes a unified framework aimed at providing a comprehensive survey on deploying FM-based agent services across heterogeneous devices.
- Score: 33.27330704880908
- Abstract: Foundation model (FM) powered agent services are regarded as a promising solution to develop intelligent and personalized applications for advancing toward Artificial General Intelligence (AGI). To achieve high reliability and scalability in deploying these agent services, it is essential to collaboratively optimize computational and communication resources, thereby ensuring effective resource allocation and seamless service delivery. In pursuit of this vision, this paper proposes a unified framework aimed at providing a comprehensive survey on deploying FM-based agent services across heterogeneous devices, with the emphasis on the integration of model and resource optimization to establish a robust infrastructure for these services. Particularly, this paper begins with exploring various low-level optimization strategies during inference and studies approaches that enhance system scalability, such as parallelism techniques and resource scaling methods. The paper then discusses several prominent FMs and investigates research efforts focused on inference acceleration, including techniques such as model compression and token reduction. Moreover, the paper also investigates critical components for constructing agent services and highlights notable intelligent applications. Finally, the paper presents potential research directions for developing real-time agent services with high Quality of Service (QoS).
Related papers
- MediaMind: Revolutionizing Media Monitoring using Agentification [4.997673761305336]
This paper introduces MediaMind as a case study to demonstrate the agentification process.
The focus of this paper is on the technical methodologies and principles behind agentifying MediaMind.
arXiv Detail & Related papers (2025-02-18T11:05:38Z)
- A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation.
Deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency.
This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
- Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm [0.0]
This paper presents a novel agentic AI solution built on a Weighted Retrieval-Augmented Generation (RAG) Framework tailored for enterprise technical troubleshooting.
By dynamically weighting retrieval sources such as product manuals, internal knowledge bases, FAQ, and troubleshooting guides, the framework prioritizes the most relevant data.
Preliminary evaluations on large enterprise datasets demonstrate the framework's efficacy in improving troubleshooting accuracy, reducing resolution times, and adapting to varied technical challenges.
arXiv Detail & Related papers (2024-12-16T17:32:38Z)
- Joint Resource Optimization, Computation Offloading and Resource Slicing for Multi-Edge Traffic-Cognitive Networks [0.0]
This paper investigates a multi-agent system where both the platform and ESs are self-interested entities.
We propose a novel Stackelberg game-based framework to model interactions between stakeholders and solve the optimization problem.
We further design a decentralized solution leveraging neural network optimization and a privacy-preserving information exchange protocol.
arXiv Detail & Related papers (2024-11-26T11:51:10Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Large Language Model as a Catalyst: A Paradigm Shift in Base Station Siting Optimization [62.16747639440893]
Large language models (LLMs) and their associated technologies are advancing, particularly in the realms of prompt engineering and agent engineering.
Our proposed framework incorporates retrieval-augmented generation (RAG) to enhance the system's ability to acquire domain-specific knowledge and generate solutions.
arXiv Detail & Related papers (2024-08-07T08:43:32Z)
- Dynamic Resource Allocation for Metaverse Applications with Deep Reinforcement Learning [64.75603723249837]
This work proposes a novel framework to dynamically manage and allocate different types of resources for Metaverse applications.
We first propose an effective solution to divide applications into groups, namely MetaInstances, where common functions can be shared among applications.
Then, to capture the real-time, dynamic, and uncertain characteristics of request arrival and application departure processes, we develop a semi-Markov decision process-based framework.
arXiv Detail & Related papers (2023-02-27T00:30:01Z)
- Evaluation of a blockchain-enabled resource management mechanism for NGNs [0.0]
This paper examines the use of blockchain technology for resource management and negotiation among Network Providers (NPs).
The implementation of the resource management mechanism is described in a Smart Contract (SC) and the testbeds use the Raft and the IBFT consensus mechanisms respectively.
arXiv Detail & Related papers (2022-11-01T13:40:26Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML or Dif-MAML.
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
- Deep Reinforcement Learning for QoS-Constrained Resource Allocation in Multiservice Networks [0.3324986723090368]
This article focuses on a non-convex optimization problem whose main aim is to maximize the spectral efficiency subject to satisfaction guarantees in multiservice wireless systems.
We propose a solution based on a Reinforcement Learning (RL) framework, where each agent makes its decisions to find a policy by interacting with the local environment.
We show near-optimal performance of the latter in terms of throughput and outage rate.
arXiv Detail & Related papers (2020-03-03T19:32:15Z)