Multi-Agentic AI for Fairness-Aware and Accelerated Multi-modal Large Model Inference in Real-world Mobile Edge Networks
- URL: http://arxiv.org/abs/2602.07215v1
- Date: Fri, 06 Feb 2026 21:52:49 GMT
- Title: Multi-Agentic AI for Fairness-Aware and Accelerated Multi-modal Large Model Inference in Real-world Mobile Edge Networks
- Authors: Haiyuan Li, Hari Madhukumar, Shuangyi Yan, Yulei Wu, Dimitra Simeonidou,
- Abstract summary: We propose a Multi-Agentic AI framework for latency- and fairness-aware multi-modal LM inference in mobile edge networks.<n>Our solution includes a long-term planning agent, a short-term prompt scheduling agent, and multiple on-node LM deployment agents.<n> Experiments demonstrate that our solution reduces average latency by over 80% and improves fairness (Normalized Jain index) to 0.90 compared to other baselines.
- Score: 4.018860391090846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative AI (GenAI) has transformed applications in natural language processing and content creation, yet centralized inference remains hindered by high latency, limited customizability, and privacy concerns. Deploying large models (LMs) in mobile edge networks emerges as a promising solution. However, it also poses new challenges, including heterogeneous multi-modal LMs with diverse resource demands and inference speeds, varied prompt/output modalities that complicate orchestration, and resource-limited infrastructure ill-suited for concurrent LM execution. In response, we propose a Multi-Agentic AI framework for latency- and fairness-aware multi-modal LM inference in mobile edge networks. Our solution includes a long-term planning agent, a short-term prompt scheduling agent, and multiple on-node LM deployment agents, all powered by foundation language models. These agents cooperatively optimize prompt routing and LM deployment through natural language reasoning over runtime telemetry and historical experience. To evaluate its performance, we further develop a city-wide testbed that supports network monitoring, containerized LM deployment, intra-server resource management, and inter-server communications. Experiments demonstrate that our solution reduces average latency by over 80% and improves fairness (Normalized Jain index) to 0.90 compared to other baselines. Moreover, our solution adapts quickly without fine-tuning, offering a generalizable solution for optimizing GenAI services in edge environments.
Related papers
- Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models [23.382563394142082]
We present Pram, the first ML-based method that leverages the reasoning power of multimodal language models (MLMs) for addressing the trade-off dilemma.<n>Pram is objective-agnostic and seamlessly integrates with mainstream allocation systems, providing a practical and scalable solution for future networks.
arXiv Detail & Related papers (2026-02-11T17:24:49Z) - ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks [62.031889234230725]
6G networks rely on complex cross-layer optimization.<n> manually translating high-level intents into mathematical formulations remains a bottleneck.<n>We present ComAgent, a multi-LLM agentic AI framework.
arXiv Detail & Related papers (2026-01-27T13:43:59Z) - NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms.<n>Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks.<n>To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z) - Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks.<n>Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses.<n>No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z) - CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks [57.95170323315603]
We introduce CollaPipe, a distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving networks.<n>In CollaPipe, the encoder part is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks.<n>To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power.
arXiv Detail & Related papers (2025-09-24T07:54:01Z) - Adaptive AI Agent Placement and Migration in Edge Intelligence Systems [14.789027376038115]
We propose a novel framework for AI agent placement and migration in edge intelligence systems.<n>It autonomously places agents to optimize resource utilization and enables lightweight agent migration by transferring only essential state.
arXiv Detail & Related papers (2025-08-05T11:47:46Z) - Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks [1.5684305805304426]
Large Language Model (LLM)-based autonomous agents are expected to play a vital role in the evolution of 6G networks.<n>We introduce a novel agentic paradigm that combines LLMs real-time optimization algorithms towards Trustworthy AI.<n>We propose an end-to-end architecture for AGI networks and evaluate it on a 5G testbed capturing channel fluctuations from moving vehicles.
arXiv Detail & Related papers (2025-07-23T17:01:23Z) - Large AI Model Empowered Multimodal Semantic Communications [48.73159237649128]
We propose a Large AI Model-based Multimodal SC (LAMMSC) framework.
We first present the Conditional-based Multimodal Alignment (MMA) that enables the transformation between multimodal and unimodal data.
Then, a personalized LLM-based Knowledge Base (LKB) is proposed, which allows users to perform personalized semantic extraction or recovery.
Finally, we apply the Generative adversarial network-based channel Estimation (CGE) for estimating the wireless channel state information.
arXiv Detail & Related papers (2023-09-03T19:24:34Z) - Low-Latency Federated Learning over Wireless Channels with Differential
Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server.
In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z) - Joint Multi-User DNN Partitioning and Computational Resource Allocation
for Collaborative Edge Intelligence [21.55340197267767]
Mobile Edge Computing (MEC) has emerged as a promising supporting architecture providing a variety of resources to the network edge.
With the assistance of edge servers, user equipments (UEs) are able to run deep neural network (DNN) based AI applications.
We propose an algorithm called Iterative Alternating Optimization (IAO) that can achieve the optimal solution in time.
arXiv Detail & Related papers (2020-07-15T09:40:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.