Adaptive Orchestration for Inference of Large Foundation Models at the Edge
- URL: http://arxiv.org/abs/2504.03668v1
- Date: Wed, 19 Mar 2025 15:35:56 GMT
- Title: Adaptive Orchestration for Inference of Large Foundation Models at the Edge
- Authors: Fernando Koch, Aladin Djuhera, Alecio Binotto
- Abstract summary: Large Foundation Models (LFMs) promise to unlock new capabilities for next-generation Edge AI applications. However, performing inference with LFMs in resource-constrained and heterogeneous edge environments presents significant challenges for workload orchestration. We propose a novel adaptive orchestration method and system tailored specifically for managing distributed inference workloads.
- Score: 46.1232919707345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Foundation Models (LFMs), including multi-modal and generative AI models, promise to unlock new capabilities for next-generation Edge AI applications. However, performing inference with LFMs in resource-constrained and heterogeneous edge environments presents significant challenges for workload orchestration. We propose a novel adaptive orchestration method and system tailored specifically for managing distributed inference workloads across multi-access edge computing (MEC) infrastructures. Our approach enhances traditional workload orchestration by introducing dynamic methods, including: (1) adaptive workload distribution that selects optimal, inter-connected edge nodes based on runtime capacity profiling; (2) dynamic redistribution of LFM partitions as operational conditions evolve; and (3) real-time reconfiguration (e.g., re-splitting) of LFM layers to balance performance and privacy requirements. Our proposed framework introduces an architecture for adaptive split inference, enabling real-time, QoS-aware management of inference workloads. We present a reference architecture, detail operational mechanisms, and demonstrate its application through various use cases in real-world scenarios.
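The first of the three mechanisms above, adaptive workload distribution via split-point selection, can be illustrated with a minimal sketch. The node names, capacity figures, per-layer cost model, and the latency-balancing heuristic below are all illustrative assumptions, not the paper's actual algorithm:

```python
# Hypothetical sketch: choose a split layer for two-way split inference so
# that per-node compute latency is balanced. All numbers are made up.
from dataclasses import dataclass

@dataclass
class EdgeNode:
    name: str
    flops_per_s: float  # profiled compute capacity (FLOP/s)

def choose_split(layer_costs, node_a, node_b):
    """Return the layer index i minimizing the gap between the time
    node_a spends on layers [0, i) and node_b spends on layers [i, n)."""
    total = sum(layer_costs)
    best_idx, best_gap = 1, float("inf")
    head = 0.0
    for i in range(1, len(layer_costs)):
        head += layer_costs[i - 1]
        t_a = head / node_a.flops_per_s
        t_b = (total - head) / node_b.flops_per_s
        gap = abs(t_a - t_b)
        if gap < best_gap:
            best_idx, best_gap = i, gap
    return best_idx

ue = EdgeNode("ue", flops_per_s=1e12)    # weaker device
mec = EdgeNode("mec", flops_per_s=4e12)  # stronger edge server
costs = [2e11] * 8 + [4e11] * 8          # per-layer FLOPs (illustrative)
split = choose_split(costs, ue, mec)     # early split favors the weak node
```

A real orchestrator would re-run such a selection as capacity profiles change, which corresponds to mechanisms (2) and (3) in the abstract.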
Related papers
- UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines [64.84631333071728]
We introduce UniSTD, a unified Transformer-based framework for spatio-temporal modeling.
Our work demonstrates that a task-specific vision-text model can yield a generalizable model for spatio-temporal learning.
We also introduce a temporal module to incorporate temporal dynamics explicitly.
arXiv Detail & Related papers (2025-03-26T17:33:23Z) - DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers [86.5541501589166]
DiffMoE is a batch-level global token pool that enables experts to access global token distributions during training.
It achieves state-of-the-art performance among diffusion models on ImageNet benchmark.
The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation.
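The batch-level global token pool described above can be sketched as follows; this is a generic interpretation of expert routing over a pooled batch (the function name, scoring, and capacity rule are assumptions, and DiffMoE's actual mechanism differs in detail):

```python
import numpy as np

# Toy sketch: tokens from the whole batch form one global pool, and each
# expert selects its top-`capacity` tokens by router affinity.
def route_global_pool(tokens, router_w, capacity):
    """tokens: (N, d) flattened batch-level pool; router_w: (d, E)."""
    scores = tokens @ router_w                       # (N, E) affinities
    assignments = {}
    for e in range(router_w.shape[1]):
        top = np.argsort(scores[:, e])[::-1][:capacity]
        assignments[e] = top                         # token indices for expert e
    return assignments

rng = np.random.default_rng(1)
pool = rng.normal(size=(16, 8))   # 16 tokens pooled across the batch
w = rng.normal(size=(8, 4))       # router weights for 4 experts
assign = route_global_pool(pool, w, capacity=4)
```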
arXiv Detail & Related papers (2025-03-18T17:57:07Z) - Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs [0.0]
MOBONS is a novel Bayesian optimization-inspired algorithm that can efficiently optimize general function networks.
We demonstrate the effectiveness of MOBONS through two case studies, including one related to sustainable process design.
arXiv Detail & Related papers (2025-02-19T21:49:05Z) - ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving [19.388562622309838]
Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, videos, and audio beyond text. We present the first comprehensive systems analysis of two prominent LMM architectures, decoder-only and cross-attention, across six representative open-source models. We propose ModServe, a modular LMM serving system that decouples stages for independent optimization and adaptive scaling.
arXiv Detail & Related papers (2025-02-02T22:10:40Z) - Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z) - Regularized Conditional Diffusion Model for Multi-Task Preference Alignment [43.86042557447689]
Sequential decision-making is desired to align with human intents and exhibit versatility across various tasks.
Previous methods formulate it as a conditional generation process, utilizing return-conditioned diffusion models to directly model trajectory distributions.
In this work, we adopt multi-task preferences as a unified condition for both single- and multi-task decision-making.
arXiv Detail & Related papers (2024-04-07T11:20:32Z) - A Bayesian Framework of Deep Reinforcement Learning for Joint O-RAN/MEC Orchestration [12.914011030970814]
Multi-access Edge Computing (MEC) can be implemented together with Open Radio Access Network (O-RAN) over commodity platforms to offer low-cost deployment.
In this paper, a joint O-RAN/MEC orchestration using a Bayesian deep reinforcement learning (RL)-based framework is proposed.
arXiv Detail & Related papers (2023-12-26T18:04:49Z) - Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO).
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
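A minimal analysis step of an ensemble Kalman filter illustrates the building block used above; this is a generic textbook sketch of the EnKF update, not the paper's GPSSM-specific formulation:

```python
import numpy as np

# Generic EnKF analysis step with a scalar observation of the first state
# component. Observation model and noise level are illustrative.
def enkf_update(ensemble, obs, obs_noise_var, rng):
    """ensemble: (N, d) prior state samples; returns updated (N, d) samples."""
    N, d = ensemble.shape
    H = np.zeros((1, d)); H[0, 0] = 1.0           # observe component 0
    perturbed = obs + rng.normal(0.0, np.sqrt(obs_noise_var), size=N)
    X = ensemble - ensemble.mean(axis=0)          # ensemble anomalies
    P = X.T @ X / (N - 1)                         # sample covariance (d, d)
    S = H @ P @ H.T + obs_noise_var               # innovation covariance
    K = P @ H.T / S                               # Kalman gain (d, 1)
    innov = perturbed[:, None] - ensemble @ H.T   # perturbed innovations (N, 1)
    return ensemble + innov @ K.T

rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, size=(500, 2))       # N(0, I) prior ensemble
post = enkf_update(prior, obs=2.0, obs_noise_var=0.25, rng=rng)
# the observed component's ensemble mean moves toward the observation
```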
arXiv Detail & Related papers (2023-12-10T15:22:30Z) - Orchestration of Emulator Assisted Mobile Edge Tuning for AI Foundation Models: A Multi-Agent Deep Reinforcement Learning Approach [10.47302625959368]
We present a groundbreaking paradigm integrating Mobile Edge Computing with foundation models, specifically designed to enhance local task performance on user equipment (UE).
Central to our approach is the innovative Emulator-Adapter architecture, segmenting the foundation model into two cohesive modules.
We introduce an advanced resource allocation mechanism that is fine-tuned to the needs of the Emulator-Adapter structure in decentralized settings.
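The segmentation into two cohesive modules described above can be sketched as a simple layer partition; the function name, the fraction-based cut rule, and the placement (frozen emulator at the edge, trainable adapter on the UE) are illustrative assumptions rather than the paper's exact architecture:

```python
# Hypothetical sketch of an Emulator-Adapter style split: the prefix of the
# model stays as a frozen "emulator", the suffix becomes a lightweight
# trainable "adapter" on the user device.
def partition_layers(layers, adapter_fraction=0.25):
    """Split a layer list so the last `adapter_fraction` of layers form
    the adapter and the remaining prefix forms the emulator."""
    cut = max(1, int(len(layers) * (1 - adapter_fraction)))
    emulator, adapter = layers[:cut], layers[cut:]
    return emulator, adapter

layers = [f"block_{i}" for i in range(12)]
emulator, adapter = partition_layers(layers)  # 9 emulator blocks, 3 adapter
```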
arXiv Detail & Related papers (2023-10-26T15:47:51Z) - Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z) - Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)