High-Fidelity Network Management for Federated AI-as-a-Service: Cross-Domain Orchestration
- URL: http://arxiv.org/abs/2602.15281v2
- Date: Wed, 18 Feb 2026 08:17:22 GMT
- Title: High-Fidelity Network Management for Federated AI-as-a-Service: Cross-Domain Orchestration
- Authors: Mohaned Chraiti, Ozgur Ercetin, Merve Saimler
- Abstract summary: This paper introduces an assurance-oriented AI management plane based on Tail-Risk Envelopes (TREs). TREs are signed, composable per-domain descriptors that combine deterministic guardrails with rate-latency-impairment models. We show that tenant-level reservations prevent bursty traffic from inflating tail latency under TRE contracts.
- Score: 0.12234742322758417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To support the emergence of AI-as-a-Service (AIaaS), communication service providers (CSPs) are on the verge of a radical transformation, from pure connectivity providers to providers of AIaaS as a managed network service (a control-and-orchestration plane that exposes AI models). In this model, the CSP is responsible not only for transport/communications, but also for intent-to-model resolution and joint network-compute orchestration, i.e., reliable and timely end-to-end delivery. The resulting end-to-end AIaaS service is thus governed by both communications impairments (delay, loss) and inference impairments (latency, error). A central open problem is an operational AIaaS control-and-orchestration framework that enforces high fidelity, particularly under multi-domain federation. This paper introduces an assurance-oriented AIaaS management plane based on Tail-Risk Envelopes (TREs): signed, composable per-domain descriptors that combine deterministic guardrails with stochastic rate-latency-impairment models. Using stochastic network calculus, we derive bounds on end-to-end delay violation probabilities across tandem domains and obtain an optimization-ready risk-budget decomposition. We show that tenant-level reservations prevent bursty traffic from inflating tail latency under TRE contracts. An auditing layer then uses runtime telemetry to estimate extreme-percentile performance, quantify uncertainty, and attribute tail risk to each domain for accountability. Packet-level Monte-Carlo simulations demonstrate improved p99.9 compliance under overload via admission control and robust tenant isolation under correlated burstiness.
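As a rough illustration of what composing per-domain tail guarantees and splitting a risk budget can look like, the sketch below uses a plain union bound: if domain k exceeds its delay budget with probability at most eps_k, the end-to-end delay exceeds the sum of the budgets with probability at most the sum of the eps_k. This is a deliberately conservative stand-in, not the stochastic network calculus derivation or the TRE format from the paper. All function names, the exponential delay model, and the numeric values are assumptions chosen for illustration only.

```python
import math
import random


def compose_envelopes(domains):
    """Union-bound composition of (delay_budget_ms, violation_prob) pairs.

    Hypothetical helper: returns the end-to-end delay budget and an upper
    bound on the probability of exceeding it across tandem domains.
    """
    total_delay = sum(delay for delay, _ in domains)
    total_eps = min(1.0, sum(eps for _, eps in domains))
    return total_delay, total_eps


def split_risk_budget(eps_total, weights):
    """Split an end-to-end violation budget across domains by weight."""
    scale = eps_total / sum(weights)
    return [w * scale for w in weights]


def empirical_violation(mean_delays_ms, budgets_ms, trials=20000, seed=0):
    """Monte Carlo estimate of P(total delay > total budget).

    Toy assumption: independent, exponentially distributed per-domain delays.
    """
    rng = random.Random(seed)
    deadline = sum(budgets_ms)
    misses = 0
    for _ in range(trials):
        total = sum(rng.expovariate(1.0 / m) for m in mean_delays_ms)
        if total > deadline:
            misses += 1
    return misses / trials


def demo():
    means = [2.0, 5.0, 3.0]                       # assumed mean delays (ms)
    eps = split_risk_budget(0.03, [1, 1, 1])      # equal split of a 3% budget
    # For an exponential delay with mean m, P(D > -m*ln(eps)) equals eps.
    budgets = [-m * math.log(e) for m, e in zip(means, eps)]
    deadline, bound = compose_envelopes(list(zip(budgets, eps)))
    observed = empirical_violation(means, budgets)
    return deadline, bound, observed


if __name__ == "__main__":
    deadline, bound, observed = demo()
    print(f"deadline={deadline:.2f} ms bound={bound:.4f} observed={observed:.4f}")
```

The union bound holds without any independence assumption, which is why it is a safe baseline, but it is typically loose. Tighter end-to-end bounds of the kind the abstract describes would require the per-domain stochastic models themselves.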
Related papers
- Token Management in Multi-Tenant AI Inference Platforms [0.0]
Multi-tenant AI inference platforms must balance resource utilization against service-level guarantees under variable demand. We introduce token pools, a control-plane abstraction that represents capacity as explicit entitlements expressed in inference-native units.
arXiv Detail & Related papers (2026-02-27T22:44:09Z)
- Blockchain-Enabled Routing for Zero-Trust Low-Altitude Intelligent Networks [77.17664010626726]
We focus on routing with multiple UAV clusters in low-altitude intelligent networks (LAINs). To minimize the damage caused by potential threats, we present a zero-trust architecture with software-defined perimeter and blockchain techniques. We show that the proposed framework reduces the average E2E delay by 59% and improves the TSR by 29% on average compared to benchmarks.
arXiv Detail & Related papers (2026-02-27T04:30:35Z)
- AI-Paging: Lease-Based Execution Anchoring for Network-Exposed AI-as-a-Service [0.13750624267664155]
6G service providers are envisioned to play a crucial role in exposing AI in a setting where users submit only an intent. We prototype AI-Paging using existing control- and user-plane mechanisms.
arXiv Detail & Related papers (2026-02-17T01:11:26Z)
- Secure and Energy-Efficient Wireless Agentic AI Networks [12.588984049305866]
A secure wireless agentic AI network comprises one supervisor AI agent and multiple other AI agents. Agents dynamically assign other AI agents to participate in cooperative reasoning. Unselected AI agents act as friendly jammers to degrade the eavesdropper's interception performance.
arXiv Detail & Related papers (2026-02-16T21:42:33Z)
- Efficient Mixture-of-Agents Serving via Tree-Structured Routing, Adaptive Pruning, and Dependency-Aware Prefill-Decode Overlap [15.352230356342366]
Mixture-of-Agents (MoA) inference can suffer from dense inter-agent communication and low hardware utilization. We present a serving design that targets these bottlenecks through an algorithm-system co-design.
arXiv Detail & Related papers (2025-12-19T23:06:58Z)
- QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management [52.15690855486153]
A space-air-ground integrated network (SAGIN) has emerged as an essential architecture for enabling ubiquitous UAV connectivity. This paper formulates UAV mobility management in SAGIN as a constrained multiobjective joint optimization problem.
arXiv Detail & Related papers (2025-12-17T06:22:46Z)
- Agentic DDQN-Based Scheduling for Licensed and Unlicensed Band Allocation in Sidelink Networks [37.89031907489481]
We present an agentic double deep Q-network (DDQN) scheduler for licensed/unlicensed band allocation in New Radio (NR) sidelink (SL) networks. A capacity-aware, quality of service (QoS)-constrained reward aligns the agent with goal-oriented scheduling rather than static thresholding.
arXiv Detail & Related papers (2025-09-08T14:58:12Z)
- The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks [56.37880529653111]
The demand for large AI model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications. In this paper, we investigate the LAIM-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment.
arXiv Detail & Related papers (2025-05-14T08:18:55Z)
- DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common stochastic approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose DASA, a Delay-Adaptive algorithm for multi-agent stochastic approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models, which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
- Multi-Armed Bandit Based Client Scheduling for Federated Learning [91.91224642616882]
Federated learning (FL) offers benefits such as reducing communication overhead and preserving data privacy.
In each communication round of FL, the clients update local models based on their own data and upload their local updates via wireless channels.
This work provides a multi-armed bandit-based framework for online client scheduling (CS) in FL without knowing wireless channel state information and statistical characteristics of clients.
arXiv Detail & Related papers (2020-07-05T12:32:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.