Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting
- URL: http://arxiv.org/abs/2307.15316v1
- Date: Fri, 28 Jul 2023 05:30:19 GMT
- Title: Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting
- Authors: Hai Wu, Qunsong Zeng, and Kaibin Huang
- Abstract summary: In-situ model downloading has emerged as an important use case to enable real-time adaptive artificial intelligence on edge devices.
We propose the framework of model broadcasting and assembling (MBA) to overcome the bottleneck.
Extensive experiments demonstrate the substantial reduction in downloading latency achieved by the proposed MBA compared to traditional model downloading.
- Score: 36.95383755941367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For the 6G mobile networks, in-situ model downloading has emerged as an
important use case to enable real-time adaptive artificial intelligence on edge
devices. However, the simultaneous downloading of diverse and high-dimensional
models to multiple devices over wireless links presents a significant
communication bottleneck. To overcome this bottleneck, we propose the framework
of model broadcasting and assembling (MBA), which represents the first attempt
to leverage reusable knowledge, i.e., parameters shared among tasks, to enable
parameter broadcasting and thereby reduce communication overhead. The MBA
framework comprises two key components. The first, the MBA protocol, defines
the system operations including parameter selection from a model library, power
control for broadcasting, and model assembling at devices. The second component
is the joint design of parameter-selection-and-power-control (PS-PC), which
provides guarantees on devices' model performance and minimizes the downloading
latency. The corresponding optimization problem is simplified by decomposition
into the sequential PS and PC sub-problems without compromising its optimality.
The PS sub-problem is solved by two efficient algorithms.
On one hand, the low-complexity algorithm of greedy parameter selection
features the construction of candidate model sets and a selection metric, both
of which are designed under the criterion of maximum reusable knowledge among
tasks. On the other hand, the optimal tree-search algorithm gains its
efficiency via the proposed construction of a compact binary tree pruned using
model architecture constraints and an intelligent branch-and-bound search.
Given optimal PS, the optimal PC policy is derived in closed form. Extensive
experiments demonstrate the substantial reduction in downloading latency
achieved by the proposed MBA compared to traditional model downloading.
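The greedy parameter selection described above can be illustrated with a small sketch. This is a hypothetical toy, not the paper's exact algorithm: models are represented as sets of parameter-block IDs, and for each task we pick the candidate whose blocks overlap most with blocks already scheduled for broadcast, so reusable knowledge is transmitted once and shared by many devices.

```python
# Illustrative sketch of reuse-maximizing greedy parameter selection
# (hypothetical names and data model; the paper's algorithm also involves
# candidate model sets, performance guarantees, and power control).

def greedy_parameter_selection(library):
    """library: maps each task to a list of candidate models,
    where a model is a frozenset of parameter-block IDs."""
    broadcast = set()   # parameter blocks scheduled for broadcast
    selection = {}      # task -> chosen model
    for task, candidates in library.items():
        # Selection metric: maximize overlap with already-scheduled blocks
        # (reusable knowledge), tie-broken by fewest newly added blocks.
        best = max(candidates,
                   key=lambda m: (len(m & broadcast), -len(m - broadcast)))
        selection[task] = best
        broadcast |= best
    return selection, broadcast
```

Under this toy metric, the total broadcast payload grows only with the number of distinct blocks in `broadcast`, which is why maximizing shared parameters across tasks shortens the downloading latency.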
Related papers
- Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs, and delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z) - PartialLoading: User Scheduling and Bandwidth Allocation for Parameter-sharing Edge Inference [32.58445942857626]
We develop a parameter-sharing AI model loading framework for multi-user edge inference.
We exploit shared parameter blocks across models to maximize task throughput.
We show that the proposed framework significantly improves task throughput under deadline constraints compared with user scheduling baselines.
arXiv Detail & Related papers (2025-03-29T05:58:07Z) - Joint Transmit and Pinching Beamforming for Pinching Antenna Systems (PASS): Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed.
It consists of multiple waveguides, each equipped with numerous low-cost pinching antennas (PAs).
The positions of the PAs can be reconfigured to both mitigate large-scale path loss and exploit the spatial domain.
arXiv Detail & Related papers (2025-02-12T18:54:10Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) framework, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a framework can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Communication-Computation Efficient Device-Edge Co-Inference via AutoML [4.06604174802643]
Device-edge co-inference partitions a deep neural network between a resource-constrained mobile device and an edge server.
On-device model sparsity level and intermediate feature compression ratio have direct impacts on workload and communication overhead.
We propose a novel automated machine learning (AutoML) framework based on deep reinforcement learning (DRL).
arXiv Detail & Related papers (2021-08-30T06:36:30Z) - Edge Federated Learning Via Unit-Modulus Over-The-Air Computation
(Extended Version) [64.76619508293966]
This paper proposes a unit-modulus over-the-air computation (UM-AirComp) framework to facilitate efficient edge federated learning.
It simultaneously uploads local model parameters and updates global model parameters via analog beamforming.
We demonstrate the implementation of UM-AirComp in a vehicle-to-everything autonomous driving simulation platform.
arXiv Detail & Related papers (2021-01-28T15:10:22Z) - Federated Learning via Intelligent Reflecting Surface [30.935389187215474]
Over-the-air computation (AirComp) based federated learning (FL) is capable of achieving fast model aggregation by exploiting the waveform superposition property of multiple access channels.
In this paper, we propose a two-step optimization framework to achieve fast yet reliable model aggregation for AirComp-based FL.
Simulation results demonstrate that our proposed framework and the deployment of an intelligent reflecting surface (IRS) can achieve a lower training loss and higher FL prediction accuracy than the baseline algorithms.
arXiv Detail & Related papers (2020-11-10T11:29:57Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned
Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models, which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z) - Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of
Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.