Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
- URL: http://arxiv.org/abs/2405.17245v1
- Date: Mon, 27 May 2024 15:01:04 GMT
- Title: Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
- Authors: Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen,
- Abstract summary: Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge.
Traditional deployment approaches offload the inference workloads to the remote cloud server.
We propose Galaxy, a collaborative edge AI system that breaks the resource walls across heterogeneous edge devices.
- Score: 19.60655813679882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistant in smart home. Traditional deployment approaches offload the inference workloads to the remote cloud server, which would induce substantial pressure on the backbone network as well as raise users' privacy concerns. To address that, in-situ inference has been recently recognized for edge intelligence, but it still confronts significant challenges stemming from the conflict between intensive workloads and limited on-device computing resources. In this paper, we leverage our observation that many edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources and propose Galaxy, a collaborative edge AI system that breaks the resource walls across heterogeneous edge devices for efficient Transformer inference acceleration. Galaxy introduces a novel hybrid model parallelism to orchestrate collaborative inference, along with a heterogeneity-aware parallelism planning for fully exploiting the resource potential. Furthermore, Galaxy devises a tile-based fine-grained overlapping of communication and computation to mitigate the impact of tensor synchronizations on inference latency under bandwidth-constrained edge environments. Extensive evaluation based on prototype implementation demonstrates that Galaxy remarkably outperforms state-of-the-art approaches under various edge environment setups, achieving up to 2.5x end-to-end latency reduction.
Related papers
- Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices [13.24437638911459]
On-device Deep Neural Network (DNN) training has been recognized as crucial for privacy-preserving machine learning at the edge.
We propose Asteroid, a distributed edge training system that breaks the resource walls across heterogeneous edge devices for efficient model training acceleration.
arXiv Detail & Related papers (2024-08-15T08:25:50Z) - Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge [35.40849522296486]
Large-scale foundation models (FoMos) can perform human-like intelligence.
FoMos need to be adapted to specialized downstream tasks through fine-tuning techniques.
We advocate multi-device cooperation within the device-edge cooperative fine-tuning paradigm.
arXiv Detail & Related papers (2024-07-13T12:47:14Z) - Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices [9.423705897088672]
We present RoCoIn, a robust cooperative inference mechanism for locally distributed execution of deep neural network-based inference tasks over heterogeneous edge devices.
It creates a set of independent and compact student models that are learned from a large model using knowledge distillation for distributed deployment.
In particular, the devices are strategically grouped to redundantly deploy and execute the same student model such that the inference process is resilient to any local failures.
arXiv Detail & Related papers (2024-06-20T10:43:53Z) - Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach [0.0]
Decentralized threat intelligence on edge devices represents a promising paradigm for enhancing cybersecurity on resource-constrained edge devices.
This approach involves the deployment of lightweight machine learning models directly onto edge devices to analyze local data streams, such as network traffic and system logs, in real-time.
Our proposed framework can improve edge computing security by providing better security in cyber threat detection and mitigation by isolating the edge devices from the network.
arXiv Detail & Related papers (2024-05-14T16:40:37Z) - Artificial Intelligence Empowered Multiple Access for Ultra Reliable and
Low Latency THz Wireless Networks [76.89730672544216]
Terahertz (THz) wireless networks are expected to catalyze the beyond fifth generation (B5G) era.
To satisfy the ultra-reliability and low-latency demands of several B5G applications, novel mobility management approaches are required.
This article presents a holistic MAC layer approach that enables intelligent user association and resource allocation, as well as flexible and adaptive mobility management.
arXiv Detail & Related papers (2022-08-17T03:00:24Z) - Task-Oriented Sensing, Computation, and Communication Integration for
Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-intelligent edge artificial-latency (AI) system, which jointly exploits the AI model split inference and integrated sensing and communication (ISAC)
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z) - Computational Intelligence and Deep Learning for Next-Generation
Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain up to 32.7% in the industrial IoT networks with the severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z) - Auto-Split: A General Framework of Collaborative Edge-Cloud AI [49.750972428032355]
This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud.
To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.
arXiv Detail & Related papers (2021-08-30T08:03:29Z) - Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments.
It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z) - CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning
over Heterogeneous Edge Devices [39.09319776243573]
CoEdge is a distributed Deep Neural Network (DNN) computing system that orchestrates cooperative inference over heterogeneous edge devices.
CoEdge saves energy with close inference latency, achieving up to 25.5%66.9% energy reduction for four widely-adopted CNN models.
arXiv Detail & Related papers (2020-12-06T13:15:52Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned
Edge Learning Over Broadband Channels [69.18343801164741]
partitioned edge learning (PARTEL) implements parameter-server training, a well known distributed learning method, in wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.