Distributed Collaborative Inference System in Next-Generation Networks and Communication
- URL: http://arxiv.org/abs/2412.12102v1
- Date: Sat, 16 Nov 2024 10:48:12 GMT
- Title: Distributed Collaborative Inference System in Next-Generation Networks and Communication
- Authors: Chuan Zhang, Xixi Zheng, Xiaolong Tao, Chenfei Hu, Weiting Zhang, Liehuang Zhu,
- Abstract summary: High computational demands of generative artificial intelligence (GAI) present challenges for devices with limited resources.
We introduce a multi-level collaborative inference system designed for next-generation networks and communication.
Our system can reduce inference time by up to 17% without sacrificing the inference accuracy.
- Score: 12.372334028925618
- License:
- Abstract: With the rapid advancement of artificial intelligence, generative artificial intelligence (GAI) has taken a leading role in transforming data processing methods. However, the high computational demands of GAI present challenges for devices with limited resources. As we move towards the sixth generation of mobile networks (6G), the higher data rates and improved energy efficiency of 6G create a need for more efficient data processing in GAI. Traditional GAI, however, shows its limitations in meeting these demands. To address these challenges, we introduce a multi-level collaborative inference system designed for next-generation networks and communication. Our proposed system features a deployment strategy that assigns models of varying sizes to devices at different network layers. Then, we design a task offloading strategy to optimise both efficiency and latency. Furthermore, a modified early exit mechanism is implemented to enhance the inference process for single models. Experimental results demonstrate that our system effectively reduces inference latency while maintaining high-quality output. Specifically, compared to existing work, our system can reduce inference time by up to 17% without sacrificing the inference accuracy.
Related papers
- AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence [65.29835430845893]
We propose a framework that enhances edge intelligence through AI-in-the-loop joint sensing and communication.
A key contribution of our work is establishing an explicit relationship between validation loss and the system's tunable parameters.
We show that our framework reduces communication energy consumption by up to 77 percent and sensing costs measured by the number of samples by up to 52 percent.
arXiv Detail & Related papers (2025-02-14T14:56:58Z) - Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks [55.467288506826755]
Federated learning (FL) has been recognized as a viable solution for local-privacy-aware collaborative model training in wireless edge networks.
Most existing communication-efficient FL algorithms fail to reduce the significant inter-device variance.
We propose a novel communication-efficient FL algorithm, named FedQVR, which relies on a sophisticated variance-reduced scheme.
arXiv Detail & Related papers (2025-01-20T04:26:21Z) - Intelligent Task Offloading: Advanced MEC Task Offloading and Resource Management in 5G Networks [6.725133919174076]
5G technology enhances industries with high-speed, reliable, low-latency communication, revolutionizing mobile broadband and supporting massive IoT connectivity.
With the increasing complexity of applications on User Equipment, offloading resource-intensive tasks to robust servers is essential for improving latency and speed.
This paper introduces a novel methodology to efficiently allocate both communication resources among individual UEs.
It provides a robust and efficient solution to the challenges posed by the evolving landscape of 5G technology.
arXiv Detail & Related papers (2025-01-08T16:19:44Z) - USEFUSE: Utile Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks [0.6435156676256051]
This study presents the Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic.
An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption.
Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency.
arXiv Detail & Related papers (2024-12-18T11:04:58Z) - AI Flow at the Network Edge [58.31090055138711]
AI Flow is a framework that streamlines the inference process by jointly leveraging the heterogeneous resources available across devices, edge nodes, and cloud servers.
This article serves as a position paper for identifying the motivation, challenges, and principles of AI Flow.
arXiv Detail & Related papers (2024-11-19T12:51:17Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs)
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Predictive Handover Strategy in 6G and Beyond: A Deep and Transfer Learning Approach [11.44410301488549]
We propose a deep learning based algorithm for predicting the future serving cell.
Our framework complies with the O-RAN specifications and can be deployed in a Near-Real-Time RAN Intelligent Controller.
arXiv Detail & Related papers (2024-04-11T20:30:36Z) - Optimization Design for Federated Learning in Heterogeneous 6G Networks [27.273745760946962]
Federated learning (FL) is anticipated to be a key enabler for achieving ubiquitous AI in 6G networks.
There are several system and statistical heterogeneity challenges for effective and efficient FL implementation in 6G networks.
In this article, we investigate the optimization approaches that can effectively address the challenges.
arXiv Detail & Related papers (2023-03-15T02:18:21Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks.
specially trained CNNs that employ parametrised early exits along their depth to save during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.