Related papers: Distributed Collaborative Inference System in Next-Generation Networks and Communication

URL: http://arxiv.org/abs/2412.12102v1
Date: Sat, 16 Nov 2024 10:48:12 GMT
Title: Distributed Collaborative Inference System in Next-Generation Networks and Communication
Authors: Chuan Zhang, Xixi Zheng, Xiaolong Tao, Chenfei Hu, Weiting Zhang, Liehuang Zhu
Abstract summary: High computational demands of generative artificial intelligence (GAI) present challenges for devices with limited resources. We introduce a multi-level collaborative inference system designed for next-generation networks and communication. Our system can reduce inference time by up to 17% without sacrificing the inference accuracy.
Score: 12.372334028925618
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid advancement of artificial intelligence, generative artificial intelligence (GAI) has taken a leading role in transforming data processing methods. However, the high computational demands of GAI present challenges for devices with limited resources. As we move towards the sixth generation of mobile networks (6G), the higher data rates and improved energy efficiency of 6G create a need for more efficient data processing in GAI. Traditional GAI, however, shows its limitations in meeting these demands. To address these challenges, we introduce a multi-level collaborative inference system designed for next-generation networks and communication. Our proposed system features a deployment strategy that assigns models of varying sizes to devices at different network layers. Then, we design a task offloading strategy to optimise both efficiency and latency. Furthermore, a modified early exit mechanism is implemented to enhance the inference process for single models. Experimental results demonstrate that our system effectively reduces inference latency while maintaining high-quality output. Specifically, compared to existing work, our system can reduce inference time by up to 17% without sacrificing the inference accuracy.
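The abstract mentions an early exit mechanism but does not describe how the authors modified it. As a rough illustration of the general idea only, the following Python sketch shows plain confidence-threshold early exit. It is not the paper's method. The names `early_exit_predict`, `stages`, and `threshold` are hypothetical, and each stage is assumed to be a callable that returns class logits, standing in for an intermediate exit head of one model or for a progressively larger model at a higher network layer.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical type: a stage maps an input to a list of class logits.
Stage = Callable[[Any], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    x: Any, stages: Sequence[Stage], threshold: float = 0.9
) -> Tuple[int, float, int]:
    """Run stages in order and stop at the first confident prediction.

    Returns (label, confidence, exit_index). The last stage always
    answers, even if its confidence is below the threshold.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    last = len(stages) - 1
    for index, stage in enumerate(stages):
        probs = softmax(stage(x))
        confidence = max(probs)
        label = probs.index(confidence)
        if confidence >= threshold or index == last:
            return label, confidence, index
    raise AssertionError("unreachable: the last stage always returns")


if __name__ == "__main__":
    # Toy stages: the first is unsure, the second is confident.
    toy_stages = [lambda x: [0.0, 0.1], lambda x: [0.0, 10.0]]
    print(early_exit_predict(None, toy_stages, threshold=0.9))
```

Under these assumptions, easy inputs leave at an early stage and avoid the cost of later ones, while hard inputs fall through to the final stage. The threshold trades latency against accuracy. How the paper tunes or replaces this rule is not stated in the abstract.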

Related papers

Communication Optimization for Decentralized Learning atop Bandwidth-limited Edge Networks [4.880664732766839]
Decentralized federated learning (DFL) is a promising machine learning paradigm for bringing artificial intelligence (AI) capabilities to the network edge. Running DFL on top of edge networks, however, faces severe performance challenges due to the extensive parameter exchanges between agents. We jointly design the communication scheme for the overlay network formed by the agents and the mixing matrix that controls the communication demands between the agents. Our evaluations show that the proposed algorithm can reduce the total training time by over 80% compared to the baseline.
arXiv Detail & Related papers (2025-04-16T15:56:57Z)
Graph Based Deep Reinforcement Learning Aided by Transformers for Multi-Agent Cooperation [2.8169258551959544]
We propose a novel framework that integrates Graph Neural Networks (GNNs), Deep Reinforcement Learning (DRL), and transformer-based mechanisms for enhanced multi-agent coordination and collective task execution. Our approach leverages GNNs to model agent-agent and agent-goal interactions through adaptive graph construction, enabling efficient information aggregation and decision-making under constrained communication.
arXiv Detail & Related papers (2025-04-11T01:46:18Z)
AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence [65.29835430845893]
We propose a framework that enhances edge intelligence through AI-in-the-loop joint sensing and communication. A key contribution of our work is establishing an explicit relationship between validation loss and the system's tunable parameters. We show that our framework reduces communication energy consumption by up to 77 percent and sensing costs measured by the number of samples by up to 52 percent.
arXiv Detail & Related papers (2025-02-14T14:56:58Z)
Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks [55.467288506826755]
Federated learning (FL) has been recognized as a viable solution for local-privacy-aware collaborative model training in wireless edge networks. Most existing communication-efficient FL algorithms fail to reduce the significant inter-device variance. We propose a novel communication-efficient FL algorithm, named FedQVR, which relies on a sophisticated variance-reduced scheme.
arXiv Detail & Related papers (2025-01-20T04:26:21Z)
Intelligent Task Offloading: Advanced MEC Task Offloading and Resource Management in 5G Networks [6.725133919174076]
5G technology enhances industries with high-speed, reliable, low-latency communication, revolutionizing mobile broadband and supporting massive IoT connectivity. With the increasing complexity of applications on User Equipment (UE), offloading resource-intensive tasks to robust servers is essential for improving latency and speed. This paper introduces a novel methodology to efficiently allocate communication resources among individual UEs. It provides a robust and efficient solution to the challenges posed by the evolving landscape of 5G technology.
arXiv Detail & Related papers (2025-01-08T16:19:44Z)
USEFUSE: Utile Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks [0.6435156676256051]
This study presents the Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic. An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption. Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency.
arXiv Detail & Related papers (2024-12-18T11:04:58Z)
AI Flow at the Network Edge [58.31090055138711]
AI Flow is a framework that streamlines the inference process by jointly leveraging the heterogeneous resources available across devices, edge nodes, and cloud servers. This article serves as a position paper for identifying the motivation, challenges, and principles of AI Flow.
arXiv Detail & Related papers (2024-11-19T12:51:17Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
DRL Optimization Trajectory Generation via Wireless Network Intent-Guided Diffusion Models for Optimizing Resource Allocation [58.62766376631344]
We propose a customized wireless network intent (WNI-G) model to address different state variations of wireless communication networks. Extensive simulation achieves greater stability in spectral efficiency and variations of traditional DRL models in dynamic communication systems.
arXiv Detail & Related papers (2024-10-18T14:04:38Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs). We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
Predictive Handover Strategy in 6G and Beyond: A Deep and Transfer Learning Approach [11.44410301488549]
We propose a deep learning based algorithm for predicting the future serving cell. Our framework complies with the O-RAN specifications and can be deployed in a Near-Real-Time RAN Intelligent Controller.
arXiv Detail & Related papers (2024-04-11T20:30:36Z)
Multiagent Reinforcement Learning with an Attention Mechanism for Improving Energy Efficiency in LoRa Networks [52.96907334080273]
As the network scale increases, the energy efficiency (EE) of LoRa networks decreases sharply due to severe packet collisions. We propose a transmission parameter allocation algorithm based on multiagent reinforcement learning (MALoRa). Simulation results demonstrate that MALoRa significantly improves the system EE compared with baseline algorithms.
arXiv Detail & Related papers (2023-09-16T11:37:23Z)
Optimization Design for Federated Learning in Heterogeneous 6G Networks [27.273745760946962]
Federated learning (FL) is anticipated to be a key enabler for achieving ubiquitous AI in 6G networks. There are several system and statistical heterogeneity challenges for effective and efficient FL implementation in 6G networks. In this article, we investigate the optimization approaches that can effectively address the challenges.
arXiv Detail & Related papers (2023-03-15T02:18:21Z)
Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples. We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
Leveraging AI and Intelligent Reflecting Surface for Energy-Efficient Communication in 6G IoT [14.027983498089084]
We propose an artificial intelligence (AI) and intelligent reflecting surface (IRS) empowered energy-efficiency communication system for 6G IoT. First, we design a smart and efficient communication architecture including the IRS-aided data transmission and the AI-driven network resource management mechanisms. Third, a deep reinforcement learning (DRL) empowered network resource control and allocation scheme is proposed to solve the formulated optimization model.
arXiv Detail & Related papers (2020-12-29T11:56:28Z)
HAPI: Hardware-Aware Progressive Inference [18.214367595727037]
Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks. Despite their popularity, CNN inference still comes at a high computational cost. This work presents HAPI, a novel methodology for generating high-performance early-exit networks.
arXiv Detail & Related papers (2020-08-10T09:55:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.