Related papers: KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems

KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems

URL: http://arxiv.org/abs/2505.10183v1
Date: Thu, 15 May 2025 11:29:43 GMT
Title: KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems
Authors: Jieke Lin, Wanyu Wang, Longxiang Yin, Yinhe Han,
Abstract summary: KAITIAN is a novel distributed communication framework for AI workloads.<n>It integrates vendor-optimized communication libraries for intra-group efficiency with general-purpose communication protocols for inter-group interoperability.<n>It can accelerate training time by up to 42% compared to baseline homogeneous systems.
Score: 5.241889216655924
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Embodied Artificial Intelligence (AI) systems, such as autonomous robots and intelligent vehicles, are increasingly reliant on diverse heterogeneous accelerators (e.g., GPGPUs, NPUs, FPGAs) to meet stringent real-time processing and energy-efficiency demands. However, the proliferation of vendor-specific proprietary communication libraries creates significant interoperability barriers, hindering seamless collaboration between different accelerator types and leading to suboptimal resource utilization and performance bottlenecks in distributed AI workloads. This paper introduces KAITIAN, a novel distributed communication framework designed to bridge this gap. KAITIAN provides a unified abstraction layer that intelligently integrates vendor-optimized communication libraries for intra-group efficiency with general-purpose communication protocols for inter-group interoperability. Crucially, it incorporates a load-adaptive scheduling mechanism that dynamically balances computational tasks across heterogeneous devices based on their real-time performance characteristics. Implemented as an extension to PyTorch and rigorously evaluated on a testbed featuring NVIDIA GPUs and Cambricon MLUs, KAITIAN demonstrates significant improvements in resource utilization and scalability for distributed training tasks. Experimental results show that KAITIAN can accelerate training time by up to 42% compared to baseline homogeneous systems, while incurring minimal communication overhead (2.8--4.3%) and maintaining model accuracy. KAITIAN paves the way for more flexible and powerful heterogeneous computing in complex embodied AI applications.

Related papers

Efficient and Scalable Agentic AI with Heterogeneous Systems [1.8921715645847679]
AI agents are emerging as a dominant workload in a wide range of applications, promising to be the vehicle that delivers the promised benefits of AI to enterprises and consumers.<n>To scale AI agent usage, we need efficient and scalable deployment and agent-serving infrastructure.<n>We present a system design for dynamic orchestration of AI agent workloads on heterogeneous compute infrastructure.
arXiv Detail & Related papers (2025-07-25T19:02:42Z)
Bi-LSTM based Multi-Agent DRL with Computation-aware Pruning for Agent Twins Migration in Vehicular Embodied AI Networks [20.574619097682923]
In intelligent transportation, the combination of large language models and embodied Artificial Intelligence (AI) spawns the Vehicular Embodied AI Network (VEANs)<n>In VEANs, Autonomous Vehicles (AVs) are typical agents whose local advanced AI applications are defined as vehicular embodied AI agents.<n>Due to latency and resource constraints, the local AI applications and services running on vehicular embodied AI agents need to be migrated.
arXiv Detail & Related papers (2025-05-09T18:52:26Z)
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications.<n>Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems.<n>This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Transforming the Hybrid Cloud for Emerging AI Workloads [81.15269563290326]
This white paper envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads. The proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms.
arXiv Detail & Related papers (2024-11-20T11:57:43Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
OPTIMA: Optimized Policy for Intelligent Multi-Agent Systems Enables Coordination-Aware Autonomous Vehicles [9.41740133451895]
This work introduces OPTIMA, a novel distributed reinforcement learning framework for cooperative autonomous vehicle tasks. Our goal is to improve the generality and performance of CAVs in highly complex and crowded scenarios.
arXiv Detail & Related papers (2024-10-09T03:28:45Z)
Multi-Agent Reinforcement Learning for Long-Term Network Resource Allocation through Auction: a V2X Application [7.326507804995567]
We formulate offloading of computational tasks from a dynamic group of mobile agents (e.g., cars) as decentralized decision making among autonomous agents. We design an interaction mechanism that incentivizes such agents to align private and system goals by balancing between competition and cooperation. We propose a novel multi-agent online learning algorithm that learns with partial, delayed and noisy state information.
arXiv Detail & Related papers (2022-07-29T10:29:06Z)
Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-intelligent edge artificial-latency (AI) system, which jointly exploits the AI model split inference and integrated sensing and communication (ISAC) We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs) Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
Semi-asynchronous Hierarchical Federated Learning for Cooperative Intelligent Transportation Systems [10.257042901204528]
Cooperative Intelligent Transport System (C-ITS) is a promising network to provide safety, efficiency, sustainability, and comfortable services for automated vehicles and road infrastructures. The components of C-ITS usually generate large amounts of data, which makes it difficult to explore data science. We propose a novel Semi-a synchronous Federated Learning (SHFL) framework for C-ITS that enables elastic edge to cloud model aggregation from data sensing.
arXiv Detail & Related papers (2021-10-18T07:44:34Z)
Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications. We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS) Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.