Towards Resource-Efficient Compound AI Systems
- URL: http://arxiv.org/abs/2501.16634v2
- Date: Wed, 29 Jan 2025 22:19:59 GMT
- Title: Towards Resource-Efficient Compound AI Systems
- Authors: Gohar Irfan Chaudhry, Esha Choukse, Íñigo Goiri, Rodrigo Fonseca, Adam Belay, Ricardo Bianchini,
- Abstract summary: Compound AI Systems integrate multiple interacting components like models, retrievers, and external tools.
Current implementations suffer from inefficient resource utilization due to tight coupling between application logic and execution details.
We propose a declarative workflow programming model and an adaptive runtime system for dynamic scheduling and resource-aware decision-making.
- Score: 4.709762596591902
- License:
- Abstract: Compound AI Systems, integrating multiple interacting components like models, retrievers, and external tools, have emerged as essential for addressing complex AI tasks. However, current implementations suffer from inefficient resource utilization due to tight coupling between application logic and execution details, a disconnect between orchestration and resource management layers, and the perceived exclusiveness between efficiency and quality. We propose a vision for resource-efficient Compound AI Systems through a declarative workflow programming model and an adaptive runtime system for dynamic scheduling and resource-aware decision-making. Decoupling application logic from low-level details exposes levers for the runtime to flexibly configure the execution environment and resources, without compromising on quality. Enabling collaboration between the workflow orchestration and cluster manager enables higher efficiency through better scheduling and resource management. We are building a prototype system, called Murakkab, to realize this vision. Our preliminary evaluation demonstrates speedups up to $\sim 3.4\times$ in workflow completion times while delivering $\sim 4.5\times$ higher energy efficiency, showing promise in optimizing resources and advancing AI system design.
Related papers
- DNN-Powered MLOps Pipeline Optimization for Large Language Models: A Framework for Automated Deployment and Resource Management [0.0]
This research presents a novel framework that leverages Deep Neural Networks (DNNs) to optimize MLOps pipelines for Large Language Models (LLMs)
Our approach introduces an intelligent system that automates deployment decisions, resource allocation, and pipeline optimization while maintaining optimal performance and cost efficiency.
arXiv Detail & Related papers (2025-01-14T14:15:32Z) - Dynamic Scheduling Strategies for Resource Optimization in Computing Environments [0.29008108937701327]
This paper proposes a container scheduling method based on multi-objective optimization, which aims to balance key performance indicators such as resource utilization, load balancing and task completion efficiency.
The experimental results show that compared with traditional static rule algorithms and efficiency algorithms, the optimized scheduling scheme shows significant advantages in resource utilization, load balancing and burst task completion.
arXiv Detail & Related papers (2024-12-23T05:43:17Z) - Cluster-Based Multi-Agent Task Scheduling for Space-Air-Ground Integrated Networks [60.085771314013044]
Low-altitude economy holds significant potential for development in areas such as communication and sensing.
We propose a Clustering-based Multi-agent Deep Deterministic Policy Gradient (CMADDPG) algorithm to address the multi-UAV cooperative task scheduling challenges in SAGIN.
arXiv Detail & Related papers (2024-12-14T06:17:33Z) - Reinforcement Learning for Adaptive Resource Scheduling in Complex System Environments [8.315191578007857]
This study presents a novel computer system performance optimization and adaptive workload management scheduling algorithm based on Q-learning.
By contrast, Q-learning, a reinforcement learning algorithm, continuously learns from system state changes, enabling dynamic scheduling and resource optimization.
This research provides a foundation for the integration of AI-driven adaptive scheduling in future large-scale systems, offering a scalable, intelligent solution to enhance system performance, reduce operating costs, and support sustainable energy consumption.
arXiv Detail & Related papers (2024-11-08T05:58:09Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs)
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z) - Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture [17.516934379812994]
We present "Orchestrated AIs," an approach that integrates various tasks with logic-driven decisions into dynamic, sophisticated AIs.
We find that the intrinsic Dual Dynamicity of Orchestrated AIs can be effectively represented using the Orchestrated spatial Graph.
Our evaluations demonstrate that significantly outperforms traditional architectures in handling the dynamic demands of Orchestrated AIs.
arXiv Detail & Related papers (2024-05-21T14:09:31Z) - Joint User Association, Interference Cancellation and Power Control for
Multi-IRS Assisted UAV Communications [80.35959154762381]
Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) communications are expected to alleviate the load of ground base stations in a cost-effective way.
Existing studies mainly focus on the deployment and resource allocation of a single IRS instead of multiple IRSs.
We propose a new optimization algorithm for joint IRS-user association, trajectory optimization of UAVs, successive interference cancellation (SIC) decoding order scheduling and power allocation.
arXiv Detail & Related papers (2023-12-08T01:57:10Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Multi-Agent Reinforcement Learning for Long-Term Network Resource
Allocation through Auction: a V2X Application [7.326507804995567]
We formulate offloading of computational tasks from a dynamic group of mobile agents (e.g., cars) as decentralized decision making among autonomous agents.
We design an interaction mechanism that incentivizes such agents to align private and system goals by balancing between competition and cooperation.
We propose a novel multi-agent online learning algorithm that learns with partial, delayed and noisy state information.
arXiv Detail & Related papers (2022-07-29T10:29:06Z) - Energy-Efficient Multi-Orchestrator Mobile Edge Learning [54.28419430315478]
Mobile Edge Learning (MEL) is a collaborative learning paradigm that features distributed training of Machine Learning (ML) models over edge devices.
In MEL, possible coexistence of multiple learning tasks with different datasets may arise.
We propose lightweight algorithms that can achieve near-optimal performance and facilitate the trade-offs between energy consumption, accuracy, and solution complexity.
arXiv Detail & Related papers (2021-09-02T07:37:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.