Inter-Layer Scheduling Space Exploration for Multi-model Inference on
Heterogeneous Chiplets
- URL: http://arxiv.org/abs/2312.09401v1
- Date: Thu, 14 Dec 2023 23:45:55 GMT
- Title: Inter-Layer Scheduling Space Exploration for Multi-model Inference on
Heterogeneous Chiplets
- Authors: Mohanad Odema, Hyoukjun Kwon, Mohammad Abdullah Al Faruque
- Abstract summary: We develop an advanced scheduling framework for heterogeneous MCM accelerators.
Experiments using our framework on GPT-2 and ResNet-50 models on a 4-chiplet system show up to 2.2x and 1.9x increases in throughput and energy efficiency.
- Score: 15.24495231307868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address the increasing compute demand from recent multi-model workloads with
heavy models like large language models, we propose to deploy heterogeneous
chiplet-based multi-chip module (MCM) accelerators. We develop an
advanced scheduling framework for heterogeneous MCM accelerators that
comprehensively considers complex heterogeneity and inter-chiplet pipelining.
Our experiments using our framework on GPT-2 and ResNet-50 models on a
4-chiplet system show up to 2.2x and 1.9x increases in throughput and
energy efficiency, compared to a monolithic accelerator with an optimized
output-stationary dataflow.
Related papers
- Joint Transmit and Pinching Beamforming for PASS: Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed.
It consists of multiple waveguides, each equipped with numerous low-cost antennas, named pinching antennas (PAs).
The positions of the PAs can be reconfigured to span both large-scale path and space.
arXiv Detail & Related papers (2025-02-12T18:54:10Z)
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE that surpasses the existing parallelism schemes.
Our results demonstrate at most 52.4% improvement in prefill throughput compared to existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Rapid and Power-Aware Learned Optimization for Modular Receive Beamforming [27.09017677987757]
Multiple-input multiple-output (MIMO) systems play a key role in wireless communication technologies.
We propose a power-oriented optimization algorithm for beamforming in modular hybrid systems.
We show how power-efficient beamforming can be encouraged by the learned optimizer, via boosting computation with low-resolution phase shifts.
arXiv Detail & Related papers (2024-08-01T10:19:25Z)
- SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators [12.416683044819955]
Multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware.
To address such increasing demands, designing a scalable hardware architecture became a key problem.
We develop a set of schedulers to navigate the huge scheduling space and codify them into a scheduler, SCAR, with advanced techniques such as inter-chiplet pipelining.
arXiv Detail & Related papers (2024-05-01T18:02:25Z)
- AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation [80.33846577924363]
We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for video frame interpolation.
It is based on two essential designs. First, we build bidirectional volumes for all pairs of pixels, and use the predicted bilateral flows to retrieve correlations.
Second, we derive multiple groups of fine-grained flow fields from one pair of updated coarse flows for performing backward warping on the input frames separately.
arXiv Detail & Related papers (2023-04-19T16:18:47Z)
- IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint
Multi-Agent Trajectory Prediction [73.25645602768158]
IPCC-TP is a novel relevance-aware module based on Incremental Pearson Correlation Coefficient to improve multi-agent interaction modeling.
Our module can be conveniently embedded into existing multi-agent prediction methods to extend original motion distribution decoders.
arXiv Detail & Related papers (2023-03-01T15:16:56Z)
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent
Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
- SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge
Devices [69.1412199244903]
We present a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors.
SensiX++ operates on two fundamental principles: highly modular componentisation to externalise data operations with clear abstractions, and document-centric manifestation for system-wide orchestration.
We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy to significantly reduce operational complexity and lower the effort to deploy, upgrade, reconfigure and serve embedded models on edge devices.
arXiv Detail & Related papers (2021-09-08T22:06:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.