Inter-Layer Scheduling Space Exploration for Multi-model Inference on
Heterogeneous Chiplets
- URL: http://arxiv.org/abs/2312.09401v1
- Date: Thu, 14 Dec 2023 23:45:55 GMT
- Title: Inter-Layer Scheduling Space Exploration for Multi-model Inference on
Heterogeneous Chiplets
- Authors: Mohanad Odema, Hyoukjun Kwon, Mohammad Abdullah Al Faruque
- Abstract summary: We develop an advanced scheduling framework for heterogeneous MCM accelerators.
Experiments using our framework on GPT-2 and ResNet-50 models on a 4-chiplet system have shown up to 2.2x and 1.9x increases in throughput and energy efficiency, respectively.
- Score: 15.24495231307868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address the increasing compute demand from recent multi-model workloads with
heavy models like large language models, we propose to deploy heterogeneous
chiplet-based multi-chip module (MCM) accelerators. We develop an
advanced scheduling framework for heterogeneous MCM accelerators that
comprehensively considers complex heterogeneity and inter-chiplet pipelining.
Experiments using our framework on GPT-2 and ResNet-50 models on a
4-chiplet system have shown up to 2.2x and 1.9x increases in throughput and
energy efficiency, respectively, compared to a monolithic accelerator with an optimized
output-stationary dataflow.
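To make the inter-chiplet pipelining idea concrete, here is a minimal sketch of inter-layer scheduling space exploration for a single model. It assumes a hypothetical per-layer latency table (`latency[c][l]` for layer `l` on chiplet `c`, standing in for chiplets with different dataflows), a fixed per-hop communication cost `comm_cost`, and contiguous layer segments mapped to distinct chiplets. Under pipelining, steady-state throughput is bounded by the slowest stage, so the sketch exhaustively searches partitions and chiplet assignments to minimize that bottleneck. The function names and cost model are illustrative assumptions, not the authors' framework, which also handles multi-model workloads and energy.

```python
from itertools import combinations, permutations


def stage_partitions(num_layers, num_stages):
    """Yield every split of layers 0..num_layers-1 into contiguous, non-empty stages."""
    for cuts in combinations(range(1, num_layers), num_stages - 1):
        bounds = (0, *cuts, num_layers)
        yield [(bounds[i], bounds[i + 1]) for i in range(num_stages)]


def best_pipeline(latency, comm_cost=0.0):
    """Exhaustively search pipelined layer-to-chiplet mappings (hypothetical cost model).

    latency[c][l] is an assumed latency of layer l on chiplet c.
    Each stage is a contiguous layer range on its own chiplet; every stage
    except the last pays comm_cost to forward activations to the next chiplet.
    Returns (bottleneck_stage_time, [((start, end), chiplet), ...]).
    """
    num_chiplets, num_layers = len(latency), len(latency[0])
    best = (float("inf"), None)
    for k in range(1, min(num_chiplets, num_layers) + 1):
        for stages in stage_partitions(num_layers, k):
            # Distinct chiplets per stage; order matters because chiplets differ.
            for chiplets in permutations(range(num_chiplets), k):
                times = []
                for (start, end), c in zip(stages, chiplets):
                    t = sum(latency[c][start:end])
                    if end < num_layers:
                        t += comm_cost
                    times.append(t)
                # In steady state, the slowest stage sets the pipeline's throughput.
                bottleneck = max(times)
                if bottleneck < best[0]:
                    best = (bottleneck, list(zip(stages, chiplets)))
    return best


# Hypothetical 2-chiplet, 4-layer table: chiplet 1 suits early layers, chiplet 0 later ones.
lat = [[4, 4, 1, 1],
       [1, 1, 4, 4]]
bottleneck, plan = best_pipeline(lat, comm_cost=0.5)
print(bottleneck, plan)  # 2.5 [((0, 2), 1), ((2, 4), 0)]
```

In this toy table either chiplet alone needs 10 time units per input, while the two-stage pipeline accepts a new input every 2.5 units. Exhaustive search is only tractable at toy sizes: the number of partitions and chiplet orderings grows combinatorially with layers and chiplets, which illustrates how quickly the scheduling space becomes large.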
Related papers
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
  Date: 2024-10-16T05:17:49Z
- Rapid and Power-Aware Learned Optimization for Modular Receive Beamforming [27.09017677987757]
Multiple-input multiple-output (MIMO) systems play a key role in wireless communication technologies.
We propose a power-oriented optimization algorithm for beamforming in modular hybrid systems.
We show how power-efficient beamforming can be encouraged by the learned optimizer, via boosting computation with low-resolution phase shifts.
  Date: 2024-08-01T10:19:25Z
- SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators [12.416683044819955]
Multi-model workloads with heavy models like recent large language models have significantly increased the compute and memory demands on hardware.
To address such increasing demands, designing a scalable hardware architecture has become a key problem.
We develop a set of schedulers to navigate the huge scheduling space and codify them into a scheduler, SCAR, with advanced techniques such as inter-chiplet pipelining.
arXiv Detail & Related papers (2024-05-01T18:02:25Z) - Accelerator-driven Data Arrangement to Minimize Transformers Run-time on
Multi-core Architectures [5.46396577345121]
The growing complexity of transformer models in artificial intelligence increases their computational costs, memory usage, and energy consumption.
We propose a novel memory arrangement strategy, governed by the hardware accelerator's kernel size, which effectively minimizes off-chip data access.
Our approach can achieve up to a 2.8x speed increase when executing inferences employing state-of-the-art transformers.
arXiv Detail & Related papers (2023-12-20T13:01:25Z) - AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation [80.33846577924363]
We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for video framegithub.
It is based on two essential designs. First, we build bidirectional volumes for all pairs of pixels, and use the predicted bilateral flows to retrieve correlations.
Second, we derive multiple groups of fine-grained flow fields from one pair of updated coarse flows for performing backward warping on the input frames separately.
arXiv Detail & Related papers (2023-04-19T16:18:47Z) - IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint
Multi-Agent Trajectory Prediction [73.25645602768158]
IPCC-TP is a novel relevance-aware module based on Incremental Pearson Correlation Coefficient to improve multi-agent interaction modeling.
Our module can be conveniently embedded into existing multi-agent prediction methods to extend original motion distribution decoders.
arXiv Detail & Related papers (2023-03-01T15:16:56Z) - Collaborative Intelligent Reflecting Surface Networks with Multi-Agent
Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z) - Proximal Policy Optimization-based Transmit Beamforming and Phase-shift
Design in an IRS-aided ISAC System for the THz Band [90.45915557253385]
An IRS-aided integrated sensing and communications (ISAC) system operating in the terahertz (THz) band is proposed to maximize the system capacity.
Transmit beamforming and phase-shift design are transformed into a universal optimization problem with ergodic constraints.
arXiv Detail & Related papers (2022-03-21T09:15:18Z) - Data-Driven Deep Learning Based Hybrid Beamforming for Aerial Massive
MIMO-OFDM Systems with Implicit CSI [29.11998008894847]
We propose a data-driven deep learning-based unified hybrid beamforming framework for time division duplex and frequency division duplex systems.
For TDD systems, the proposed DL-based approach jointly models the uplink pilot combining and downlink hybrid beamforming modules as an E2E neural network.
For FDD systems, we jointly model the downlink pilot transmission, uplink CSI feedback, and downlink hybrid beamforming modules as an E2E neural network.
arXiv Detail & Related papers (2022-01-18T07:21:00Z) - SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge
Devices [69.1412199244903]
We present a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors.
SensiX++ operates on two fundamental principles: highly modular componentisation to externalise data operations with clear abstractions, and document-centric manifestation for system-wide orchestration.
We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy to significantly reduce operational complexity and lower the effort to deploy, upgrade, reconfigure and serve embedded models on edge devices.
  Date: 2021-09-08T22:06:16Z
This list is automatically generated from the titles and abstracts of the papers on this site.