CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators
- URL: http://arxiv.org/abs/2401.12428v2
- Date: Wed, 8 May 2024 06:44:41 GMT
- Title: CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators
- Authors: Songyun Qu, Shixin Zhao, Bing Li, Yintao He, Xuyi Cai, Lei Zhang, Ying Wang,
- Abstract summary: We propose CIM-MLC, a universal multi-level compilation framework for general CIM architectures.
CIM-MLC can explore the mapping and scheduling strategies across multiple architectural tiers.
- Score: 10.756046653406296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size, and crossbar number, it is necessary to develop compilation tools that are fully aware of the CIM architectural details and implementation diversity. However, due to the lack of architectural support in current popular open-source compiling stacks, existing CIM designs either manually deploy networks or build their own compilers, which is time-consuming and labor-intensive. Although some works expose the specific CIM device programming interfaces to compilers, they are often bound to a fixed CIM architecture, lacking the flexibility to support the CIM architectures with different computing granularity. On the other hand, existing compilation works usually consider the scheduling of limited operation types (such as crossbar-bound matrix-vector multiplication). Unlike conventional processors, CIM accelerators are featured by their diverse architecture, circuit, and device, which cannot be simply abstracted by a single level if we seek to fully explore the advantages brought by CIM. Therefore, we propose CIM-MLC, a universal multi-level compilation framework for general CIM architectures. We first establish a general hardware abstraction for CIM architectures and computing modes to represent various CIM accelerators. Based on the proposed abstraction, CIM-MLC can compile tasks onto a wide range of CIM accelerators having different devices, architectures, and programming interfaces. More importantly, compared with existing compilation work, CIM-MLC can explore the mapping and scheduling strategies across multiple architectural tiers, which form a tractable yet effective design space, to achieve better scheduling and instruction generation results.
Related papers
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - EasyACIM: An End-to-End Automated Analog CIM with Synthesizable Architecture and Agile Design Space Exploration [4.31899314328104]
This work proposes an end-to-end automated ACIM based on a synthesizable architecture (EasyACIM)
EasyACIM can generate layouts for ACIMs with various design specifications end-to-end automatically.
The ACIM solutions given by EasyACIM have a wide design space and competitive performance compared to the state-of-the-art (SOTA) ACIMs.
arXiv Detail & Related papers (2024-04-12T08:12:17Z) - Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads [16.67441258454545]
Processing-in-memory (PIM) has emerged as an enabler for the energy-efficient and high-performance acceleration of deep learning (DL) workloads.
Resistive random-access memory (ReRAM) is one of the most promising technologies to implement PIM.
Existing PIM-based architectures mostly focus on computation while ignoring the role of communication.
arXiv Detail & Related papers (2024-03-28T00:29:15Z) - Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z) - Using the Abstract Computer Architecture Description Language to Model
AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z) - CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory
Architectures [0.1747623282473278]
We present CLSA-CIM, a cross-layer scheduling algorithm for tiled CIM architectures.
We integrate CLSA-CIM with existing weight-mapping strategies and compare performance against state-of-the-art (SOTA) scheduling algorithms.
arXiv Detail & Related papers (2024-01-15T13:35:21Z) - Machine Learning-Enabled Software and System Architecture Frameworks [48.87872564630711]
The stakeholders with data science and Machine Learning related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks.
We surveyed 61 subject matter experts from over 25 organizations in 10 countries.
arXiv Detail & Related papers (2023-08-09T21:54:34Z) - A Many-ported and Shared Memory Architecture for High-Performance ADAS
SoCs [11.760927352147798]
We present a shared memory architecture that enables high data throughput among native parallel accesses to ADAS applications.
The results validate that the proposed architecture provides close to 100% throughput for both read and write accesses.
It can also provide consistent to the domain specific payloads while enabling the scalability and modularity of the design.
arXiv Detail & Related papers (2022-09-13T04:58:27Z) - Enabling Retargetable Optimizing Compilers for Quantum Accelerators via
a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler.
We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax.
Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z) - Extending C++ for Heterogeneous Quantum-Classical Computing [56.782064931823015]
qcor is a language extension to C++ and compiler implementation that enables heterogeneous quantum-classical programming, compilation, and execution in a single-source context.
Our work provides a first-of-its-kind C++ compiler enabling high-level quantum kernel (function) expression in a quantum-language manner.
arXiv Detail & Related papers (2020-10-08T12:49:07Z) - MLIR: A Compiler Infrastructure for the End of Moore's Law [14.795080852112083]
MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, and significantly reduce the cost of building domain specific compilers.
MLIR facilitates the design and implementation of code generators, translators and translators at different levels of abstraction.
arXiv Detail & Related papers (2020-02-25T17:24:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.