Related papers: CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

URL: http://arxiv.org/abs/2401.12428v2
Date: Wed, 8 May 2024 06:44:41 GMT
Title: CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators
Authors: Songyun Qu, Shixin Zhao, Bing Li, Yintao He, Xuyi Cai, Lei Zhang, Ying Wang
Abstract summary: We propose CIM-MLC, a universal multi-level compilation framework for general CIM architectures. CIM-MLC can explore the mapping and scheduling strategies across multiple architectural tiers.
Score: 10.756046653406296
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of diverse CIM architectures, which differ in device precision, crossbar size, and crossbar number, it is necessary to develop compilation tools that are fully aware of CIM architectural details and implementation diversity. However, because current popular open-source compiling stacks lack architectural support for CIM, existing CIM designs either deploy networks manually or build their own compilers, which is time-consuming and labor-intensive. Although some works expose specific CIM device programming interfaces to compilers, they are often bound to a fixed CIM architecture and lack the flexibility to support CIM architectures with different computing granularities. In addition, existing compilation work usually schedules only a limited set of operation types (such as crossbar-bound matrix-vector multiplication). Unlike conventional processors, CIM accelerators are characterized by diverse architectures, circuits, and devices, which cannot simply be abstracted at a single level if we seek to fully exploit the advantages of CIM. Therefore, we propose CIM-MLC, a universal multi-level compilation framework for general CIM architectures. We first establish a general hardware abstraction for CIM architectures and computing modes to represent various CIM accelerators. Based on this abstraction, CIM-MLC can compile tasks onto a wide range of CIM accelerators with different devices, architectures, and programming interfaces. More importantly, compared with existing compilation work, CIM-MLC can explore mapping and scheduling strategies across multiple architectural tiers, which form a tractable yet effective design space, to achieve better scheduling and instruction generation results.
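To make concrete why a compiler must be aware of parameters such as device precision, crossbar size, and crossbar number, here is a minimal sketch that counts how many crossbars a set of fully connected layers would occupy. It assumes a common bit-sliced weight mapping (inputs on rows, each weight split across adjacent columns); this scheme, the `CrossbarSpec` class, and the helper functions are hypothetical illustrations, not CIM-MLC's actual abstraction or mapping algorithm.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CrossbarSpec:
    """Hypothetical hardware description using the parameters named in the abstract."""
    rows: int           # wordlines per crossbar
    cols: int           # bitlines per crossbar
    cell_bits: int      # device precision: bits stored per memory cell
    num_crossbars: int  # crossbars available on the chip


def crossbars_for_layer(in_features: int, out_features: int,
                        weight_bits: int, spec: CrossbarSpec) -> int:
    """Crossbars needed to hold an (in_features x out_features) weight matrix.

    Assumption: inputs map to rows, and each weight is bit-sliced across
    ceil(weight_bits / cell_bits) adjacent columns.
    """
    slices = math.ceil(weight_bits / spec.cell_bits)
    row_tiles = math.ceil(in_features / spec.rows)
    col_tiles = math.ceil(out_features * slices / spec.cols)
    return row_tiles * col_tiles


def fits_on_chip(layers: Iterable[Tuple[int, int]], weight_bits: int,
                 spec: CrossbarSpec) -> Tuple[int, bool]:
    """Total crossbars for all layers, and whether they fit without reuse."""
    needed = sum(crossbars_for_layer(i, o, weight_bits, spec) for i, o in layers)
    return needed, needed <= spec.num_crossbars


if __name__ == "__main__":
    layers = [(512, 256), (256, 10)]
    two_bit = CrossbarSpec(rows=128, cols=128, cell_bits=2, num_crossbars=64)
    one_bit = CrossbarSpec(rows=128, cols=128, cell_bits=1, num_crossbars=64)
    print(fits_on_chip(layers, 8, two_bit))  # (34, True)
    print(fits_on_chip(layers, 8, one_bit))  # (66, False)
```

With the same network and crossbar count, dropping device precision from 2 to 1 bit per cell doubles the column slices and pushes the model past the chip's capacity, so a compiler would then need a different mapping or time-multiplexed scheduling. This is the kind of architecture-dependent decision the abstract argues a single-level abstraction cannot capture.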

Related papers

A Modular Reference Architecture for MCP-Servers Enabling Agentic BIM Interaction [0.5219568203653523]
Agentic systems driven by large language models (LLMs) are increasingly applied to Building Information Modelling (BIM). Recent work has begun adopting the emerging Model Context Protocol (MCP) as a uniform tool-calling interface for LLMs. Current BIM-side implementations remain authoring-tool-specific and ad hoc, limiting reuse, evaluation, and workflow portability across environments. This paper introduces a modular reference architecture for MCP servers that enables API-agnostic, isolated, and reproducible agentic BIM interactions.
arXiv Detail & Related papers (2025-12-21T23:12:26Z)
xLLM Technical Report [57.13120905321185]
We introduce xLLM, an intelligent and efficient Large Language Model (LLM) inference framework. xLLM builds a novel decoupled service-engine architecture. xLLM-Engine co-optimizes system and algorithm designs to fully saturate computing resources.
arXiv Detail & Related papers (2025-10-16T13:53:47Z)
A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization [1.2828127925625228]
We introduce a TVM-based compilation integration approach that targets GEMM-based deep learning accelerators. Our approach abstracts the complexities of compiler integration, enabling seamless integration of accelerators. Our framework is benchmarked on the Gemmini accelerator, demonstrating performance comparable to its specialized manually implemented toolchain.
arXiv Detail & Related papers (2025-07-07T09:50:15Z)
CIMFlow: An Integrated Framework for Systematic Design and Evaluation of Digital CIM Architectures [5.7317927540954505]
CIMFlow is an integrated framework that provides an out-of-the-box workflow for implementing and evaluating workloads on digital CIM architectures. CIMFlow bridges the compilation and simulation infrastructures with a flexible instruction set architecture.
arXiv Detail & Related papers (2025-05-02T08:38:30Z)
Understanding and Optimizing Multi-Stage AI Inference Pipelines [11.254219071373319]
HERMES is a Heterogeneous Multi-stage LLM inference Execution Simulator. Unlike prior frameworks, HERMES supports heterogeneous clients executing multiple models concurrently. We explore the impact of reasoning stages on end-to-end latency, optimal strategies for hybrid pipelines, and the architectural implications of remote KV cache retrieval.
arXiv Detail & Related papers (2025-04-14T00:29:49Z)
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities. Specifically, it features modality-specific encoders with connectors for a unified multimodal representation. We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
EasyACIM: An End-to-End Automated Analog CIM with Synthesizable Architecture and Agile Design Space Exploration [4.31899314328104]
This work proposes an end-to-end automated ACIM based on a synthesizable architecture (EasyACIM). EasyACIM can automatically generate layouts, end to end, for ACIMs with various design specifications. The ACIM solutions given by EasyACIM have a wide design space and competitive performance compared to state-of-the-art (SOTA) ACIMs.
arXiv Detail & Related papers (2024-04-12T08:12:17Z)
Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads [16.67441258454545]
Processing-in-memory (PIM) has emerged as an enabler for the energy-efficient and high-performance acceleration of deep learning (DL) workloads. Resistive random-access memory (ReRAM) is one of the most promising technologies to implement PIM. Existing PIM-based architectures mostly focus on computation while ignoring the role of communication.
arXiv Detail & Related papers (2024-03-28T00:29:15Z)
Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis. We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements. The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams. In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z)
CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures [0.1747623282473278]
We present CLSA-CIM, a cross-layer scheduling algorithm for tiled CIM architectures. We integrate CLSA-CIM with existing weight-mapping strategies and compare performance against state-of-the-art (SOTA) scheduling algorithms.
arXiv Detail & Related papers (2024-01-15T13:35:21Z)
Machine Learning-Enabled Software and System Architecture Frameworks [48.87872564630711]
The stakeholders with data science and Machine Learning related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. We surveyed 61 subject matter experts from over 25 organizations in 10 countries.
arXiv Detail & Related papers (2023-08-09T21:54:34Z)
A Many-ported and Shared Memory Architecture for High-Performance ADAS SoCs [11.760927352147798]
We present a shared memory architecture that enables high data throughput for the natively parallel accesses of ADAS applications. The results validate that the proposed architecture provides close to 100% throughput for both read and write accesses. It also serves domain-specific payloads consistently while enabling the scalability and modularity of the design.
arXiv Detail & Related papers (2022-09-13T04:58:27Z)
Enabling Retargetable Optimizing Compilers for Quantum Accelerators via a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler. We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax. Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z)
Extending C++ for Heterogeneous Quantum-Classical Computing [56.782064931823015]
qcor is a language extension to C++ and compiler implementation that enables heterogeneous quantum-classical programming, compilation, and execution in a single-source context. Our work provides a first-of-its-kind C++ compiler enabling high-level quantum kernel (function) expression in a quantum-language manner.
arXiv Detail & Related papers (2020-10-08T12:49:07Z)
MLIR: A Compiler Infrastructure for the End of Moore's Law [14.795080852112083]
MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, and significantly reduce the cost of building domain-specific compilers. MLIR facilitates the design and implementation of code generators, translators, and optimizers at different levels of abstraction.
arXiv Detail & Related papers (2020-02-25T17:24:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.