CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory
Architectures
- URL: http://arxiv.org/abs/2401.07671v2
- Date: Wed, 17 Jan 2024 13:49:05 GMT
- Title: CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory
Architectures
- Authors: Rebecca Pelke, Jose Cubero-Cascante, Nils Bosbach, Felix Staudigl,
Rainer Leupers, Jan Moritz Joseph
- Abstract summary: We present CLSA-CIM, a cross-layer scheduling algorithm for tiled CIM architectures.
We integrate CLSA-CIM with existing weight-mapping strategies and compare performance against state-of-the-art (SOTA) scheduling algorithms.
- Score: 0.1747623282473278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The demand for efficient machine learning (ML) accelerators is growing
rapidly, driving the development of novel computing concepts such as resistive
random access memory (RRAM)-based tiled computing-in-memory (CIM)
architectures. CIM enables computation directly within the memory unit, resulting in
faster data processing and reduced power consumption. Efficient compiler
algorithms are essential to exploit the potential of tiled CIM architectures.
While conventional ML compilers focus on code generation for CPUs, GPUs, and
other von Neumann architectures, adaptations are needed to cover CIM
architectures. Cross-layer scheduling is a promising approach, as it enhances
the utilization of CIM cores, thereby accelerating computations. Although
similar concepts are implicitly used in previous work, there is a lack of clear
and quantifiable algorithmic definitions for cross-layer scheduling for tiled
CIM architectures. To close this gap, we present CLSA-CIM, a cross-layer
scheduling algorithm for tiled CIM architectures. We integrate CLSA-CIM with
existing weight-mapping strategies and compare performance against
state-of-the-art (SOTA) scheduling algorithms. CLSA-CIM improves utilization
by up to 17.9x, resulting in an overall speedup of up to 29.2x compared to
SOTA.
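A minimal sketch of the core idea, assuming a toy cost model (every layer pinned to its own tiles, one output chunk produced per time step, ideal dataflow); this is an illustration of why cross-layer scheduling raises tile utilization, not the CLSA-CIM algorithm itself:

```python
# Toy model (hypothetical, not the authors' implementation): each layer is
# pinned to its own tile set (weights stay resident in the CIM tiles) and
# emits its output in fixed-size chunks, one chunk per time step.

def layer_by_layer(chunks_per_layer: int, n_layers: int) -> int:
    """Layer l+1 may only start after layer l has produced ALL chunks."""
    return n_layers * chunks_per_layer

def cross_layer(chunks_per_layer: int, n_layers: int) -> int:
    """Layer l+1 starts as soon as layer l emits its FIRST chunk, so the
    layers form a pipeline: fill time plus the chunk stream."""
    return (n_layers - 1) + chunks_per_layer

if __name__ == "__main__":
    chunks, layers = 64, 8
    work = chunks * layers  # total tile-steps of useful work
    for name, steps in [("layer-by-layer", layer_by_layer(chunks, layers)),
                        ("cross-layer", cross_layer(chunks, layers))]:
        print(f"{name}: {steps} steps, "
              f"utilization {work / (steps * layers):.2f}")
```

Under these toy assumptions, 8 layers of 64 chunks finish in 71 steps instead of 512; the numbers only illustrate the direction of the effect that CLSA-CIM quantifies on real architectures.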
Related papers
- PACiM: A Sparsity-Centric Hybrid Compute-in-Memory Architecture via Probabilistic Approximation [1.2848824355101671]
This paper introduces a novel probabilistic approximate computation (PAC) method that reduces approximation error by 4X compared to existing approaches.
PAC enables efficient sparsity-based computation in compute-in-memory (CiM) systems by simplifying complex MAC vector computations into scalar calculations.
We develop PACiM, a sparsity-centric architecture that fully exploits sparsity to reduce bit-serial cycles by 81% and achieves a peak 8b/8b efficiency of 14.63 TOPS/W in 65 nm CMOS.
arXiv Detail & Related papers (2024-08-29T03:58:19Z)
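The summary above leaves the PAC mechanism open; as a loose stand-in for probabilistic approximation of MAC operations, the sketch below estimates a dot product from a random sample of its terms. The estimator is a generic Monte Carlo assumption, not PACiM's actual method:

```python
import numpy as np

# Hedged stand-in for probabilistic approximate computation: estimate a
# MAC (dot product) from a random subset of its terms. Generic Monte Carlo
# estimator for illustration only, not the PAC method from PACiM.

def approx_mac(x: np.ndarray, w: np.ndarray, n_samples: int, rng) -> float:
    """Unbiased estimate of dot(x, w) using n_samples of the n products."""
    n = x.size
    idx = rng.choice(n, size=n_samples, replace=False)
    # Scale the partial sum so the estimator is unbiased.
    return (n / n_samples) * float(np.dot(x[idx], w[idx]))

rng = np.random.default_rng(0)
x, w = rng.standard_normal(1024), rng.standard_normal(1024)
print(f"exact {np.dot(x, w):.2f}  approx {approx_mac(x, w, 256, rng):.2f}")
```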
- Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles.
Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query.
Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
arXiv Detail & Related papers (2024-06-24T15:55:59Z)
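A minimal sketch of the selection idea behind SPARSEK, assuming plain query-key scores and a hard top-k in place of the paper's scoring network and differentiable top-k mask operator:

```python
import numpy as np

# Hedged sketch of top-k sparse attention in the spirit of SPARSEK: every
# query attends to a constant number k of key/value pairs. SPARSEK uses a
# scoring network and a differentiable top-k mask operator; plain query-key
# scores and a hard top-k stand in for both here (an assumption).

def topk_sparse_attention(Q, K, V, k):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n_q, n_kv)
    # k-th largest score per query row acts as the keep/drop threshold
    # (ties may keep slightly more than k entries).
    thresh = np.partition(scores, -k, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over top-k
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, 32)) for n in (4, 128, 128))
print(topk_sparse_attention(Q, K, V, k=16).shape)     # (4, 32)
```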
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators [10.756046653406296]
We propose CIM-MLC, a universal multi-level compilation framework for general CIM architectures.
CIM-MLC can explore the mapping and scheduling strategies across multiple architectural tiers.
arXiv Detail & Related papers (2024-01-23T01:33:09Z)
- Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z)
- DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory [6.367916611208411]
We propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity.
DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss.
Compared with state-of-the-art macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively.
arXiv Detail & Related papers (2023-10-31T12:49:54Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
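To illustrate the random-feature trick RFAD relies on, the sketch below uses the classic random Fourier feature map for an RBF kernel as a stand-in for the NNGP kernel approximation (an assumption for simplicity):

```python
import numpy as np

# Hedged illustration: RFAD approximates the NNGP kernel with random
# features; this classic random Fourier feature map for an RBF kernel is a
# simpler stand-in. It shows the trick being exploited: replace exact
# kernel evaluations with an explicit randomized feature map.

def rff(X, n_features, gamma, rng):
    """Features z with z(x) @ z(y) ~= exp(-gamma * ||x - y||^2)."""
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
Z = rff(X, n_features=4096, gamma=0.5, rng=rng)
exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(Z @ Z.T - exact).max())  # small approximation error
```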
- An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System [9.429605859159023]
Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound.
Memory-centric computing systems, with processing-in-memory capabilities, can alleviate this data movement bottleneck.
We implement several representative classic ML algorithms on a real-world general-purpose PIM architecture.
arXiv Detail & Related papers (2022-07-16T09:39:53Z)
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
- MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks [0.6817102408452476]
Convolutional neural networks (CNNs) play a key role in deep learning applications.
CIM architectures have demonstrated great potential for efficiently computing large-scale matrix-vector multiplications.
Network pruning and quantization are two widely studied compression methods for shrinking the model size and reducing computation costs.
arXiv Detail & Related papers (2020-10-24T10:31:49Z)
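For reference, hedged textbook versions of the two compression methods the summary names, magnitude pruning and uniform symmetric quantization; these generic sketches are not MARS's co-design procedure:

```python
import numpy as np

# Hedged, textbook sketches of magnitude pruning and uniform symmetric
# quantization. Generic illustrations, not MARS's co-designed compression.

def prune(W, sparsity):
    """Zero out (approximately) the smallest-magnitude fraction of weights."""
    k = int(W.size * sparsity)
    thresh = np.partition(np.abs(W).ravel(), k)[k]
    return np.where(np.abs(W) < thresh, 0.0, W)

def quantize(W, bits=8):
    """Uniform symmetric quantization to signed `bits`-bit integers."""
    scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_sparse = prune(W, sparsity=0.8)
W_q, scale = quantize(W_sparse)
print(f"sparsity {(W_sparse == 0).mean():.2f}, "
      f"max quant error {np.abs(W_q * scale - W_sparse).max():.4f}")
```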
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated through simulations of Boston house-price prediction and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
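What "one computational step" means here can be emulated digitally: the crosspoint array physically produces the closed-form least-squares solution, which the hedged sketch below computes via the normal equations:

```python
import numpy as np

# Hedged digital emulation of "one-step" learning: a crosspoint resistive
# array can produce the least-squares solution of a linear regression in a
# single analog operation. Here the same closed-form answer is computed
# numerically to show what the array delivers in one step.

rng = np.random.default_rng(0)
X = np.hstack([rng.standard_normal((100, 3)), np.ones((100, 1))])  # bias col
w_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ w_true + 0.01 * rng.standard_normal(100)

# One "step": solve the normal equations  (X^T X) w = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # close to w_true
```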
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.