Related papers: Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

URL: http://arxiv.org/abs/2507.01429v1
Date: Wed, 02 Jul 2025 07:29:53 GMT
Title: Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems
Authors: Benjamin Chen Ming Choong, Tao Luo, Cheng Liu, Bingsheng He, Wei Zhang, Joey Tianyi Zhou,
Abstract summary: racetrack memory is a non-volatile technology that allows high data density fabrication.<n>In-memory arithmetic circuits with memory cells affects both the memory density and power efficiency.<n>We present an efficient in-memory convolutional neural network (CNN) accelerator optimized for use with racetrack memory.
Score: 54.045712360156024
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI applications. Among newly-researched memory technologies, racetrack memory is a non-volatile technology that allows high data density fabrication, making it a good fit for in-memory computing. However, integrating in-memory arithmetic circuits with memory cells affects both the memory density and power efficiency. It remains challenging to build efficient in-memory arithmetic circuits on racetrack memory within area and energy constraints. To this end, we present an efficient in-memory convolutional neural network (CNN) accelerator optimized for use with racetrack memory. We design a series of fundamental arithmetic circuits as in-memory computing cells suited for multiply-and-accumulate operations. Moreover, we explore the design space of racetrack memory based systems and CNN model architectures, employing co-design to improve the efficiency and performance of performing CNN inference in racetrack memory while maintaining model accuracy. Our designed circuits and model-system co-optimization strategies achieve a small memory bank area with significant improvements in energy and performance for racetrack memory based embedded systems.

Related papers

CHIME: Energy-Efficient STT-RAM-based Concurrent Hierarchical In-Memory Processing [1.5566524830295307]
This paper introduces a novel PiC/PiM architecture, Concurrent Hierarchical In-Memory Processing (CHIME) CHIME strategically incorporates heterogeneous compute units across multiple levels of the memory hierarchy. Experiments reveal that, compared to the state-of-the-art bit-line computing approaches, CHIME achieves significant speedup and energy savings of 57.95% and 78.23%.
arXiv Detail & Related papers (2024-07-29T01:17:54Z)
Dynamic neural network with memristive CIM and CAM for 2D and 3D vision [57.6208980140268]
We propose a semantic memory-based dynamic neural network (DNN) using memristor. The network associates incoming data with the past experience stored as semantic vectors. We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets.
arXiv Detail & Related papers (2024-07-12T04:55:57Z)
Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques. PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models present unprecedented challenges to energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing. Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z)
Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System [3.202860612193139]
We propose an artificially intelligent memory mapping scheme, AIMM, that optimize data placement and resource utilization through page and computation remapping. AIMM uses a neural network to achieve a near-optimal mapping during execution, trained using a reinforcement learning algorithm. Our experimental evaluation shows that AIMM improves the baseline NMP performance in single and multiple program scenario by up to 70% and 50%, respectively.
arXiv Detail & Related papers (2021-04-28T09:50:35Z)
Pinpointing the Memory Behaviors of DNN Training [37.78973307051419]
Training of deep neural networks (DNNs) is usually memory-hungry due to the limited device memory capacity of accelerators. In this work, we pinpoint the memory behaviors of each device memory block of GPU during training by instrumenting the memory allocators of the runtime system.
arXiv Detail & Related papers (2021-04-01T05:30:03Z)
Robust High-dimensional Memory-augmented Neural Networks [13.82206983716435]
Memory-augmented neural networks enhance neural networks with an explicit memory to overcome these issues. Access to this explicit memory occurs via soft read and write operations involving every individual memory entry. We propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors.
arXiv Detail & Related papers (2020-10-05T12:01:56Z)
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces. We train and validate our approach directly on the Intel NNP-I chip for inference. We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications [0.0]
This paper presents an in-memory computing architecture for ANN enabling artificial intelligence (AI) and machine learning (ML) applications. Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing the multiple rows of array per precharge cycle. The proposed architecture was trained and tested on the IRIS dataset which exhibits $46times$ more energy efficient per MAC (multiply and accumulate) operation compared to earlier classifiers.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.