Edge AI without Compromise: Efficient, Versatile and Accurate
Neurocomputing in Resistive Random-Access Memory
- URL: http://arxiv.org/abs/2108.07879v1
- Date: Tue, 17 Aug 2021 21:08:51 GMT
- Title: Edge AI without Compromise: Efficient, Versatile and Accurate
Neurocomputing in Resistive Random-Access Memory
- Authors: Weier Wan (1), Rajkumar Kubendran (2 and 5), Clemens Schaefer (4), S.
Burc Eryilmaz (1), Wenqiang Zhang (3), Dabin Wu (3), Stephen Deiss (2),
Priyanka Raina (1), He Qian (3), Bin Gao (3), Siddharth Joshi (4 and 2),
Huaqiang Wu (3), H.-S. Philip Wong (1), Gert Cauwenberghs (2) ((1) Stanford
University, (2) University of California San Diego, (3) Tsinghua University,
(4) University of Notre Dame, (5) University of Pittsburgh)
- Abstract summary: We present NeuRRAM - the first multimodal edge AI chip using RRAM CIM.
We show record energy-efficiency $5\times$ - $8\times$ better than prior art across various computational bit-precisions.
This work paves a way towards building highly efficient and reconfigurable edge AI hardware platforms.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Realizing today's cloud-level artificial intelligence functionalities
directly on devices distributed at the edge of the internet calls for edge
hardware capable of processing multiple modalities of sensory data (e.g. video,
audio) at unprecedented energy-efficiency. AI hardware architectures today
cannot meet the demand due to a fundamental "memory wall": data movement
between separate compute and memory units consumes large energy and incurs long
latency. Resistive random-access memory (RRAM) based compute-in-memory (CIM)
architectures promise to bring orders of magnitude energy-efficiency
improvement by performing computation directly within memory. However,
conventional approaches to CIM hardware design limit its functional flexibility
necessary for processing diverse AI workloads, and must overcome hardware
imperfections that degrade inference accuracy. Such trade-offs between
efficiency, versatility and accuracy cannot be addressed by isolated
improvements on any single level of the design. By co-optimizing across all
hierarchies of the design from algorithms and architecture to circuits and
devices, we present NeuRRAM - the first multimodal edge AI chip using RRAM CIM
to simultaneously deliver a high degree of versatility for diverse model
architectures, record energy-efficiency $5\times$ - $8\times$ better than prior
art across various computational bit-precisions, and inference accuracy
comparable to software models with 4-bit weights on all measured standard AI
benchmarks including accuracy of 99.0% on MNIST and 85.7% on CIFAR-10 image
classification, 84.7% accuracy on Google speech command recognition, and a 70%
reduction in image reconstruction error on a Bayesian image recovery task. This
work paves a way towards building highly efficient and reconfigurable edge AI
hardware platforms for the more demanding and heterogeneous AI applications of
the future.
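The core idea behind RRAM compute-in-memory is that a matrix-vector multiplication happens physically inside the memory array: weights are stored as device conductances, inputs are applied as voltages, and each column's output current is the dot product by Ohm's law and Kirchhoff's current law. The sketch below is a hypothetical illustration of that principle with 4-bit quantized weights and optional device noise; it is not the NeuRRAM implementation, and all function names are invented for this example.

```python
import numpy as np

def quantize_weights(w, bits=4):
    """Symmetric quantization of weights to signed `bits`-bit levels."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 levels per sign for 4-bit
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

def crossbar_mvm(weights, inputs, noise_std=0.0, rng=None):
    """Idealized crossbar MVM: currents sum along each column.

    `noise_std` models conductance variation, one source of the
    hardware imperfections that degrade inference accuracy.
    """
    rng = rng or np.random.default_rng(0)
    g = weights + rng.normal(0.0, noise_std, size=weights.shape)
    return inputs @ g                      # I_j = sum_i G[i, j] * V[i]

rng = np.random.default_rng(42)
w = quantize_weights(rng.standard_normal((8, 4)))   # 4-bit weight matrix
x = rng.standard_normal(8)                          # input "voltages"

ideal = crossbar_mvm(w, x)                 # noise-free analog MVM
noisy = crossbar_mvm(w, x, noise_std=0.05) # same MVM with device noise
print(np.max(np.abs(ideal - noisy)))       # error introduced by imperfections
```

The energy argument follows from the structure: the weights never move, so the "memory wall" cost of shuttling them to a separate compute unit is avoided entirely.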
Related papers
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM)
Our co-design system achieves substantial energy-efficiency improvements and training-cost reductions compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models present unprecedented challenges to energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications [0.0]
This paper presents an in-memory computing architecture for ANN enabling artificial intelligence (AI) and machine learning (ML) applications.
Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing multiple rows of the array per precharge cycle.
The proposed architecture was trained and tested on the IRIS dataset, achieving $46\times$ higher energy efficiency per MAC (multiply-and-accumulate) operation than earlier classifiers.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of Boston housing price prediction and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
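The one-step claim rests on the fact that the crosspoint circuit physically settles to the least-squares solution of the linear system, rather than iterating gradient updates. A minimal digital sketch of the same mathematics, assuming an ordinary regression setup (the names and data here are invented for illustration, not taken from the paper):

```python
import numpy as np

# Least-squares regression: the crosspoint array's feedback circuit
# converges to the solution of X w = y; here we compute the same
# closed-form answer digitally via the pseudoinverse.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))            # 50 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])         # ground-truth coefficients
y = X @ true_w + 0.01 * rng.standard_normal(50)  # noisy observations

w_hat = np.linalg.pinv(X) @ y   # what the analog circuit settles to
print(w_hat)                    # close to true_w
```

The analog array performs this in a single settling transient because the matrix inversion is embedded in the circuit dynamics, not computed step by step.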
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
- Near-Optimal Hardware Design for Convolutional Neural Networks [0.0]
This study proposes a novel, special-purpose, and high-efficiency hardware architecture for convolutional neural networks.
The proposed architecture maximizes the utilization of multipliers by designing the computational circuit with the same structure as that of the computational flow of the model.
An implementation based on the proposed hardware architecture has been applied in commercial AI products.
arXiv Detail & Related papers (2020-02-06T09:15:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.