NAX: Co-Designing Neural Network and Hardware Architecture for
Memristive Xbar based Computing Systems
- URL: http://arxiv.org/abs/2106.12125v1
- Date: Wed, 23 Jun 2021 02:27:00 GMT
- Title: NAX: Co-Designing Neural Network and Hardware Architecture for
Memristive Xbar based Computing Systems
- Authors: Shubham Negi, Indranil Chakraborty, Aayush Ankit, Kaushik Roy
- Abstract summary: In-Memory Computing (IMC) hardware using Memristive Crossbar Arrays (MCAs) is gaining popularity for accelerating Deep Neural Networks (DNNs).
We propose NAX -- an efficient neural architecture search engine that co-designs the neural network and the IMC-based hardware architecture.
- Score: 7.481928921197249
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-Memory Computing (IMC) hardware using Memristive Crossbar Arrays (MCAs)
is gaining popularity for accelerating Deep Neural Networks (DNNs), since it
alleviates the "memory wall" problem associated with the von Neumann architecture.
The hardware efficiency (energy, latency and area) as well as application
accuracy (considering device and circuit non-idealities) of DNNs mapped to such
hardware are co-dependent on network parameters, such as kernel size and depth,
and on hardware architecture parameters, such as crossbar size. However,
co-optimizing network and hardware parameters presents a challenging
search space comprising different kernel sizes mapped to varying crossbar
sizes. To that end, we propose NAX -- an efficient neural architecture
search engine that co-designs the neural network and the IMC-based hardware
architecture. NAX explores the aforementioned search space to determine kernel
and corresponding crossbar sizes for each DNN layer to achieve optimal
tradeoffs between hardware efficiency and application accuracy. Our results
from NAX show that the searched networks have heterogeneous crossbar sizes across
different layers, and achieve optimal hardware efficiency and accuracy
considering the non-idealities in crossbars. On CIFAR-10 and Tiny ImageNet, our
models achieve 0.8% and 0.2% higher accuracy, and 17% and 4% lower EDAP
(energy-delay-area product), compared to baseline ResNet-20 and ResNet-18
models, respectively.
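To make the co-design search space concrete, here is a minimal sketch of the idea: score each per-layer (kernel size, crossbar size) pair by an accuracy proxy minus a weighted EDAP estimate, and keep the best pair per layer. The cost and accuracy models below (`estimate_edap`, `accuracy_proxy`) are hypothetical placeholders, not NAX's actual models.

```python
import itertools

# Hypothetical per-layer search space mirroring the abstract's description:
# each DNN layer independently picks a kernel size and a crossbar size.
KERNEL_SIZES = [1, 3, 5]
XBAR_SIZES = [32, 64, 128]

def estimate_edap(kernel_size, xbar_size, in_ch, out_ch):
    """Toy energy-delay-area-product model (a placeholder, not the paper's)."""
    weights = kernel_size * kernel_size * in_ch * out_ch
    n_xbars = -(-weights // (xbar_size * xbar_size))  # ceil division
    energy = n_xbars * xbar_size ** 2   # more programmed cells -> more energy
    delay = n_xbars                     # more tiles -> more serialization
    area = n_xbars * xbar_size ** 2
    return energy * delay * area

def accuracy_proxy(kernel_size, xbar_size):
    """Toy non-ideality model: larger crossbars accumulate more analog error
    along the bitlines, while larger kernels add capacity."""
    return 0.1 * kernel_size - 0.001 * xbar_size

def search_layer(in_ch, out_ch, alpha=1e-12):
    """Pick the (kernel, crossbar) pair with the best accuracy/EDAP tradeoff."""
    return max(
        itertools.product(KERNEL_SIZES, XBAR_SIZES),
        key=lambda kc: accuracy_proxy(*kc)
        - alpha * estimate_edap(kc[0], kc[1], in_ch, out_ch),
    )

# Heterogeneous per-layer choices can emerge, as NAX reports:
for in_ch, out_ch in [(16, 16), (32, 64), (64, 128)]:
    print((in_ch, out_ch), "->", search_layer(in_ch, out_ch))
```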
Related papers
- RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices [0.30458577208819987]
We aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM).
We propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks meeting specific hardware constraints.
The resulting model from the speed-optimized NAS achieved a 5x-30x speedup.
arXiv Detail & Related papers (2024-09-27T15:35:36Z)
- CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware [6.308771129448823]
We present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for compute-in-memory (CiM) hardware.
The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices.
arXiv Detail & Related papers (2024-02-19T02:12:07Z)
- Flexible Channel Dimensions for Differentiable Architecture Search [50.33956216274694]
We propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm.
We show that the proposed framework can find DNN architectures that match previous methods in task accuracy and inference latency; a rough sketch of the underlying channel-masking idea follows this entry.
arXiv Detail & Related papers (2023-06-13T15:21:38Z)
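As a rough illustration of the dynamic channel allocation idea above, a convolution can mix several candidate widths through learnable logits. This is a generic differentiable channel-masking sketch, not this paper's algorithm; all names and candidate widths are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftChannelConv(nn.Module):
    """Convolution whose effective width is chosen differentiably via soft
    channel masks -- a generic sketch, not this paper's exact algorithm."""
    def __init__(self, in_ch, max_out_ch=32, candidates=(8, 16, 32)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, max_out_ch, 3, padding=1)
        self.candidates = candidates
        # One architecture logit per candidate channel count.
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))

    def forward(self, x):
        y = self.conv(x)
        probs = F.softmax(self.alpha, dim=0)
        out = torch.zeros_like(y)
        for p, c in zip(probs, self.candidates):
            mask = torch.zeros(y.shape[1], device=y.device)
            mask[:c] = 1.0  # keep only the first c channels
            out = out + p * y * mask.view(1, -1, 1, 1)
        return out

layer = SoftChannelConv(in_ch=3)
print(layer(torch.randn(1, 3, 8, 8)).shape)  # torch.Size([1, 32, 8, 8])
```

During search, the logits are trained jointly with the weights; at the end, the candidate width with the highest probability would be kept.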
- XPert: Peripheral Circuit & Neural Architecture Co-search for Area and Energy-efficient Xbar-based Computing [13.499706125321605]
XPert co-searches network architecture and peripheral parameters to achieve optimal performance.
Compared to VGG16 baselines, XPert achieves 10.24x (4.7x) lower EDAP, 1.72x (1.62x) higher TOPS/W, and 1.93x (3x) higher TOPS/mm2 at 92.46% (56.7%) accuracy for the CIFAR10 dataset.
arXiv Detail & Related papers (2023-03-30T18:23:20Z)
- XploreNAS: Explore Adversarially Robust & Hardware-efficient Neural Architectures for Non-ideal Xbars [2.222917681321253]
This work proposes a two-phase algorithm-hardware co-optimization approach called XploreNAS.
It searches for hardware-efficient & adversarially robust neural architectures for non-ideal crossbar platforms.
Experiments on crossbars with benchmark datasets show up to 8-16% improvement in the adversarial robustness of the searched subnets.
arXiv Detail & Related papers (2023-02-15T16:44:18Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups; the splitting step is sketched after this entry.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
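A minimal sketch of the channel-group splitting that the SDTA encoder builds on. The transposed (channel-wise) attention part of SDTA is omitted, and the module below is an assumption about the general pattern, not EdgeNeXt's exact block.

```python
import torch
import torch.nn as nn

class ChannelGroupSplit(nn.Module):
    """Channel-group splitting in the spirit of SDTA: chunk the input along
    channels and give each group its own depth-wise 3x3 conv. The transposed
    attention part of SDTA is omitted in this sketch."""
    def __init__(self, channels, num_groups=4):
        super().__init__()
        assert channels % num_groups == 0
        g = channels // num_groups
        self.num_groups = num_groups
        self.dwconvs = nn.ModuleList(
            [nn.Conv2d(g, g, 3, padding=1, groups=g) for _ in range(num_groups)]
        )

    def forward(self, x):
        groups = torch.chunk(x, self.num_groups, dim=1)
        return torch.cat(
            [conv(g) for conv, g in zip(self.dwconvs, groups)], dim=1
        )

block = ChannelGroupSplit(channels=32, num_groups=4)
print(block(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```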
- Greedy Network Enlarging [53.319011626986004]
We propose a greedy network enlarging method based on the reallocation of computations.
By modifying the computations of the different stages step by step, the enlarged network is equipped with an optimal allocation and utilization of MACs; a toy version of the greedy loop follows this entry.
Applying our method to GhostNet, we achieve state-of-the-art 80.9% and 84.3% ImageNet top-1 accuracies.
arXiv Detail & Related papers (2021-07-31T08:36:30Z)
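The reallocation idea can be sketched as a greedy loop. This is an assumed form under toy cost and accuracy models (`stage_macs`, `estimate_acc` are hypothetical), not the paper's exact procedure.

```python
def greedy_enlarge(stages, mac_budget, step, estimate_acc, stage_macs):
    """Greedy reallocation sketch: repeatedly give the next slice of the MAC
    budget to the stage whose enlargement improves the accuracy estimate
    the most."""
    used = sum(stage_macs(s, m) for s, m in stages.items())
    while True:
        best_gain, best_stage, best_extra = 0.0, None, 0
        for s in stages:
            trial = dict(stages)
            trial[s] = stages[s] + step
            extra = stage_macs(s, trial[s]) - stage_macs(s, stages[s])
            if used + extra > mac_budget:
                continue  # enlarging this stage would blow the budget
            gain = estimate_acc(trial) - estimate_acc(stages)
            if gain > best_gain:
                best_gain, best_stage, best_extra = gain, s, extra
        if best_stage is None:
            return stages
        stages[best_stage] += step
        used += best_extra

# Hypothetical cost and accuracy models for three stages:
macs = lambda s, m: {"s1": 10, "s2": 20, "s3": 40}[s] * m * m
acc = lambda cfg: sum((i + 1) * cfg[s] ** 0.5 for i, s in enumerate(sorted(cfg)))
print(greedy_enlarge({"s1": 1.0, "s2": 1.0, "s3": 1.0}, mac_budget=200,
                     step=0.25, estimate_acc=acc, stage_macs=macs))
```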
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks; a toy decomposition is sketched after this entry.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
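A toy instance of the {-1, +1} encoding idea: an M-bit weight grid can be written as sum_i 2^i * b_i with b_i in {-1, +1}, so a layer becomes a sum of binary branches. The scaling details below are illustrative, not the paper's exact scheme.

```python
import numpy as np

def decompose_pm1(w, bits=3):
    """Map weights onto the grid sum_i 2^i * b_i with b_i in {-1, +1}.
    Since conv(x, w) ~ sum_i 2^i * conv(x, b_i) / scale, each branch b_i
    acts as a binary network. Scaling here is illustrative only."""
    scale = (2 ** bits - 1) / np.max(np.abs(w))
    # Integer code q in [0, 2^bits - 1] such that w*scale ~ 2q - (2^bits - 1).
    q = np.clip(np.round((w * scale + 2 ** bits - 1) / 2), 0, 2 ** bits - 1)
    q = q.astype(int)
    digits = [(q >> i) & 1 for i in range(bits)]      # binary digits of q
    branches = [2 * d - 1 for d in digits]            # {0,1} -> {-1,+1}
    recon = sum((2 ** i) * b for i, b in enumerate(branches)) / scale
    return branches, recon

w = np.random.uniform(-1, 1, size=(4, 4))
branches, recon = decompose_pm1(w, bits=3)
print(np.max(np.abs(w - recon)))  # bounded by max|w| / (2**bits - 1)
```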
- Hardware-Centric AutoML for Mixed-Precision Quantization [34.39845532939529]
Conventional quantization algorithms ignore differences among hardware architectures and quantize all layers in a uniform way.
In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy (a simplified illustration follows this entry).
Our framework reduces latency by 1.4-1.95x and energy consumption by 1.9x with negligible accuracy loss compared with fixed-bitwidth (8-bit) quantization.
arXiv Detail & Related papers (2020-08-11T17:30:22Z)
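HAQ's contribution is the RL search over per-layer bitwidths; the sketch below skips the RL agent entirely and only illustrates evaluating one hand-picked, hypothetical mixed-precision policy with uniform symmetric quantization.

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization to the given bitwidth."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

# A hand-picked, hypothetical per-layer bitwidth policy; HAQ would search
# this assignment with an RL agent under hardware feedback instead.
layers = {name: np.random.randn(64, 64) for name in ["conv1", "conv2", "fc"]}
policy = {"conv1": 8, "conv2": 4, "fc": 6}
for name, w in layers.items():
    err = np.mean((w - quantize(w, policy[name])) ** 2)
    print(f"{name}: {policy[name]}-bit, quantization MSE = {err:.2e}")
```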
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the key insight is to use the compiler to regain and guarantee high hardware efficiency (a toy pattern-pruning sketch follows this entry).
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
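A small sketch of pattern-based pruning: each 3x3 kernel keeps only the entries of its best-matching pattern from a fixed library. The four patterns below are hypothetical examples, not PatDNN's actual library.

```python
import numpy as np

# Hypothetical 4-entry patterns inside a 3x3 kernel; PatDNN selects its
# pattern library differently.
PATTERNS = [
    np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], dtype=float),
    np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]], dtype=float),
    np.array([[0, 0, 0], [1, 1, 0], [1, 1, 0]], dtype=float),
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]], dtype=float),
]

def prune_kernel(k):
    """Keep the pattern that preserves the most weight magnitude."""
    scores = [np.sum(np.abs(k) * p) for p in PATTERNS]
    return k * PATTERNS[int(np.argmax(scores))]

kernels = np.random.randn(8, 3, 3)        # 8 kernels of one conv filter
pruned = np.stack([prune_kernel(k) for k in kernels])
print((pruned != 0).sum(axis=(1, 2)))     # each kernel keeps exactly 4 weights
```

Because every kernel follows one of a few fixed patterns, a compiler can generate regular, efficient code for the sparse computation, which is the entry's stated insight.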
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain optimal neural architectures under the given constraints; a miniature sample-update-prune loop is sketched below.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
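The sample-update-prune loop in miniature, with a toy reward standing in for a real training signal; the update rule here is illustrative, not the paper's.

```python
import numpy as np

# DDPNAS-style loop sketch: sample architectures from per-layer categorical
# distributions, nudge the distributions toward higher-reward choices, and
# periodically prune the lowest-probability operation per layer.
rng = np.random.default_rng(0)
ops = ["conv3x3", "conv5x5", "skip", "maxpool"]
probs = [np.full(len(ops), 1 / len(ops)) for _ in range(3)]   # 3 layers
alive = [list(range(len(ops))) for _ in range(3)]
reward = lambda arch: -sum(arch)   # toy reward: prefers lower-index ops

for epoch in range(30):
    arch = [rng.choice(alive[l], p=probs[l][alive[l]] / probs[l][alive[l]].sum())
            for l in range(3)]
    r = reward(arch)
    for l, op in enumerate(arch):          # nudge sampled ops by reward
        probs[l][op] *= np.exp(0.1 * r)
        probs[l] /= probs[l].sum()
    if epoch % 10 == 9:                    # prune the worst op per layer
        for l in range(3):
            if len(alive[l]) > 1:
                alive[l].remove(min(alive[l], key=lambda i: probs[l][i]))

print([ops[max(alive[l], key=lambda i: probs[l][i])] for l in range(3)])
```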
This list is automatically generated from the titles and abstracts of the papers on this site.