XPert: Peripheral Circuit & Neural Architecture Co-search for Area and
Energy-efficient Xbar-based Computing
- URL: http://arxiv.org/abs/2303.17646v2
- Date: Tue, 21 Nov 2023 17:07:46 GMT
- Title: XPert: Peripheral Circuit & Neural Architecture Co-search for Area and
Energy-efficient Xbar-based Computing
- Authors: Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim and
Priyadarshini Panda
- Abstract summary: XPert co-searches network architecture and peripheral parameters to achieve optimal performance.
Compared to VGG16 baselines, XPert achieves 10.24x (4.7x) lower EDAP, 1.72x (1.62x) higher TOPS/W, and 1.93x (3x) higher TOPS/mm2 at 92.46% (56.7%) accuracy for the CIFAR10 (TinyImagenet) datasets.
- Score: 13.499706125321605
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The hardware-efficiency and accuracy of Deep Neural Networks (DNNs)
implemented on In-memory Computing (IMC) architectures primarily depend on the
DNN architecture and the peripheral circuit parameters. It is therefore
essential to holistically co-search the network and peripheral parameters to
achieve optimal performance. To this end, we propose XPert, which co-searches
network architecture in tandem with peripheral parameters such as the type and
precision of analog-to-digital converters, crossbar column sharing and the
layer-specific input precision using an optimization-based design space
exploration. Compared to VGG16 baselines, XPert achieves 10.24x (4.7x) lower
EDAP, 1.72x (1.62x) higher TOPS/W, 1.93x (3x) higher TOPS/mm2 at 92.46% (56.7%)
accuracy for CIFAR10 (TinyImagenet) datasets. The code for this paper is
available at https://github.com/Intelligent-Computing-Lab-Yale/XPert.
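To make the idea of an optimization-based design space exploration concrete, the sketch below scores combinations of layer width, ADC precision, crossbar column sharing, and input precision against a toy EDAP-style cost under an accuracy constraint. All candidate values, the cost model, and the accuracy proxy are illustrative assumptions; this is not XPert's actual search algorithm.

```python
# Hypothetical illustration of a network/peripheral co-search loop.
# Candidate values, cost model, and accuracy proxy are assumptions
# for illustration only; they are NOT taken from the XPert paper.
from itertools import product

# Per-layer choices: output channels, ADC precision (bits),
# crossbar columns shared per ADC, and input activation precision (bits).
CHANNELS       = [32, 64, 128]
ADC_BITS       = [4, 6, 8]
COLUMN_SHARING = [4, 8, 16]
INPUT_BITS     = [2, 4, 8]

def edap_proxy(ch, adc, share, in_bits):
    """Toy energy-delay-area product: ADCs dominate area/energy, and
    fewer ADCs (more column sharing) trade throughput for efficiency."""
    adc_count = ch / share                     # ADCs needed per crossbar
    energy = adc_count * (2 ** adc) * in_bits  # ADC energy grows with precision
    delay  = share * in_bits                   # shared ADCs are time-multiplexed
    area   = adc_count * (2 ** adc)
    return energy * delay * area

def accuracy_proxy(ch, adc, share, in_bits):
    """Toy accuracy estimate: wider layers and higher precision help."""
    return 80 + 3 * CHANNELS.index(ch) + 1.5 * ADC_BITS.index(adc) \
              + 1.5 * INPUT_BITS.index(in_bits) - 0.5 * COLUMN_SHARING.index(share)

best = None
for cfg in product(CHANNELS, ADC_BITS, COLUMN_SHARING, INPUT_BITS):
    acc, cost = accuracy_proxy(*cfg), edap_proxy(*cfg)
    if acc >= 88 and (best is None or cost < best[1]):
        best = (cfg, cost, acc)

print("best (channels, adc_bits, col_share, in_bits):", best)
```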
Related papers
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order
Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
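For reference, an nth-order gradient is itself a computation graph obtained by differentiating the graph n times; the minimal sketch below builds such a graph with standard PyTorch autograd (the example function is arbitrary, and INR-Arch's dataflow mapping and hardware compilation are not modeled).

```python
# Minimal sketch: building an nth-order gradient as a nested autodiff graph.
# Uses standard PyTorch autograd; INR-Arch's dataflow architecture is not shown.
import torch

def nth_grad(f, x, n):
    """Return d^n f / dx^n at x by repeatedly differentiating the graph."""
    x = x.clone().requires_grad_(True)
    y = f(x)
    for _ in range(n):
        # create_graph=True keeps the graph so it can be differentiated again
        (y,) = torch.autograd.grad(y, x, create_graph=True)
    return y

def f(x):
    return torch.sin(x) * x**2      # arbitrary example scalar function

x = torch.tensor(0.7)
print(nth_grad(f, x, 3))            # third-order derivative at x = 0.7
```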
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- Flexible Channel Dimensions for Differentiable Architecture Search [50.33956216274694]
We propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm.
We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency.
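A common way to make channel dimensions differentiable is to relax the width choice into a softmax-weighted channel mask; the sketch below illustrates that generic mechanism (the masking scheme and candidate widths are assumptions for illustration, not necessarily this paper's exact formulation).

```python
# Generic sketch of differentiable channel-width selection via soft masks.
# The specific masking scheme and candidate widths are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftChannelConv(nn.Module):
    def __init__(self, in_ch, max_out_ch, candidate_widths):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, max_out_ch, 3, padding=1)
        self.alpha = nn.Parameter(torch.zeros(len(candidate_widths)))
        # Binary masks that keep only the first `w` output channels.
        masks = torch.stack([
            (torch.arange(max_out_ch) < w).float() for w in candidate_widths
        ])
        self.register_buffer("masks", masks)          # (num_widths, max_out_ch)

    def forward(self, x):
        y = self.conv(x)                              # (N, max_out_ch, H, W)
        weights = F.softmax(self.alpha, dim=0)        # relax the width choice
        mask = (weights[:, None] * self.masks).sum(0) # expected channel mask
        return y * mask[None, :, None, None]

layer = SoftChannelConv(3, 64, [16, 32, 64])
out = layer(torch.randn(2, 3, 8, 8))
print(out.shape)   # torch.Size([2, 64, 8, 8]); gradients flow into layer.alpha
```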
arXiv Detail & Related papers (2023-06-13T15:21:38Z)
- Efficient CNN Architecture Design Guided by Visualization [13.074652653088584]
VGNetG-1.0MP achieves 67.7% top-1 accuracy with 0.99M parameters and 69.2% top-1 accuracy with 1.14M parameters on ImageNet classification dataset.
Our VGNetF-1.5MP achieves 64.4% (-3.2%) top-1 accuracy and 66.2% (-1.4%) top-1 accuracy with additional Gaussian kernels.
arXiv Detail & Related papers (2022-07-21T06:22:15Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
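The channel-group splitting in the SDTA encoder can be pictured as chunking the feature map along channels and processing the groups in a cascade; the sketch below shows that splitting with plain depth-wise convolutions standing in for the full SDTA block (the per-group operation and the cascade are simplifying assumptions).

```python
# Illustrative sketch of splitting a tensor into channel groups and
# processing each group, in the spirit of EdgeNeXt's SDTA encoder.
# The per-group depth-wise conv is a simplification, not the full block.
import torch
import torch.nn as nn

class ChannelGroupSplit(nn.Module):
    def __init__(self, channels, num_groups=4):
        super().__init__()
        assert channels % num_groups == 0
        g = channels // num_groups
        self.num_groups = num_groups
        self.dwconvs = nn.ModuleList(
            [nn.Conv2d(g, g, 3, padding=1, groups=g) for _ in range(num_groups)]
        )

    def forward(self, x):
        groups = torch.chunk(x, self.num_groups, dim=1)  # split along channels
        outs, prev = [], 0
        for conv, g in zip(self.dwconvs, groups):
            prev = conv(g + prev)        # cascade groups for a larger receptive field
            outs.append(prev)
        return torch.cat(outs, dim=1)    # re-assemble channels

block = ChannelGroupSplit(64, num_groups=4)
print(block(torch.randn(1, 64, 16, 16)).shape)   # torch.Size([1, 64, 16, 16])
```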
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- EAutoDet: Efficient Architecture Search for Object Detection [110.99532343155073]
EAutoDet framework can discover practical backbone and FPN architectures for object detection in 1.4 GPU-days.
We propose a kernel reusing technique by sharing the weights of candidate operations on one edge and consolidating them into one convolution.
In particular, the discovered architectures surpass state-of-the-art object detection NAS methods and achieve 40.1 mAP with 120 FPS and 49.2 mAP with 41.3 FPS on COCO test-dev set.
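The kernel-reusing idea can be sketched as letting a smaller candidate kernel reuse the center of a larger shared kernel, then consolidating the candidates into a single convolution weighted by architecture parameters; the weighting and kernel sizes below are assumptions for illustration, not EAutoDet's exact formulation.

```python
# Sketch: consolidate candidate kernel sizes (3x3 and 5x5) that share weights
# into one 5x5 convolution, instead of running each candidate separately.
# The weighting scheme and sizes are illustrative assumptions.
import torch
import torch.nn.functional as F

in_ch, out_ch = 8, 16
w5 = torch.randn(out_ch, in_ch, 5, 5)          # the 3x3 candidate reuses the
w3 = w5[:, :, 1:4, 1:4]                        # center of the shared 5x5 kernel
alpha = torch.softmax(torch.randn(2), dim=0)   # architecture weights for {3x3, 5x5}

# Pad the 3x3 slice back to 5x5 and merge both candidates into a single kernel.
w3_padded = F.pad(w3, (1, 1, 1, 1))
merged = alpha[0] * w3_padded + alpha[1] * w5

x = torch.randn(1, in_ch, 32, 32)
y = F.conv2d(x, merged, padding=2)             # one conv evaluates both candidates
print(y.shape)                                 # torch.Size([1, 16, 32, 32])
```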
arXiv Detail & Related papers (2022-03-21T05:56:12Z)
- NAX: Co-Designing Neural Network and Hardware Architecture for Memristive Xbar based Computing Systems [7.481928921197249]
In-Memory Computing (IMC) hardware using Memristive Crossbar Arrays (MCAs) is gaining popularity for accelerating Deep Neural Networks (DNNs).
We propose NAX -- an efficient neural architecture search engine that co-designs neural network and IMC based hardware architecture.
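For intuition about why the network and crossbar hardware need to be co-designed, the back-of-the-envelope calculation below estimates how many fixed-size crossbars one convolutional layer occupies; the array size, weight precision, and bits per cell are illustrative assumptions, not values from the NAX paper.

```python
# Back-of-the-envelope crossbar count for mapping a conv layer onto
# memristive crossbar arrays. Array size, weight precision, and bits per
# cell are illustrative assumptions, not values from the NAX paper.
import math

def crossbars_needed(in_ch, out_ch, k, xbar_rows=128, xbar_cols=128,
                     weight_bits=8, bits_per_cell=2):
    rows_needed = in_ch * k * k                             # unrolled input dimension
    cols_needed = out_ch * (weight_bits // bits_per_cell)   # bit-sliced weight columns
    return math.ceil(rows_needed / xbar_rows) * math.ceil(cols_needed / xbar_cols)

# Example: a 3x3 conv with 128 input and 256 output channels.
print(crossbars_needed(128, 256, 3))   # -> 72 crossbars of size 128x128
```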
arXiv Detail & Related papers (2021-06-23T02:27:00Z)
- Efficient and Generic 1D Dilated Convolution Layer for Deep Learning [52.899995651639436]
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters.
It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions.
We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
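As a reference for what the optimized kernel computes, the plain NumPy sketch below implements a generic 1D dilated convolution with valid padding; the data layout and parameter names are generic assumptions, and none of the paper's AVX-512 optimizations are reflected.

```python
# Reference (unoptimized) 1D dilated convolution, to show the computation
# that an optimized AVX-512 kernel would implement. Layout and parameter
# names are generic assumptions, not the paper's API.
import numpy as np

def dilated_conv1d(x, w, dilation=1, stride=1):
    """x: (in_ch, length), w: (out_ch, in_ch, kernel), valid padding."""
    in_ch, length = x.shape
    out_ch, _, k = w.shape
    span = (k - 1) * dilation + 1                  # receptive field of one output
    out_len = (length - span) // stride + 1
    y = np.zeros((out_ch, out_len))
    for t in range(out_len):
        # gather the dilated taps for this output position
        taps = x[:, t * stride : t * stride + span : dilation]   # (in_ch, k)
        y[:, t] = np.tensordot(w, taps, axes=([1, 2], [0, 1]))
    return y

x = np.random.randn(4, 64)
w = np.random.randn(8, 4, 3)
print(dilated_conv1d(x, w, dilation=2).shape)      # (8, 60)
```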
arXiv Detail & Related papers (2021-04-16T09:54:30Z)
- Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces [16.920328058816338]
DONNA (Distilling Optimal Neural Network Architectures) is a novel pipeline for rapid neural architecture search and search space exploration.
In ImageNet classification, architectures found by DONNA are 20% faster than EfficientNet-B0 and MobileNetV2 on an Nvidia V100 GPU at similar accuracy, and 10% faster with 0.5% higher accuracy than MobileNetV2-1.4x on a Samsung S20 smartphone.
arXiv Detail & Related papers (2020-12-16T11:00:19Z)
- SIMDive: Approximate SIMD Soft Multiplier-Divider for FPGAs with Tunable Accuracy [3.4154033825543055]
This paper presents, for the first time, a SIMD architecture based on a novel multiplier and divider with tunable accuracy.
The proposed hybrid architecture implements Mitchell's algorithms and supports precision variability from 8 to 32 bits.
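Mitchell's algorithm approximates each operand's base-2 logarithm by its leading-one position plus the remaining bits read as a fraction, adds the two logarithms, and applies a piecewise antilog; the small software model below sketches the classic multiplication variant only (SIMDive's SIMD packing, divider, and tunable-accuracy correction are not modeled).

```python
# Software model of Mitchell's approximate logarithmic multiplication.
# This sketches the classic algorithm only; SIMDive's SIMD/FPGA mapping
# and tunable-accuracy correction terms are not modeled here.
def mitchell_multiply(a: int, b: int) -> int:
    assert a > 0 and b > 0
    ka, kb = a.bit_length() - 1, b.bit_length() - 1    # leading-one positions
    xa = (a - (1 << ka)) / (1 << ka)                   # fractional parts in [0, 1)
    xb = (b - (1 << kb)) / (1 << kb)
    s = xa + xb
    if s < 1:                                          # piecewise antilog approximation
        approx = (1 << (ka + kb)) * (1 + s)
    else:
        approx = (1 << (ka + kb + 1)) * s
    return int(approx)

for a, b in [(13, 9), (100, 200), (255, 255)]:
    exact, approx = a * b, mitchell_multiply(a, b)
    print(f"{a}*{b}: exact={exact}, mitchell={approx}, "
          f"error={100 * (exact - approx) / exact:.1f}%")
```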
arXiv Detail & Related papers (2020-11-02T17:40:44Z)
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints.
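That summary maps onto a simple loop: sample operations per edge from categorical distributions, reinforce the distributions using the sampled architectures' rewards, and periodically prune low-probability operations; the toy sketch below illustrates the loop with a placeholder reward and update rule rather than DDPNAS's actual estimator.

```python
# Toy sketch of NAS via dynamic distribution pruning: sample from a joint
# categorical distribution over operations, update it from rewards, and
# prune unlikely candidates every few epochs. The reward function and
# update rule are placeholder assumptions, not the DDPNAS estimator.
import random

OPS = ["skip", "conv3x3", "conv5x5", "maxpool", "sepconv3x3"]
NUM_EDGES = 4
dist = [{op: 1.0 / len(OPS) for op in OPS} for _ in range(NUM_EDGES)]

def reward(arch):
    # Placeholder "accuracy": pretend separable convs are best on this task.
    return sum(1.0 if op == "sepconv3x3" else random.random() * 0.5 for op in arch)

for epoch in range(30):
    # Sample an architecture from the current joint categorical distribution.
    arch = [random.choices(list(d), weights=d.values())[0] for d in dist]
    r = reward(arch)
    for d, op in zip(dist, arch):           # reinforce the sampled choices
        d[op] += 0.05 * r
        total = sum(d.values())
        for k in d:                         # renormalize the edge distribution
            d[k] /= total
    if epoch % 10 == 9:                     # dynamically prune the search space
        for d in dist:
            if len(d) > 1:
                d.pop(min(d, key=d.get))
                total = sum(d.values())
                for k in d:
                    d[k] /= total

print([max(d, key=d.get) for d in dist])    # most likely operation per edge
```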
arXiv Detail & Related papers (2019-05-28T06:35:52Z)