Heterogeneous Integration of In-Memory Analog Computing Architectures
with Tensor Processing Units
- URL: http://arxiv.org/abs/2304.09258v1
- Date: Tue, 18 Apr 2023 19:44:56 GMT
- Title: Heterogeneous Integration of In-Memory Analog Computing Architectures
with Tensor Processing Units
- Authors: Mohammed E. Elbtity, Brendan Reidy, Md Hasibul Amin, and Ramtin Zand
- Abstract summary: This paper introduces a novel, heterogeneous, mixed-signal, and mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance.
We propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when deploying models on the TPU-IMAC architecture.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tensor processing units (TPUs), specialized hardware accelerators for machine
learning tasks, have shown significant performance improvements when executing
convolutional layers in convolutional neural networks (CNNs). However, they
struggle to maintain the same efficiency in fully connected (FC) layers,
leading to suboptimal hardware utilization. In-memory analog computing (IMAC)
architectures, on the other hand, have demonstrated notable speedup in
executing FC layers. This paper introduces a novel, heterogeneous,
mixed-signal, and mixed-precision architecture that integrates an IMAC unit
with an edge TPU to enhance mobile CNN performance. To leverage the strengths
of TPUs for convolutional layers and IMAC circuits for dense layers, we propose
a unified learning algorithm that incorporates mixed-precision training
techniques to mitigate potential accuracy drops when deploying models on the
TPU-IMAC architecture. The simulations demonstrate that the TPU-IMAC
configuration achieves up to $2.59\times$ performance improvements, and $88\%$
memory reductions compared to conventional TPU architectures for various CNN
models while maintaining comparable accuracy. The TPU-IMAC architecture shows
potential for various applications where energy efficiency and high performance
are essential, such as edge computing and real-time processing in mobile
devices. The unified training algorithm and the integration of IMAC and TPU
architectures contribute to the potential impact of this research on the
broader machine learning landscape.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z) - A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures [73.65190161312555]
ARCANA is a spiking neural network simulator designed to account for the properties of mixed-signal neuromorphic circuits.
We show how the results obtained provide a reliable estimate of the behavior of the spiking neural network trained in software.
arXiv Detail & Related papers (2024-09-23T11:16:46Z) - Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture [0.0]
The work herein consists of developing a reconfigurable dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow per layer during run-time.
The results show that our Flex-TPU design achieves a significant performance increase of up to 2.75x compared to conventional TPU, with only minor area and power overheads.
arXiv Detail & Related papers (2024-07-11T17:33:38Z) - Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z) - Harnessing Manycore Processors with Distributed Memory for Accelerated
Training of Sparse and Recurrent Models [43.1773057439246]
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z) - Exploration of TPUs for AI Applications [0.0]
Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google.
This paper aims to explore TPUs in cloud and edge computing focusing on its applications in AI.
arXiv Detail & Related papers (2023-09-16T07:58:05Z) - ConvBLS: An Effective and Efficient Incremental Convolutional Broad
Learning System for Image Classification [63.49762079000726]
We propose a convolutional broad learning system (ConvBLS) based on the spherical K-means (SKM) algorithm and two-stage multi-scale (TSMS) feature fusion.
Our proposed ConvBLS method is unprecedentedly efficient and effective.
arXiv Detail & Related papers (2023-04-01T04:16:12Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - An In-Memory Analog Computing Co-Processor for Energy-Efficient CNN
Inference on Mobile Devices [4.117012092777604]
We develop an in-memory analog computing (IMAC) architecture realizing both synaptic behavior and activation functions within non-volatile memory arrays.
Spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices are leveraged to realize sigmoidal neurons as well as binarized synapses.
A heterogeneous mixed-signal and mixed-precision CPU-IMAC architecture is proposed for convolutional neural networks (CNNs) inference on mobile processors.
arXiv Detail & Related papers (2021-05-24T23:01:36Z) - Hybrid In-memory Computing Architecture for the Training of Deep Neural
Networks [5.050213408539571]
We propose a hybrid in-memory computing architecture for the training of deep neural networks (DNNs) on hardware accelerators.
We show that HIC-based training results in about 50% less inference model size to achieve baseline comparable accuracy.
Our simulations indicate HIC-based training naturally ensures that the number of write-erase cycles seen by the devices is a small fraction of the endurance limit of PCM.
arXiv Detail & Related papers (2021-02-10T05:26:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.