Related papers: Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units

URL: http://arxiv.org/abs/2304.09258v1
Date: Tue, 18 Apr 2023 19:44:56 GMT
Title: Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units
Authors: Mohammed E. Elbtity, Brendan Reidy, Md Hasibul Amin, and Ramtin Zand
Abstract summary: This paper introduces a novel, heterogeneous, mixed-signal, and mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance. We propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when deploying models on the TPU-IMAC architecture.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedup in executing FC layers. This paper introduces a novel, heterogeneous, mixed-signal, and mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance. To leverage the strengths of TPUs for convolutional layers and IMAC circuits for dense layers, we propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when deploying models on the TPU-IMAC architecture. The simulations demonstrate that the TPU-IMAC configuration achieves up to $2.59\times$ performance improvements, and $88\%$ memory reductions compared to conventional TPU architectures for various CNN models while maintaining comparable accuracy. The TPU-IMAC architecture shows potential for various applications where energy efficiency and high performance are essential, such as edge computing and real-time processing in mobile devices. The unified training algorithm and the integration of IMAC and TPU architectures contribute to the potential impact of this research on the broader machine learning landscape.
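The layer split described in the abstract can be illustrated with a small sketch: convolutional layers are routed to the TPU and dense (FC) layers to the IMAC unit, and weight storage is compared against an all-TPU baseline. This is only an illustration under stated assumptions, not the authors' implementation. The `Layer` class, the function names, the toy layer sizes, and the bit widths (8-bit weights on the TPU, 1-bit weights on the IMAC unit) are all hypothetical, so the resulting figure is not expected to match the numbers reported in the paper.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    kind: str    # "conv" or "dense"
    params: int  # number of weights in the layer


# Assumed weight precisions, chosen only for illustration.
TPU_BITS = 8   # assumption: 8-bit weights on the TPU
IMAC_BITS = 1  # assumption: binarized weights on the IMAC unit


def assign_unit(layer: Layer) -> str:
    """Route dense (FC) layers to the IMAC unit and everything else to the TPU."""
    return "IMAC" if layer.kind == "dense" else "TPU"


def weight_bits(layers: list[Layer], heterogeneous: bool) -> int:
    """Total weight storage in bits for an all-TPU or a TPU-IMAC mapping."""
    total = 0
    for layer in layers:
        if heterogeneous and assign_unit(layer) == "IMAC":
            total += layer.params * IMAC_BITS
        else:
            total += layer.params * TPU_BITS
    return total


def memory_reduction(layers: list[Layer]) -> float:
    """Fraction of weight memory saved relative to the all-TPU baseline."""
    baseline = weight_bits(layers, heterogeneous=False)
    mixed = weight_bits(layers, heterogeneous=True)
    return 1.0 - mixed / baseline


# Hypothetical toy CNN: most weights sit in the dense layers.
toy_model = [
    Layer("conv1", "conv", 1_000),
    Layer("conv2", "conv", 9_000),
    Layer("fc1", "dense", 80_000),
    Layer("fc2", "dense", 10_000),
]

if __name__ == "__main__":
    for layer in toy_model:
        print(f"{layer.name}: {assign_unit(layer)}")
    print(f"weight memory reduction: {memory_reduction(toy_model):.2%}")
```

In this toy setup the saving grows with the share of weights held in dense layers, which is why models with large FC layers would benefit most from such a mapping.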

Related papers

DOMAC: Differentiable Optimization for High-Speed Multipliers and Multiply-Accumulators [25.876084896293058]
DOMAC is a novel approach that employs differentiable optimization for designing multipliers and MACs at specific technology nodes. Building on this insight, DOMAC reformulates the discrete optimization challenge into a continuous problem by incorporating differentiable timing and area objectives.
arXiv Detail & Related papers (2025-03-31T10:49:05Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE. Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures [73.65190161312555]
ARCANA is a spiking neural network simulator designed to account for the properties of mixed-signal neuromorphic circuits. We show how the results obtained provide a reliable estimate of the behavior of the spiking neural network trained in software.
arXiv Detail & Related papers (2024-09-23T11:16:46Z)
Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture [0.0]
This work develops a reconfigurable dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow per layer at run-time. The results show that our Flex-TPU design achieves a significant performance increase of up to 2.75x compared to a conventional TPU, with only minor area and power overheads.
arXiv Detail & Related papers (2024-07-11T17:33:38Z)
Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis. We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models [43.1773057439246]
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures. We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z)
Exploration of TPUs for AI Applications [0.0]
Tensor Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google. This paper aims to explore TPUs in cloud and edge computing, focusing on their applications in AI.
arXiv Detail & Related papers (2023-09-16T07:58:05Z)
Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback Propagation (LFP) is a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions. Our method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
ConvBLS: An Effective and Efficient Incremental Convolutional Broad Learning System for Image Classification [63.49762079000726]
We propose a convolutional broad learning system (ConvBLS) based on the spherical K-means (SKM) algorithm and two-stage multi-scale (TSMS) feature fusion. Our proposed ConvBLS method is unprecedentedly efficient and effective.
arXiv Detail & Related papers (2023-04-01T04:16:12Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical computation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations. Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as a dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
An In-Memory Analog Computing Co-Processor for Energy-Efficient CNN Inference on Mobile Devices [4.117012092777604]
We develop an in-memory analog computing (IMAC) architecture realizing both synaptic behavior and activation functions within non-volatile memory arrays. Spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices are leveraged to realize sigmoidal neurons as well as binarized synapses. A heterogeneous mixed-signal and mixed-precision CPU-IMAC architecture is proposed for convolutional neural networks (CNNs) inference on mobile processors.
arXiv Detail & Related papers (2021-05-24T23:01:36Z)
Hybrid In-memory Computing Architecture for the Training of Deep Neural Networks [5.050213408539571]
We propose a hybrid in-memory computing architecture for the training of deep neural networks (DNNs) on hardware accelerators. We show that HIC-based training yields an inference model about 50% smaller while achieving accuracy comparable to the baseline. Our simulations indicate HIC-based training naturally ensures that the number of write-erase cycles seen by the devices is a small fraction of the endurance limit of PCM.
arXiv Detail & Related papers (2021-02-10T05:26:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.