A Design Flow for Mapping Spiking Neural Networks to Many-Core
Neuromorphic Hardware
- URL: http://arxiv.org/abs/2108.12444v1
- Date: Fri, 27 Aug 2021 18:08:08 GMT
- Authors: Shihao Song, M. Lakshmi Varshika, Anup Das, and Nagarajan Kandasamy
- Abstract summary: Many-core neuromorphic hardware is expected to execute large machine learning models.
To deal with the design complexity, a predictable design flow is needed to guarantee real-time performance.
We propose an SDFG-based design flow for mapping spiking neural networks to many-core neuromorphic hardware.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The design of many-core neuromorphic hardware is becoming increasingly
complex as these systems are expected to execute large machine learning models.
To deal with the design complexity, a predictable design flow is needed to
guarantee real-time performance metrics such as latency and throughput without
significantly increasing the buffer requirement of computing cores. Synchronous
Data Flow Graphs (SDFGs) are used for predictable mapping of streaming
applications to multiprocessor systems. We propose an SDFG-based design flow
for mapping spiking neural networks (SNNs) to many-core neuromorphic hardware
with the objective of exploring the tradeoff between throughput and buffer
size. The proposed design flow integrates an iterative partitioning approach
based on the Kernighan-Lin graph partitioning heuristic, creating SNN clusters
such that each cluster can be mapped to a core of the hardware. The partitioning
approach minimizes the inter-cluster spike communication, which improves
latency on the shared interconnect of the hardware. Next, the design flow maps
clusters to cores using an instance of Particle Swarm Optimization (PSO),
an evolutionary algorithm, exploring the design space of throughput and buffer
size. Pareto-optimal mappings are retained from the design flow, allowing
system designers to select a mapping that satisfies the throughput and
buffer size requirements of the design. We evaluated the design flow using five
large-scale convolutional neural network (CNN) models. Results demonstrate 63%
higher maximum throughput and 10% lower buffer size requirement compared to
state-of-the-art dataflow-based mapping solutions.
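
As a concrete illustration of the partitioning step, the following minimal sketch recursively bisects a weighted SNN connectivity graph (edge weights standing for spike counts between neurons) with the Kernighan-Lin heuristic until each cluster fits on a core. This is an assumption-laden sketch, not the authors' implementation: the NetworkX routine is a stand-in for the paper's iterative partitioner, and NEURONS_PER_CORE and all helper names are illustrative.

```python
# Sketch: iterative Kernighan-Lin partitioning of an SNN into per-core
# clusters, minimizing inter-cluster spike traffic. Hypothetical helpers;
# NEURONS_PER_CORE is an assumed per-core neuron capacity.
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

NEURONS_PER_CORE = 128  # assumed hardware capacity, not from the paper

def partition_snn(graph: nx.Graph) -> list[set]:
    """Recursively bisect the SNN graph until each cluster fits a core."""
    if graph.number_of_nodes() <= NEURONS_PER_CORE:
        return [set(graph.nodes)]
    # KL bisection minimizes the weight of the cut, i.e. the number of
    # spikes that must cross the shared interconnect between clusters.
    left, right = kernighan_lin_bisection(graph, weight="weight", seed=0)
    return (partition_snn(graph.subgraph(left).copy())
            + partition_snn(graph.subgraph(right).copy()))

def inter_cluster_spikes(graph: nx.Graph, clusters: list[set]) -> int:
    """Total spike traffic crossing cluster boundaries."""
    core_of = {n: i for i, c in enumerate(clusters) for n in c}
    return sum(w for u, v, w in graph.edges(data="weight", default=1)
               if core_of[u] != core_of[v])
```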
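
The cluster-to-core mapping search can likewise be pictured with a short PSO sketch. Everything here is illustrative: a particle is a real-valued vector that decodes (by truncation) to one core index per cluster, and evaluate_mapping() is a toy placeholder for the paper's SDFG-based throughput/buffer analysis; the constants and the scalarized fitness are assumptions, not the paper's parameters.

```python
# Sketch: PSO search over cluster-to-core mappings.
import numpy as np

N_CLUSTERS, N_CORES = 24, 8
N_PARTICLES, N_ITERS = 30, 100
rng = np.random.default_rng(0)

def evaluate_mapping(mapping: np.ndarray) -> tuple[float, float]:
    """Toy (throughput, buffer) model; the paper uses SDFG analysis here."""
    load = np.bincount(mapping, minlength=N_CORES)
    throughput = 1.0 / (1.0 + load.max())   # bottleneck-core toy model
    buffer_size = float(load.var())         # load imbalance as a proxy
    return throughput, buffer_size

def fitness(mapping: np.ndarray, alpha: float) -> float:
    t, b = evaluate_mapping(mapping)
    return alpha * t - (1.0 - alpha) * b    # scalarized two-objective score

def pso(alpha: float) -> np.ndarray:
    """Return the best cluster-to-core mapping found for one weight alpha."""
    pos = rng.uniform(0, N_CORES, (N_PARTICLES, N_CLUSTERS))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p.astype(int), alpha) for p in pos])
    for _ in range(N_ITERS):
        gbest = pbest[pbest_fit.argmax()]   # best mapping seen so far
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, N_CORES - 1e-9)
        fit = np.array([fitness(p.astype(int), alpha) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
    return pbest[pbest_fit.argmax()].astype(int)
```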
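
Finally, retaining the Pareto-optimal mappings amounts to a non-dominated filter over the (throughput, buffer size) pairs produced by the search. Continuing the sketch above, with the same caveats (toy objective, hypothetical names):

```python
# Sketch: keep only Pareto-optimal (throughput, buffer) points, i.e. those
# not dominated by any mapping with higher throughput AND lower buffer.
def pareto_front(points):
    """points: list of (throughput, buffer_size) pairs."""
    return [(t, b) for t, b in points
            if not any(t2 >= t and b2 <= b and (t2, b2) != (t, b)
                       for t2, b2 in points)]

# Usage: sweep the scalarization weight to collect candidate mappings,
# then filter; a designer picks the point that meets the design's
# throughput and buffer size requirements.
candidates = [evaluate_mapping(pso(alpha))
              for alpha in (0.1, 0.3, 0.5, 0.7, 0.9)]
print(pareto_front(candidates))
```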
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment
Task-oriented edge computing addresses the data-processing demands of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes
Convolutional neural networks (CNNs) with large kernels have demonstrated impressive performance in various vision-based applications.
An FPGA-based inference accelerator is proposed for the efficient deployment of CNNs with arbitrary kernel sizes.
The proposed hardware accelerator, evaluated on Intel Arria 10 FPGA, achieves up to 3.91 times better DSP efficiency than prior art on the same network.
arXiv Detail & Related papers (2024-02-22T05:52:55Z)
- Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition
We propose a framework for mapping CNNs onto FPGAs based on a novel tensor decomposition method called Mixed-TD.
The proposed method applies layer-specific Singular Value Decomposition (SVD) and Canonical Polyadic Decomposition (CPD) in a mixed manner, achieving 1.73x to 10.29x throughput per DSP compared to state-of-the-art CNNs.
arXiv Detail & Related papers (2023-06-08T08:16:38Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- FlowNAS: Neural Architecture Search for Optical Flow Estimation
We propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.
Experimental results show that the discovered architecture with the weights inherited from the super-network achieves 4.67% F1-all error on KITTI.
arXiv Detail & Related papers (2022-07-04T09:05:25Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) algorithm, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC services.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores
We develop an FPGA-based heterogeneous computing system for neural network acceleration.
The proposed accelerator consists of DSP- and LUT-based GEneral Matrix-Multiplication (GEMM) computing cores.
Our design outperforms the state-of-the-art Mix&Match design, with latency reduced by 1.12x-1.32x and higher inference accuracy.
arXiv Detail & Related papers (2021-12-15T15:12:00Z)
- Dataflow Aware Mapping of Convolutional Neural Networks Onto Many-Core Platforms With Network-on-Chip Interconnect
Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research in recent years.
Many-core platforms consisting of several homogeneous cores can alleviate limitations with regard to physical implementation at the expense of an increased dataflow mapping effort.
This work presents an automated mapping strategy starting at the single-core level with different optimization targets for minimal runtime and minimal off-chip memory accesses.
The strategy is then extended towards a suitable many-core mapping scheme and evaluated using a scalable system-level simulation with a network-on-chip interconnect.
arXiv Detail & Related papers (2020-06-18T17:13:18Z)