Related papers: Core interface optimization for multi-core neuromorphic processors


URL: http://arxiv.org/abs/2308.04171v1
Date: Tue, 8 Aug 2023 10:00:14 GMT
Title: Core interface optimization for multi-core neuromorphic processors
Authors: Zhe Su, Hyunjung Hwang, Tristan Torchet, Giacomo Indiveri
Abstract summary: Spiking Neural Networks (SNNs) represent a promising approach to edge-computing for applications that require low-power and low-latency. To realize large-scale and scalable SNNs it is necessary to develop an efficient asynchronous communication and routing fabric.
Score: 5.391889175209394
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hardware implementations of Spiking Neural Networks (SNNs) represent a promising approach to edge computing for applications that require low power and low latency, and which cannot resort to external cloud-based computing services. However, most solutions proposed so far either support only relatively small networks or take up significant hardware resources to implement large networks. To realize large-scale and scalable SNNs, it is necessary to develop an efficient asynchronous communication and routing fabric that enables the design of multi-core architectures. In particular, the core interface that manages inter-core spike communication is a crucial component, as it represents the Power-Performance-Area (PPA) bottleneck, especially in the arbitration architecture and the routing memory. In this paper we present an arbitration mechanism with the corresponding asynchronous encoding pipeline circuits, based on hierarchical arbiter trees. The proposed scheme reduces the latency by more than 70% in sparse-event mode compared to state-of-the-art arbitration architectures, with lower area cost. The routing memory makes use of asynchronous Content Addressable Memory (CAM) with Current Sensing Completion Detection (CSCD), which saves approximately 46% energy and achieves a 40% increase in throughput over conventional asynchronous CAM using configurable delay lines, at the cost of only a slight increase in area. In addition to radically reducing the core interface resources in multi-core neuromorphic processors, the proposed arbitration and CAM architectures can also be applied to a wide range of general asynchronous circuits and systems.
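The abstract does not spell out how a hierarchical arbiter tree resolves simultaneous requests. The Python sketch below is a behavioural model only, assuming a binary tree of 2-input arbiter cells in which each cell toggles its preference whenever both of its subtrees request at once (a fairness policy chosen here for illustration, not taken from the paper). The class `ArbiterTree` and its methods are hypothetical. It shows that a request from one of N cores passes through only log2(N) arbitration levels, one plausible reason a tree structure keeps latency low when events are sparse.

```python
from typing import Iterable, List, Optional, Set


class ArbiterTree:
    """Behavioural model of a binary tree of 2-input arbiter cells.

    Nodes use a heap layout: node 1 is the root and node k has children
    2k and 2k + 1. Each internal node keeps a preference bit that it
    toggles whenever both of its subtrees request at the same time
    (an assumed fairness policy, not a circuit-level description).
    """

    def __init__(self, n_leaves: int) -> None:
        if n_leaves < 2 or n_leaves & (n_leaves - 1):
            raise ValueError("n_leaves must be a power of two >= 2")
        self.n = n_leaves
        self.levels = n_leaves.bit_length() - 1  # log2(n_leaves)
        self.prefer_right = [False] * n_leaves   # index 0 unused

    def grant(self, pending: Set[int]) -> Optional[int]:
        """Walk from the root down to one requesting leaf; None if idle."""
        if not pending:
            return None
        if any(not 0 <= p < self.n for p in pending):
            raise ValueError("request index out of range")
        node, lo, hi = 1, 0, self.n  # current node covers leaves [lo, hi)
        for _ in range(self.levels):
            mid = (lo + hi) // 2
            left = any(lo <= p < mid for p in pending)
            right = any(mid <= p < hi for p in pending)
            if left and right:
                # Contention: follow the preference bit, then flip it.
                go_right = self.prefer_right[node]
                self.prefer_right[node] = not go_right
            else:
                go_right = right
            if go_right:
                node, lo = 2 * node + 1, mid
            else:
                node, hi = 2 * node, mid
        return lo

    def serve_all(self, requests: Iterable[int]) -> List[int]:
        """Grant every request once and return the grant order."""
        pending = set(requests)
        order = []
        while pending:
            winner = self.grant(pending)
            order.append(winner)
            pending.discard(winner)
        return order
```

For example, `ArbiterTree(4).serve_all({0, 1, 2, 3})` grants leaves in the order [0, 2, 1, 3]: the toggling cells alternate between subtrees instead of serving one side until it drains.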

Related papers

An Efficient Multicast Addressing Encoding Scheme for Multi-Core Neuromorphic Processors [4.251655740279756]
Multi-core neuromorphic processors are becoming increasingly significant due to their energy-efficient local computing and scalable modular architecture. We propose a hierarchical bit string encoding scheme that expands the addressing capability of state-of-the-art symbol-based schemes for the same number of routing bits. When put to work on a real neuromorphic task, this hierarchical bit string encoding reduces area cost by approximately 29% and energy consumption by about 50%.
arXiv Detail & Related papers (2024-11-18T13:04:38Z)
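The summary does not describe the hierarchical encoding itself, so the sketch below illustrates only the flat bit-string idea its name refers to, contrasted with a symbol-based scheme that sends one binary core ID per destination. All function names are hypothetical. A single N-bit mask can address any subset of N cores, whereas a symbol-based field of ceil(log2 N) bits names one core at a time, so its cost grows with multicast fan-out.

```python
from math import ceil, log2
from typing import Iterable, List


def encode_bitstring(dests: Iterable[int], n_cores: int) -> int:
    """Flat bit string: bit i set means 'deliver to core i'."""
    mask = 0
    for d in dests:
        if not 0 <= d < n_cores:
            raise ValueError(f"core {d} out of range")
        mask |= 1 << d
    return mask


def decode_bitstring(mask: int, n_cores: int) -> List[int]:
    """Recover the destination cores from an n_cores-bit mask."""
    return [i for i in range(n_cores) if (mask >> i) & 1]


def bitstring_bits(n_cores: int) -> int:
    """A flat bit string always costs one bit per core."""
    return n_cores


def symbol_bits(n_dests: int, n_cores: int) -> int:
    """Cost if each destination is sent as its own binary core ID."""
    return n_dests * max(1, ceil(log2(n_cores)))
```

With 16 cores, multicasting one spike to 5 cores takes a fixed 16-bit mask versus 20 bits of core IDs, while a unicast needs only 4 bits of ID; a hierarchical scheme presumably aims to balance this trade-off.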
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks. AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation. We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z)
CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware [6.308771129448823]
We present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for compute-in-memory (CiM) hardware. The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices.
arXiv Detail & Related papers (2024-02-19T02:12:07Z)
RDRN: Recursively Defined Residual Network for Image Super-Resolution [58.64907136562178]
Deep convolutional neural networks (CNNs) have obtained remarkable performance in single image super-resolution. We propose a novel network architecture which utilizes attention blocks efficiently.
arXiv Detail & Related papers (2022-11-17T11:06:29Z)
Bandwidth-efficient distributed neural network architectures with application to body sensor networks [73.02174868813475]
This paper describes a conceptual design methodology to design distributed neural network architectures. We show that the proposed framework enables up to a factor of 20 in bandwidth reduction with minimal loss. While the application focus of this paper is on wearable brain-computer interfaces, the proposed methodology can be applied in other sensor network-like applications as well.
arXiv Detail & Related papers (2022-10-14T12:35:32Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations. Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks [0.38848561367220275]
Convolutional Neural Networks (CNNs) are widely used in deep learning applications, e.g., visual systems and robotics. Here, we present a model-independent reconfigurable co-processing architecture to accelerate CNNs. In contrast to existing solutions, we introduce limited-precision 32-bit Q-format fixed-point quantization for arithmetic representations and operations.
arXiv Detail & Related papers (2021-08-21T09:50:54Z)
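A Q-format number stores a real value as a signed integer with an implied binary point. The sketch below assumes a Q16.16 split of the 32-bit word (16 fractional bits, the remaining 16 holding sign and integer part); the summary does not say which split the co-processor uses, and the helper names are illustrative. Multiplication shows the key step: the raw product carries twice the fractional bits and must be shifted back, with saturation to the 32-bit range.

```python
FRAC_BITS = 16  # assumed Q16.16 split of a 32-bit word
ONE = 1 << FRAC_BITS
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate(v: int) -> int:
    """Clamp to the signed 32-bit range instead of wrapping around."""
    return max(INT32_MIN, min(INT32_MAX, v))


def to_fixed(x: float) -> int:
    """Quantize a real number to the nearest Q16.16 value."""
    return saturate(round(x * ONE))


def to_float(q: int) -> float:
    return q / ONE


def fx_add(a: int, b: int) -> int:
    return saturate(a + b)


def fx_mul(a: int, b: int) -> int:
    # The exact product has 2 * FRAC_BITS fractional bits; an arithmetic
    # right shift (floor, as Python's >> does) brings it back to FRAC_BITS.
    return saturate((a * b) >> FRAC_BITS)


def fx_dot(ws, xs) -> int:
    """Multiply-accumulate in fixed point, as a convolution would."""
    acc = 0
    for w, x in zip(ws, xs):
        acc = fx_add(acc, fx_mul(w, x))
    return acc
```

`to_float(fx_mul(to_fixed(1.5), to_fixed(-2.25)))` returns -3.375 exactly, since both operands and the result are representable in Q16.16. Values that do not fit, such as `to_fixed(40000.0)`, clamp to the largest representable word instead of wrapping around.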
Data Streaming and Traffic Gathering in Mesh-based NoC for Deep Neural Network Acceleration [7.455546102930911]
We propose a modified mesh architecture with a one-way/two-way streaming bus to speed up one-to-many traffic and the use of gather packets to support many-to-one traffic. The analysis of runtime latency of a convolutional layer shows that the two-way streaming architecture achieves a greater improvement than the one-way streaming architecture.
arXiv Detail & Related papers (2021-08-01T23:50:12Z)
Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples. We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
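An exit policy decides, after each intermediate head, whether the prediction is good enough to stop. The sketch below uses a confidence-threshold policy for segmentation: exit once at least `min_fraction` of the pixels have a top-class probability of at least `tau`. This policy, the parameter names, and their defaults are assumptions for illustration, not the MESS paper's exact rule. Each exit head is modelled as a hypothetical zero-argument callable; in a real network the heads would consume the backbone features computed so far.

```python
from typing import Callable, List, Sequence, Tuple

Probs = List[List[float]]  # one class-probability vector per pixel


def pixel_confident_fraction(probs: Probs, tau: float) -> float:
    """Fraction of pixels whose top class probability is at least tau."""
    if not probs:
        return 0.0
    return sum(max(p) >= tau for p in probs) / len(probs)


def early_exit_segment(
    exits: Sequence[Callable[[], Probs]],
    tau: float = 0.9,
    min_fraction: float = 0.8,
) -> Tuple[int, List[int]]:
    """Evaluate exit heads in depth order; stop at the first confident one.

    Returns (index of the exit used, per-pixel argmax labels). The last
    exit is always accepted, so deep stages run only on hard inputs.
    """
    for i, head in enumerate(exits):
        probs = head()
        last = i == len(exits) - 1
        if last or pixel_confident_fraction(probs, tau) >= min_fraction:
            labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
            return i, labels
    raise ValueError("at least one exit head is required")
```

On an easy input the first head may already be confident, skipping the cost of the deeper stages; on a hard input evaluation continues down to the final exit. Raising `tau` or `min_fraction` trades average compute for accuracy.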
Dataflow Aware Mapping of Convolutional Neural Networks Onto Many-Core Platforms With Network-on-Chip Interconnect [0.0764671395172401]
Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years. Many-core platforms consisting of several homogeneous cores can alleviate limitations with regard to physical implementation at the expense of an increased dataflow mapping effort. This work presents an automated mapping strategy starting at the single-core level with different optimization targets for minimal runtime and minimal off-chip memory accesses. The strategy is then extended towards a suitable many-core mapping scheme and evaluated using a scalable system-level simulation with a network-on-chip interconnect.
arXiv Detail & Related papers (2020-06-18T17:13:18Z)
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.