Core interface optimization for multi-core neuromorphic processors
- URL: http://arxiv.org/abs/2308.04171v1
- Date: Tue, 8 Aug 2023 10:00:14 GMT
- Title: Core interface optimization for multi-core neuromorphic processors
- Authors: Zhe Su, Hyunjung Hwang, Tristan Torchet, Giacomo Indiveri
- Abstract summary: Spiking Neural Networks (SNNs) represent a promising approach to edge-computing for applications that require low-power and low-latency.
To realize large-scale and scalable SNNs it is necessary to develop an efficient asynchronous communication and routing fabric.
- Score: 5.391889175209394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hardware implementations of Spiking Neural Networks (SNNs) represent a
promising approach to edge-computing for applications that require low-power
and low-latency, and which cannot resort to external cloud-based computing
services. However, most solutions proposed so far either support only
relatively small networks, or take up significant hardware resources, to
implement large networks. To realize large-scale and scalable SNNs it is
necessary to develop an efficient asynchronous communication and routing fabric
that enables the design of multi-core architectures. In particular the core
interface that manages inter-core spike communication is a crucial component as
it represents the bottleneck of Power-Performance-Area (PPA) especially for the
arbitration architecture and the routing memory. In this paper we present an
arbitration mechanism with the corresponding asynchronous encoding pipeline
circuits, based on hierarchical arbiter trees. The proposed scheme reduces the
latency by more than 70% in sparse-event mode, compared to the state-of-the-art
arbitration architectures, with lower area cost. The routing memory makes use
of asynchronous Content Addressable Memory (CAM) with Current Sensing
Completion Detection (CSCD), which saves approximately 46% energy, and achieves
a 40% increase in throughput against conventional asynchronous CAM using
configurable delay lines, at the cost of only a slight increase in area. In
addition as it radically reduces the core interface resources in multi-core
neuromorphic processors, the arbitration architecture and CAM architecture we
propose can be also applied to a wide range of general asynchronous circuits
and systems.
Related papers
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z) - RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices [0.30458577208819987]
We aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM)
We propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks meeting specific hardware constraints.
The resulting model from NAS optimized for speed achieved 5x-30x speedup.
arXiv Detail & Related papers (2024-09-27T15:35:36Z) - CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware [6.308771129448823]
We present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for compute in-memory (CiM)
The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices.
arXiv Detail & Related papers (2024-02-19T02:12:07Z) - RDRN: Recursively Defined Residual Network for Image Super-Resolution [58.64907136562178]
Deep convolutional neural networks (CNNs) have obtained remarkable performance in single image super-resolution.
We propose a novel network architecture which utilizes attention blocks efficiently.
arXiv Detail & Related papers (2022-11-17T11:06:29Z) - Bandwidth-efficient distributed neural network architectures with
application to body sensor networks [73.02174868813475]
This paper describes a conceptual design methodology to design distributed neural network architectures.
We show that the proposed framework enables up to a factor 20 in bandwidth reduction with minimal loss.
While the application focus of this paper is on wearable brain-computer interfaces, the proposed methodology can be applied in other sensor network-like applications as well.
arXiv Detail & Related papers (2022-10-14T12:35:32Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Reconfigurable co-processor architecture with limited numerical
precision to accelerate deep convolutional neural networks [0.38848561367220275]
Convolutional Neural Networks (CNNs) are widely used in deep learning applications, e.g. visual systems, robotics etc.
Here, we present a model-independent reconfigurable co-processing architecture to accelerate CNNs.
In contrast to existing solutions, we introduce limited precision 32 bit Q-format fixed point quantization for arithmetic representations and operations.
arXiv Detail & Related papers (2021-08-21T09:50:54Z) - Data Streaming and Traffic Gathering in Mesh-based NoC for Deep Neural
Network Acceleration [7.455546102930911]
We propose a modified mesh architecture with a one-way/two-way streaming bus to speedup one-to-many traffic and the use of gather packets to support many-to-one traffic.
The analysis of runtime latency of a convolutional layer shows that the two-way streaming architecture achieves better improvement than the one-way streaming architecture.
arXiv Detail & Related papers (2021-08-01T23:50:12Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks.
specially trained CNNs that employ parametrised early exits along their depth to save during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Dataflow Aware Mapping of Convolutional Neural Networks Onto Many-Core
Platforms With Network-on-Chip Interconnect [0.0764671395172401]
Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years.
Many-core platforms consisting of several homogeneous cores can alleviate limitations with regard to physical implementation at the expense of an increased dataflow mapping effort.
This work presents an automated mapping strategy starting at the single-core level with different optimization targets for minimal runtime and minimal off-chip memory accesses.
The strategy is then extended towards a suitable many-core mapping scheme and evaluated using a scalable system-level simulation with a network-on-chip interconnect.
arXiv Detail & Related papers (2020-06-18T17:13:18Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.