Related papers: A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator

A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator

URL: http://arxiv.org/abs/2404.15823v1
Date: Wed, 24 Apr 2024 11:57:37 GMT
Title: A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator
Authors: Oliver Bause, Paul Palomero Bernardo, Oliver Bringmann,
Abstract summary: We propose a memory hierarchy framework tailored for per layer adaptive memory access patterns of deep neural networks (DNNs) The objective is to strike an optimized balance between minimizing the required memory capacity and maintaining high accelerator performance.
Score: 0.6242215470795112
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As machine learning applications continue to evolve, the demand for efficient hardware accelerators, specifically tailored for deep neural networks (DNNs), becomes increasingly vital. In this paper, we propose a configurable memory hierarchy framework tailored for per layer adaptive memory access patterns of DNNs. The hierarchy requests data on-demand from the off-chip memory to provide it to the accelerator's compute units. The objective is to strike an optimized balance between minimizing the required memory capacity and maintaining high accelerator performance. The framework is characterized by its configurability, allowing the creation of a tailored memory hierarchy with up to five levels. Furthermore, the framework incorporates an optional shift register as final level to increase the flexibility of the memory management process. A comprehensive loop-nest analysis of DNN layers shows that the framework can efficiently execute the access patterns of most loop unrolls. Synthesis results and a case study of the DNN accelerator UltraTrail indicate a possible reduction in chip area of up to 62.2% as smaller memory modules can be used. At the same time, the performance loss can be minimized to 2.4%.

Related papers

COMPASS: A Compiler Framework for Resource-Constrained Crossbar-Array Based In-Memory Deep Learning Accelerators [6.172271429579593]
We propose a compiler framework for resource-constrained crossbar-based processing-in-memory (PIM) deep neural network (DNN) accelerators. We propose an algorithm to determine the optimal partitioning that divides the layers so that each partition can be accelerated on chip.
arXiv Detail & Related papers (2025-01-12T11:31:25Z)
SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception [8.968583287058959]
Spiking Neural Networks (SNNs) offer an efficient method for processing the asynchronous temporal data generated by Dynamic Vision Sensors (DVS) Existing SNN accelerators suffer from limitations in adaptability to diverse neuron models, bit precisions and network sizes. We propose a scalable and reconfigurable digital compute-in-memory (CIM) SNN accelerator chipname with a set of key features.
arXiv Detail & Related papers (2024-11-05T06:59:02Z)
Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques. PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge [1.8293684411977293]
Deep Neural Network (DNN) based inference at the edge is challenging as these compute and data-intensive algorithms need to be implemented at low cost and low power. We present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit the sparsity to reduce area (storage), power as well as latency.
arXiv Detail & Related papers (2023-06-10T17:25:58Z)
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table [62.164549651134465]
We propose MF-NeRF, a memory-efficient NeRF framework that employs a Mixed-Feature hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality. Our experiments with state-of-the-art Instant-NGP, TensoRF, and DVGO, indicate our MF-NeRF could achieve the fastest training time on the same GPU hardware with similar or even higher reconstruction quality.
arXiv Detail & Related papers (2023-04-25T05:44:50Z)
FireFly: A High-Throughput Hardware Accelerator for Spiking Neural Networks with Efficient DSP and Memory Optimization [6.966706170499345]
Spiking neural networks (SNNs) have been widely used due to their strong biological interpretability and high energy efficiency. Most SNN hardware implementations for field-programmable gate arrays (FPGAs) cannot meet arithmetic or memory efficiency requirements. We propose an FPGA accelerator that can process spikes generated by the firing neuron on-the-fly (FireFly)
arXiv Detail & Related papers (2023-01-05T04:28:07Z)
Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation [10.563649948220371]
Deep neural networks (DNN) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. We propose a general and unified framework to trade expensive memory transactions with ultra-fast on-chip computations.
arXiv Detail & Related papers (2021-08-25T18:50:24Z)
MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference [1.7894377200944507]
Machine learning networks can easily exceed available memory, increasing latency due to excessive OS swapping. We propose a memory usage predictor coupled with a search algorithm to provide optimized fusing and tiling configurations. Results show that our approach can run in less than half the memory, and with a speedup of up to 2.78 under severe memory constraints.
arXiv Detail & Related papers (2021-07-14T19:45:49Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices. Previous unstructured or structured weight pruning methods can hardly truly accelerate inference. We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces. We train and validate our approach directly on the Intel NNP-I chip for inference. We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.