Programmable FPGA-based Memory Controller
- URL: http://arxiv.org/abs/2108.09601v1
- Date: Sat, 21 Aug 2021 23:53:12 GMT
- Title: Programmable FPGA-based Memory Controller
- Authors: Sasindu Wijeratne, Sanket Pattnaik, Zhiyu Chen, Rajgopal Kannan,
Viktor Prasanna
- Abstract summary: This paper introduces a modular and programmable memory controller that can be configured for different target applications on available hardware resources.
The proposed memory controller efficiently supports cache-line accesses along with bulk memory transfers.
We show an improvement in overall memory access time of up to 58% on CNN and GCN workloads compared with commercial memory controller IPs.
- Score: 9.013666207570749
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Even with generational improvements in DRAM technology, memory access latency
still remains the major bottleneck for application accelerators, primarily due
to limitations in memory interface IPs which cannot fully account for
variations in target applications, the algorithms used, and accelerator
architectures. Since developing memory controllers for different applications
is time-consuming, this paper introduces a modular and programmable memory
controller that can be configured for different target applications on
available hardware resources. The proposed memory controller efficiently
supports cache-line accesses along with bulk memory transfers. The user can
configure the controller depending on the available logic resources on the
FPGA, memory access pattern, and external memory specifications. The modular
design supports various memory access optimization techniques, including
request scheduling, internal caching, and direct memory access. These
techniques contribute to reducing the overall latency while maintaining high
sustained bandwidth. We implement the system on a state-of-the-art FPGA and
evaluate its performance using two widely studied domains: graph analytics and
deep learning workloads. We show an improvement in overall memory access time
of up to 58% on CNN and GCN workloads compared with commercial memory controller IPs.
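To make the abstract's description more concrete, below is a minimal behavioral sketch in Python of a configurable controller front end that serves cache-line reads through a small internal cache and routes bulk transfers to a DMA-style burst path, with a bounded reordering window for scheduling. This is an illustrative software model only, under assumed names and parameters (ControllerConfig, ProgrammableMemoryController, cache_line_bytes, dma_burst_bytes, reorder_window); it is not the authors' RTL or interface.

```python
"""Behavioral sketch (not the paper's hardware design) of a configurable
memory-controller front end: cache-line requests go through an internal LRU
cache, bulk requests go through a DMA-style burst path, and a small window of
pending requests may be reordered. All names/parameters are assumptions."""

from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class ControllerConfig:
    # Knobs a user might tune for FPGA resources, access pattern, and DRAM spec.
    cache_line_bytes: int = 64      # granularity of cache-line accesses
    cache_lines: int = 512          # internal cache capacity (in lines)
    dma_burst_bytes: int = 4096     # bulk-transfer burst size
    reorder_window: int = 16        # pending requests the scheduler may reorder


class ProgrammableMemoryController:
    """Serves 'line' requests via an LRU cache and 'bulk' requests via bursts."""

    def __init__(self, cfg: ControllerConfig, dram_read):
        self.cfg = cfg
        self.dram_read = dram_read          # callback: (addr, size) -> bytes
        self.cache = OrderedDict()          # line address -> data (LRU order)

    def _read_line(self, addr: int) -> bytes:
        line = addr - (addr % self.cfg.cache_line_bytes)
        if line in self.cache:              # internal-cache hit: no DRAM access
            self.cache.move_to_end(line)
            return self.cache[line]
        data = self.dram_read(line, self.cfg.cache_line_bytes)
        self.cache[line] = data
        if len(self.cache) > self.cfg.cache_lines:
            self.cache.popitem(last=False)  # evict least-recently-used line
        return data

    def _read_bulk(self, addr: int, size: int) -> bytes:
        # Bulk transfers bypass the cache and are split into fixed-size bursts.
        out = bytearray()
        for off in range(0, size, self.cfg.dma_burst_bytes):
            chunk = min(self.cfg.dma_burst_bytes, size - off)
            out += self.dram_read(addr + off, chunk)
        return bytes(out)

    def service(self, requests):
        """Schedule one window: sort line reads by address (a crude proxy for
        row-buffer locality), then issue bulk transfers."""
        window = requests[: self.cfg.reorder_window]
        lines = sorted((r for r in window if r[0] == "line"), key=lambda r: r[1])
        bulks = [r for r in window if r[0] == "bulk"]
        results = [self._read_line(addr) for _, addr, _ in lines]
        results += [self._read_bulk(addr, size) for _, addr, size in bulks]
        return results


if __name__ == "__main__":
    dram = bytes(range(256)) * 4096  # toy backing store standing in for DRAM
    ctrl = ProgrammableMemoryController(ControllerConfig(),
                                        lambda a, n: dram[a:a + n])
    reqs = [("line", 0x40, 64), ("bulk", 0x1000, 8192), ("line", 0x80, 64)]
    print([len(r) for r in ctrl.service(reqs)])  # -> [64, 64, 8192]
```

In the actual controller these decisions (scheduling, caching, DMA) are made by hardware modules selected and sized at configuration time; the sketch only mirrors the division of responsibilities described in the abstract.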
Related papers
- COMPASS: A Compiler Framework for Resource-Constrained Crossbar-Array Based In-Memory Deep Learning Accelerators [6.172271429579593]
We propose a compiler framework for resource-constrained crossbar-based processing-in-memory (PIM) deep neural network (DNN) accelerators.
We propose an algorithm to determine the optimal partitioning that divides the layers so that each partition can be accelerated on chip.
arXiv Detail & Related papers (2025-01-12T11:31:25Z) - LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention.
For longer and higher-resolution videos, it matches STM-based methods with 53% less GPU memory and supports 4096p inference on a 32GB consumer-grade GPU.
arXiv Detail & Related papers (2024-11-05T05:36:17Z) - vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z) - B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention [8.20523619534105]
PagedAttention is a popular approach for dynamic memory allocation in LLM serving systems.
We present vAttention -- an approach that mitigates fragmentation in physical memory while retaining the contiguity of KV cache in virtual memory.
Overall, vAttention is a simpler, portable, and performant alternative to PagedAttention.
arXiv Detail & Related papers (2024-05-07T16:00:32Z) - A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator [0.6242215470795112]
We propose a memory hierarchy framework tailored to the per-layer adaptive memory access patterns of deep neural networks (DNNs).
The objective is to strike an optimized balance between minimizing the required memory capacity and maintaining high accelerator performance.
arXiv Detail & Related papers (2024-04-24T11:57:37Z) - Efficient Video Object Segmentation via Modulated Cross-Attention Memory [123.12273176475863]
We propose a transformer-based approach, named MAVOS, to model temporal smoothness without requiring frequent memory expansion.
Our MAVOS achieves a J&F score of 63.3% while operating at 37 frames per second (FPS) on a single V100 GPU.
arXiv Detail & Related papers (2024-03-26T17:59:58Z) - MCUFormer: Deploying Vision Transformers on Microcontrollers with
Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z) - Reconfigurable Low-latency Memory System for Sparse Matricized Tensor
Times Khatri-Rao Product on FPGA [3.4870723728779565]
Sparse Matricized Tensor Times Khatri-Rao Product (MTTKRP) is one of the most expensive kernels in tensor computations.
This paper focuses on a multi-faceted memory system, which explores the spatial and temporal locality of the data structures of MTTKRP.
Our system shows 2x and 1.26x speedups compared with cache-only and DMA-only memory systems, respectively.
arXiv Detail & Related papers (2021-09-18T08:19:29Z) - PIM-DRAM: Accelerating Machine Learning Workloads using Processing in
Memory based on DRAM Technology [2.6168147530506958]
We propose a processing-in-memory (PIM) multiplication primitive to accelerate matrix vector operations in ML workloads.
We show that the proposed architecture, mapping, and data flow can provide up to 23x and 6.5x benefits over a GPU.
arXiv Detail & Related papers (2021-05-08T16:39:24Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.