Programmable FPGA-based Memory Controller
- URL: http://arxiv.org/abs/2108.09601v1
- Date: Sat, 21 Aug 2021 23:53:12 GMT
- Title: Programmable FPGA-based Memory Controller
- Authors: Sasindu Wijeratne, Sanket Pattnaik, Zhiyu Chen, Rajgopal Kannan,
Viktor Prasanna
- Abstract summary: This paper introduces a modular and programmable memory controller that can be configured for different target applications on available hardware resources.
The proposed memory controller efficiently supports cache-line accesses along with bulk memory transfers.
We show up to 58% improvement in overall memory access time on CNN and GCN workloads compared with commercial memory controller IPs.
- Score: 9.013666207570749
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Even with generational improvements in DRAM technology, memory access latency
remains the major bottleneck for application accelerators, primarily due
to limitations in memory interface IPs which cannot fully account for
variations in target applications, the algorithms used, and accelerator
architectures. Since developing memory controllers for different applications
is time-consuming, this paper introduces a modular and programmable memory
controller that can be configured for different target applications on
available hardware resources. The proposed memory controller efficiently
supports cache-line accesses along with bulk memory transfers. The user can
configure the controller depending on the available logic resources on the
FPGA, memory access pattern, and external memory specifications. The modular
design supports various memory access optimization techniques, including
request scheduling, internal caching, and direct memory access. These
techniques contribute to reducing the overall latency while maintaining high
sustained bandwidth. We implement the system on a state-of-the-art FPGA and
evaluate its performance using two widely studied domains: graph analytics and
deep learning workloads. We show up to 58% improvement in overall memory access
time on CNN and GCN workloads compared with commercial memory controller IPs.
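The abstract describes the controller's configuration knobs and its optimization techniques (request scheduling, internal caching, and DMA) only at a high level; the paper's actual interface is not reproduced here. The following Python sketch is a minimal software model, under assumed names, fields, and address mappings, of two of those ideas: a user-supplied configuration and an FR-FCFS-style scheduler that reorders pending cache-line requests to favor already-open DRAM rows.

```python
# Hypothetical software model of a configurable memory controller front end.
# Names, parameters, and the address mapping are illustrative assumptions,
# not the interface of the paper's controller.
from dataclasses import dataclass
from collections import deque

@dataclass
class ControllerConfig:
    num_banks: int = 8          # external memory banks
    cache_lines: int = 512      # internal cache capacity, in lines (unused in this sketch)
    enable_dma: bool = True     # whether bulk transfers bypass the cache (unused in this sketch)
    reorder_window: int = 16    # how many pending requests the scheduler may inspect

@dataclass
class Request:
    addr: int                   # byte address of a cache-line access

class Scheduler:
    """FR-FCFS-style scheduler: prefer requests that hit an already-open row."""
    def __init__(self, cfg: ControllerConfig):
        self.cfg = cfg
        self.pending = deque()
        self.open_row = [None] * cfg.num_banks    # currently open row per bank

    def bank_and_row(self, addr):
        bank = (addr >> 6) % self.cfg.num_banks   # 64-byte cache-line interleaving
        row = addr >> 16                          # illustrative row decoding
        return bank, row

    def issue(self):
        """Return the next request to send to memory, or None if the queue is empty."""
        window = list(self.pending)[: self.cfg.reorder_window]
        for i, req in enumerate(window):
            bank, row = self.bank_and_row(req.addr)
            if self.open_row[bank] == row:        # row-buffer hit: service it first
                del self.pending[i]
                return req
        if self.pending:                          # no hit in the window: oldest first
            req = self.pending.popleft()
            bank, row = self.bank_and_row(req.addr)
            self.open_row[bank] = row             # opening a new row costs extra latency
            return req
        return None

# Example: the second queued request (0x10000) targets a different row of bank 0,
# so the scheduler services 0x200 (same row as 0x0) before it.
sched = Scheduler(ControllerConfig())
for a in (0x0, 0x10000, 0x200):
    sched.pending.append(Request(a))
while (r := sched.issue()) is not None:
    print(hex(r.addr))        # prints 0x0, 0x200, 0x10000
```

In the actual controller these decisions are made in hardware logic; this sketch omits the internal cache and DMA paths and only mirrors the scheduling idea.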
Related papers
- LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention.
For longer and higher-resolution videos, it matches STM-based methods while using 53% less GPU memory, and it supports 4096p inference on a 32 GB consumer-grade GPU (a generic linear-attention sketch follows this entry).
arXiv Detail & Related papers (2024-11-05T05:36:17Z)
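The LiVOS summary above names linear matching via linear attention but does not spell out the mechanism. The snippet below is a minimal, generic linear-attention sketch (kernel feature-map attention), included only to illustrate why linear matching avoids materializing the quadratic N x N attention matrix; it is not the LiVOS implementation, and the feature map, shapes, and function names are assumptions.

```python
# Generic linear attention vs. softmax attention, for illustration only.
import numpy as np

def feature_map(x):
    # Positive feature map phi(x) = elu(x) + 1, a common choice for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Attention computed as phi(Q) @ (phi(K)^T V); cost is linear in sequence length N."""
    Qf, Kf = feature_map(Q), feature_map(K)   # (N, d)
    kv = Kf.T @ V                             # (d, d): fixed-size summary of keys/values
    z = Qf @ Kf.sum(axis=0)                   # (N,): per-query normalizer
    return (Qf @ kv) / z[:, None]

def softmax_attention(Q, K, V):
    """Reference softmax attention; requires the full N x N score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ V

rng = np.random.default_rng(0)
N, d = 256, 32                                # toy sizes; real memory networks are larger
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)        # (256, 32), no N x N matrix formed
print(softmax_attention(Q, K, V).shape)       # (256, 32), but with O(N^2) memory
```

Because the key/value summary `kv` has a fixed size independent of sequence length, memory grows linearly with N, which is the property the LiVOS summary attributes to linear matching.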
- vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z)
- B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models, called B'MOJO, that seamlessly combines eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z)
- A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator [0.6242215470795112]
We propose a memory hierarchy framework tailored to the per-layer adaptive memory access patterns of deep neural networks (DNNs).
The objective is to strike a balance between minimizing the required memory capacity and maintaining high accelerator performance.
arXiv Detail & Related papers (2024-04-24T11:57:37Z)
- SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction [6.800641017055453]
This paper introduces mechanisms for evicting weights and activations to off-chip memory along the computational pipeline.
The proposed mechanism is incorporated into an existing toolflow, expanding the design space by utilising off-chip memory as a buffer.
SMOF has demonstrated the capacity to deliver competitive and, in some cases, state-of-the-art performance across a spectrum of computer vision tasks.
arXiv Detail & Related papers (2024-03-27T18:12:24Z)
- Efficient Video Object Segmentation via Modulated Cross-Attention Memory [123.12273176475863]
We propose a transformer-based approach, named MAVOS, to model temporal smoothness without requiring frequent memory expansion.
Our MAVOS achieves a J&F score of 63.3% while operating at 37 frames per second (FPS) on a single V100 GPU.
arXiv Detail & Related papers (2024-03-26T17:59:58Z)
- MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320 KB of memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z)
- Reconfigurable Low-latency Memory System for Sparse Matricized Tensor Times Khatri-Rao Product on FPGA [3.4870723728779565]
Sparse Matricized Tensor Times Khatri-Rao Product (MTTKRP) is one of the most expensive kernels in tensor computations.
This paper focuses on a multi-faceted memory system, which explores the spatial and temporal locality of the data structures of MTTKRP.
Our system shows 2x and 1.26x speedups compared with cache-only and DMA-only memory systems, respectively.
arXiv Detail & Related papers (2021-09-18T08:19:29Z)
- PIM-DRAM: Accelerating Machine Learning Workloads using Processing in Memory based on DRAM Technology [2.6168147530506958]
We propose a processing-in-memory (PIM) multiplication primitive to accelerate matrix vector operations in ML workloads.
We show that the proposed architecture, mapping, and data flow can provide up to 23x and 6.5x benefits over a GPU.
arXiv Detail & Related papers (2021-05-08T16:39:24Z)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
- In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications [0.0]
This paper presents an in-memory computing architecture for artificial neural networks (ANNs), enabling artificial intelligence (AI) and machine learning (ML) applications.
Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing multiple rows of the array per precharge cycle.
The proposed architecture was trained and tested on the IRIS dataset and is $46\times$ more energy efficient per MAC (multiply-and-accumulate) operation than earlier classifiers.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)