Related papers: Exploring FPGA designs for MX and beyond

Exploring FPGA designs for MX and beyond

URL: http://arxiv.org/abs/2407.01475v1
Date: Mon, 1 Jul 2024 17:07:33 GMT
Title: Exploring FPGA designs for MX and beyond
Authors: Ebby Samson, Naveen Mellempudi, Wayne Luk, George A. Constantinides,
Abstract summary: We describe and evaluate the first open-source FPGA implementation of the arithmetic defined in the Open Compute Project MX standard. Our designs fully support all the standard's concrete formats for conversion into and out of MX formats. We release an open-source Pytorch library for quantization into the new standard, integrated with the Brevitas library.
Score: 6.843913224130847
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A number of companies recently worked together to release the new Open Compute Project MX standard for low-precision computation, aimed at efficient neural network implementation. In this paper, we describe and evaluate the first open-source FPGA implementation of the arithmetic defined in the standard. Our designs fully support all the standard's concrete formats for conversion into and out of MX formats and for the standard-defined arithmetic operations, as well as arbitrary fixed-point and floating-point formats. Certain elements of the standard are left as implementation-defined, and we present the first concrete FPGA-inspired choices for these elements, which we outline in the paper. Our library of optimized hardware components is available open source, and can be used to build larger systems. For this purpose, we also describe and release an open-source Pytorch library for quantization into the new standard, integrated with the Brevitas library so that the community can develop novel neural network designs quantized with MX formats in mind. We demonstrate the usability and efficacy of our libraries via the implementation of example neural networks such as ResNet-18 on the ImageNet ILSVRC12 dataset. Our testing shows that MX is very effective for formats such as INT5 or FP6 which are not natively supported on GPUs. This gives FPGAs an advantage as they have the flexibility to implement a custom datapath and take advantage of the smaller area footprints offered by these formats.

Related papers

Runtime Tunable Tsetlin Machines for Edge Inference on eFPGAs [0.2294388534633318]
eFPGAs allow for the design of hardware accelerators of edge Machine Learning (ML) applications at a lower power budget. The limited eFPGA logic and memory significantly constrain compute capabilities and model size. The proposed eFPGA accelerator focuses on minimizing resource usage and allowing flexibility for on-field recalibration over throughput.
arXiv Detail & Related papers (2025-02-10T12:49:22Z)
if-ZKP: Intel FPGA-Based Acceleration of Zero Knowledge Proofs [3.0009885036586725]
We present a novel scalable architecture that is suitable for accelerating the zk-SNARK prover compute on FPGAs. We focus on the multi-scalar multiplication (MSM) that accounts for the majority of time spent in zk-SNARK systems. Our implementation runs 110x-150x faster compared to reference software library.
arXiv Detail & Related papers (2024-12-17T02:35:32Z)
Investigating Resource-efficient Neutron/Gamma Classification ML Models Targeting eFPGAs [0.0]
Open-source embedded FPGA (eFPGA) frameworks provide an alternate, more flexible pathway for implementing machine learning models in hardware. We explore the parameter space for eFPGA implementations of fully-connected neural network (fcNN) and boosted decision tree (BDT) models. The results of the study will be used to aid the specification of an eFPGA fabric, which will be integrated as part of a test chip.
arXiv Detail & Related papers (2024-04-19T20:03:30Z)
UniSparse: An Intermediate Language for General Sparse Format Customization [13.132033187592349]
We propose UniSparse, an intermediate language that provides a unified abstraction for representing and customizing sparse formats. Unlike the existing attribute-based frameworks, UniSparse decouples the logical representation of the sparse tensor from its low-level memory layout. As a result, a rich set of format customizations can be succinctly expressed in a small set of well-defined query, mutation, and layout primitives.
arXiv Detail & Related papers (2024-03-09T05:38:45Z)
torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library that aims to address this need. It provides users with a simple API for environments and useful abstractions for samplers and losses.
arXiv Detail & Related papers (2023-05-24T00:20:59Z)
SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation [100.89770978711464]
We present SegNeXt, a simple convolutional network architecture for semantic segmentation. We show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers.
arXiv Detail & Related papers (2022-09-18T14:33:49Z)
Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark [11.575901540758574]
We present our development experience for the Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN perJ, which aim to democratize AI- hardware codesign of optimized neural networks on FPGAs. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms.
arXiv Detail & Related papers (2022-06-23T15:57:17Z)
GRecX: An Efficient and Unified Benchmark for GNN-based Recommendation [55.55523188090938]
We present GRecX, an open-source framework for benchmarking GNN-based recommendation models. GRecX consists of core libraries for building GNN-based recommendation benchmarks, as well as the implementations of popular GNN-based recommendation models. We conduct experiments with GRecX, and the experimental results show that GRecX allows us to train and benchmark GNN-based recommendation baselines in an efficient and unified way.
arXiv Detail & Related papers (2021-11-19T17:45:46Z)
A fully pipelined FPGA accelerator for scale invariant feature transform keypoint descriptor matching, [0.0]
We design a novel fully pipelined hardware accelerator architecture for SIFT keypoint descriptor matching. The proposed hardware architecture is able to properly handle the memory bandwidth necessary for a fully-pipelined implementation. Our hardware implementation is 15.7 times faster than the comparable software approach.
arXiv Detail & Related papers (2020-12-17T15:29:41Z)
Fully Convolutional Networks for Panoptic Segmentation [91.84686839549488]
We present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN. Our approach aims to represent and predict foreground things and background stuff in a unified fully convolutional pipeline. Panoptic FCN encodes each object instance or stuff category into a specific kernel weight with the proposed kernel generator.
arXiv Detail & Related papers (2020-12-01T18:31:41Z)
CNN2Gate: Toward Designing a General Framework for Implementation of Convolutional Neural Networks on FPGA [0.3655021726150368]
This paper introduces an integrated framework that supports compilation of a CNN model for an FPGA target. CNN2Gate exploits the OpenCL synthesis workflow for FPGAs offered by commercial vendors. This paper reports results of automatic synthesis and design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms.
arXiv Detail & Related papers (2020-04-06T01:57:53Z)
Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)
Neural Network Compression Framework for fast model inference [59.65531492759006]
We present a new framework for neural networks compression with fine-tuning, which we called Neural Network Compression Framework (NNCF) It leverages recent advances of various network compression methods and implements some of them, such as sparsity, quantization, and binarization. The framework can be used within the training samples, which are supplied with it, or as a standalone package that can be seamlessly integrated into the existing training code.
arXiv Detail & Related papers (2020-02-20T11:24:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.