CFU Playground: Full-Stack Open-Source Framework for Tiny Machine
Learning (tinyML) Acceleration on FPGAs
- URL: http://arxiv.org/abs/2201.01863v3
- Date: Wed, 5 Apr 2023 17:45:43 GMT
- Title: CFU Playground: Full-Stack Open-Source Framework for Tiny Machine
Learning (tinyML) Acceleration on FPGAs
- Authors: Shvetank Prakash, Tim Callahan, Joseph Bushagour, Colby Banbury, Alan
V. Green, Pete Warden, Tim Ansell, Vijay Janapa Reddi
- Abstract summary: CFU Playground is a full-stack open-source framework that enables rapid and iterative design of machine learning (ML) accelerators for embedded ML systems.
Our tool provides a completely open-source end-to-end flow for hardware-software co-design on FPGAs and future systems research.
Our rapid, deploy-profile-optimization feedback loop lets ML hardware and software developers achieve significant returns out of a relatively small investment.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The need for efficient processing of neural networks has given rise to the
development of hardware accelerators. The increased adoption of specialized
hardware has highlighted the need for more agile design flows for
hardware-software co-design and domain-specific optimizations. In this paper,
we present CFU Playground: a full-stack open-source framework that enables
rapid and iterative design and evaluation of machine learning (ML) accelerators
for embedded ML systems. Our tool provides a completely open-source end-to-end
flow for hardware-software co-design on FPGAs and future systems research. This
full-stack framework gives the users access to explore experimental and bespoke
architectures that are customized and co-optimized for embedded ML. Our rapid,
deploy-profile-optimization feedback loop lets ML hardware and software
developers achieve significant returns out of a relatively small investment in
customization. Using CFU Playground's design and evaluation loop, we show
substantial speedups between 55$\times$ and 75$\times$. The soft CPU coupled
with the accelerator opens up a new, rich design space between the two
components that we explore in an automated fashion using Vizier, an open-source
black-box optimization service.
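To make the soft-CPU/CFU coupling concrete, here is a minimal C sketch of how a quantized kernel's hot loop might be offloaded to a Custom Function Unit. The cfu_op0 name mirrors the style of CFU Playground's custom-instruction wrappers, but the packed 8-bit multiply-accumulate semantics and the software fallback below are illustrative assumptions, not the framework's actual interface.

```c
/* A minimal sketch of offloading a quantized kernel's inner loop to a
 * Custom Function Unit (CFU), in the style of CFU Playground.  The
 * cfu_op0() name and the packed 8-bit MAC semantics are assumptions
 * made for illustration, not the framework's actual interface. */
#include <stdint.h>

/* Software fallback: four signed 8-bit multiply-accumulates per call,
 * matching what the CFU gateware would compute in one instruction. */
static int32_t simd_mac_fallback(uint32_t a, uint32_t b) {
    int32_t acc = 0;
    for (int i = 0; i < 4; i++) {
        int8_t ai = (int8_t)(a >> (8 * i));  /* unpack lane i of a */
        int8_t bi = (int8_t)(b >> (8 * i));  /* unpack lane i of b */
        acc += (int32_t)ai * (int32_t)bi;
    }
    return acc;
}

/* On real hardware this would expand to a RISC-V custom-opcode
 * instruction (inline asm) executed by the user's CFU gateware; here
 * it falls back to software so the sketch runs anywhere.  The funct7
 * selector is ignored in this stand-in. */
#define cfu_op0(funct7, rs1, rs2) simd_mac_fallback((rs1), (rs2))

/* Hot loop of a quantized dot product: profiling identifies this loop,
 * and the CFU then retires four MACs per instruction instead of one. */
int32_t dot_product_q8(const uint32_t *a, const uint32_t *b, int n_words) {
    int32_t acc = 0;
    for (int i = 0; i < n_words; i++) {
        acc += cfu_op0(0, a[i], b[i]);
    }
    return acc;
}
```

In the actual flow, the deploy-profile-optimize loop described above decides how much of the kernel to move into the CFU and with what parameters (lane width, accumulator depth, and so on), a design space that a black-box optimizer such as Vizier can search automatically.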
Related papers
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
- LLM-Aided Compilation for Tensor Accelerators [6.709490736813537]
We discuss how large language models (LLMs) could be leveraged to build a compiler for hardware accelerators.
Specifically, we demonstrate the ability of GPT-4 to achieve high pass rates in translating code to the Gemmini accelerator.
We also propose a 2-phase workflow for utilizing LLMs to generate hardware-optimized code.
arXiv Detail & Related papers (2024-08-06T19:10:25Z)
- Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments [0.09065034043031665]
I present my community effort to automatically co-design cheaper, faster and more energy-efficient systems for AI, ML and other popular workloads.
I developed CM to modularize, automate and virtualize the tedious process of building, running, profiling and optimizing complex applications across rapidly evolving open-source and proprietary AI/ML models, datasets, software and hardware.
I donated CM and CM4MLOps to help connect academia and industry to learn how to build and run AI and other emerging workloads in the most efficient and cost-effective way.
arXiv Detail & Related papers (2024-06-24T16:55:03Z)
- Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z)
- DEAP: Design Space Exploration for DNN Accelerator Parallelism [0.0]
Large Language Models (LLMs) are becoming increasingly complex and powerful to train and serve.
This paper showcases how hardware and software co-design can come together and allow us to create customized hardware systems.
arXiv Detail & Related papers (2023-12-24T02:43:01Z)
- Verilog-to-PyG -- A Framework for Graph Learning and Augmentation on RTL Designs [15.67829950106923]
We introduce an innovative open-source framework that translates RTL designs into graph representation foundations.
The Verilog-to-PyG (V2PYG) framework is compatible with the open-source Electronic Design Automation (EDA) toolchain OpenROAD.
We will present novel RTL data augmentation methods that enable functional equivalent design augmentation for the construction of an extensive graph-based RTL design database.
arXiv Detail & Related papers (2023-11-09T20:11:40Z)
- Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark [11.575901540758574]
We present our development experience for the Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms.
We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs.
The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms.
arXiv Detail & Related papers (2022-06-23T15:57:17Z)
- Flashlight: Enabling Innovation in Tools for Machine Learning [50.63188263773778]
We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems.
We see Flashlight as a tool enabling research that can benefit widely used libraries downstream and bring machine learning and systems researchers closer together.
arXiv Detail & Related papers (2022-01-29T01:03:29Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Does Form Follow Function? An Empirical Exploration of the Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration [76.35307867016336]
This study investigates the impact of deep neural network architecture design on the degree of inference speedup.
We show that while leveraging hardware-specific acceleration achieved an average inference speed-up of 380%, the degree of inference speed-up varied drastically depending on the macro-architecture design pattern.
arXiv Detail & Related papers (2021-07-08T23:05:39Z)
- VEGA: Towards an End-to-End Configurable AutoML Pipeline [101.07003005736719]
VEGA is an efficient and comprehensive AutoML framework that is compatible and optimized for multiple hardware platforms.
VEGA can improve the existing AutoML algorithms and discover new high-performance models against SOTA methods.
arXiv Detail & Related papers (2020-11-03T06:53:53Z)