hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power
Machine Learning Devices
- URL: http://arxiv.org/abs/2103.05579v1
- Date: Tue, 9 Mar 2021 17:34:44 GMT
- Authors: Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo
Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris,
Jeffrey Krupa, Dylan Rankin, Manuel Blanco Valentin, Josiah Hester, Yingyi
Luo, John Mamish, Seda Ogrenci-Memik, Thea Aarrestad, Hamza Javed, Vladimir
Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, Javier Duarte,
Scott Hauck, Shih-Chieh Hsu, Jennifer Ngadiuba, Mia Liu, Duc Hoang, Edward
Kreinar, Zhenbin Wu
- Abstract summary: In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries.
We have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms.
We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations.
- Score: 0.6353764569103648
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Accessible machine learning algorithms, software, and diagnostic tools for
energy-efficient devices and systems are extremely valuable across a broad
range of application domains. In scientific domains, real-time near-sensor
processing can drastically improve experimental design and accelerate
scientific discoveries. To support domain scientists, we have developed hls4ml,
an open-source software-hardware codesign workflow to interpret and translate
machine learning algorithms for implementation with both FPGA and ASIC
technologies. We expand on previous hls4ml work by extending capabilities and
techniques towards low-power implementations and increased usability: new
Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long
pipeline kernels for low power, and new device backends, including an ASIC
workflow. Taken together, these and continued efforts in hls4ml will arm a new
generation of domain scientists with accessible, efficient, and powerful tools
for machine-learning-accelerated discovery.
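The abstract names two of the low-power techniques the workflow relies on: fixed-point quantization (HLS `ap_fixed<W,I>`-style types) and pruning of low-magnitude weights. The following is a minimal, self-contained sketch of those two ideas in plain Python; it is illustrative only and is not the hls4ml or QKeras implementation, and the function names, bit widths, and example weights are invented for the sketch.

```python
# Toy sketch of fixed-point quantization and magnitude pruning,
# the two compression ideas named in the hls4ml abstract.
# Not hls4ml's actual code; names and parameters are illustrative.

def quantize_fixed(x, total_bits=8, int_bits=3):
    """Snap x to a signed fixed-point grid with `total_bits` bits,
    `int_bits` of them (sign included) before the binary point."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits                     # grid steps per unit
    lo = -(1 << (int_bits - 1))                # most negative value
    hi = (1 << (int_bits - 1)) - 1.0 / scale   # most positive value
    q = round(x * scale) / scale               # round to the grid
    return min(max(q, lo), hi)                 # saturate on overflow

def prune_smallest(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

weights = [0.91, -0.07, 0.33, 0.02, -1.46, 0.58]
compressed = [quantize_fixed(w) for w in prune_smallest(weights)]
```

In quantization-aware pruning, steps like these are applied inside the training loop rather than after it, so the network learns weights that survive both the reduced precision and the enforced sparsity.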
Related papers
- ExaWorks Software Development Kit: A Robust and Scalable Collection of Interoperable Workflow Technologies [3.1805622006446397]
Scientific discovery increasingly requires executing heterogeneous workflows on high-performance computing (HPC) platforms.
We contributed to addressing this issue by developing the ExaWorks Software Development Kit (SDK).
The SDK is a collection of workflow technologies engineered following current best practices and specifically designed to work on HPC platforms.
arXiv Detail & Related papers (2024-07-23T17:00:09Z)
- Exploring and Characterizing Large Language Models For Embedded System Development and Debugging [10.967443876391611]
Large language models (LLMs) have shown remarkable abilities to generate code; however, their ability to develop software for embedded systems has not been studied.
We develop an open source framework to evaluate leading LLMs to assess their capabilities and limitations for embedded system development.
We leverage this finding to study how human programmers interact with these tools, and develop a human-AI software engineering workflow for building embedded systems.
arXiv Detail & Related papers (2023-07-07T20:14:22Z)
- Multi-Fidelity Active Learning with GFlowNets [65.91555804996203]
We propose a multi-fidelity active learning algorithm with GFlowNets as a sampler, to efficiently discover diverse, high-scoring candidates.
Our evaluation on molecular discovery tasks shows that multi-fidelity active learning with GFlowNets can discover high-scoring candidates at a fraction of the budget of its single-fidelity counterpart.
arXiv Detail & Related papers (2023-06-20T17:43:42Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to nonexperts in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z)
- Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark [11.575901540758574]
We present our development experience for the Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms.
We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs.
The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms.
arXiv Detail & Related papers (2022-06-23T15:57:17Z)
- Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection [0.0]
This paper presents a case study where we discuss our recent experience producing an end-to-end artificial intelligence application for industrial defect detection.
We detail the high-level deep learning libraries, containerized continuous integration/deployment pipelines, and open-source code templates we leveraged to produce a competitive result.
We highlight the value that exploiting such systems can bring, even for research, and present our best results in terms of accuracy and inference time.
arXiv Detail & Related papers (2022-06-19T09:10:13Z)
- Flashlight: Enabling Innovation in Tools for Machine Learning [50.63188263773778]
We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems.
We see Flashlight as a tool enabling research that can benefit widely used libraries downstream and bring machine learning and systems researchers closer together.
arXiv Detail & Related papers (2022-01-29T01:03:29Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
Compared to its full-precision software counterpart, it reduces classification time by three orders of magnitude with a small 4.5% impact on accuracy.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs [2.2177069086277195]
CFU Playground is a full-stack open-source framework that enables rapid and iterative design of machine learning (ML) accelerators for embedded ML systems.
Our tool provides a completely open-source end-to-end flow for hardware-software co-design on FPGAs and future systems research.
Our rapid, deploy-profile-optimization feedback loop lets ML hardware and software developers achieve significant returns out of a relatively small investment.
arXiv Detail & Related papers (2022-01-05T23:15:58Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
- Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.