Related papers: PEFSL: A deployment Pipeline for Embedded Few-Shot Learning on a FPGA SoC

PEFSL: A deployment Pipeline for Embedded Few-Shot Learning on a FPGA SoC

URL: http://arxiv.org/abs/2404.19354v1
Date: Tue, 30 Apr 2024 08:33:52 GMT
Title: PEFSL: A deployment Pipeline for Embedded Few-Shot Learning on a FPGA SoC
Authors: Lucas Grativol Ribeiro, Lubin Gauthier, Mathieu Leonardon, Jérémy Morlier, Antoine Lavrard-Meyer, Guillaume Muller, Virginie Fresse, Matthieu Arzel,
Abstract summary: We develop an end-to-end open-source pipeline for a few-shot learning platform for object classification on a FPGA system. We build and deploy a low-power, low-latency demonstrator trained on the MiniImageNet dataset with a dataflow architecture. The proposed system has a latency of 30 ms while consuming 6.2 W on the PYNQ-Z1 board.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper tackles the challenges of implementing few-shot learning on embedded systems, specifically FPGA SoCs, a vital approach for adapting to diverse classification tasks, especially when the costs of data acquisition or labeling prove to be prohibitively high. Our contributions encompass the development of an end-to-end open-source pipeline for a few-shot learning platform for object classification on a FPGA SoCs. The pipeline is built on top of the Tensil open-source framework, facilitating the design, training, evaluation, and deployment of DNN backbones tailored for few-shot learning. Additionally, we showcase our work's potential by building and deploying a low-power, low-latency demonstrator trained on the MiniImageNet dataset with a dataflow architecture. The proposed system has a latency of 30 ms while consuming 6.2 W on the PYNQ-Z1 board.

Related papers

Hardware-Aware Feature Extraction Quantisation for Real-Time Visual Odometry on FPGA Platforms [0.0]
We propose an embedded implementation of an unsupervised architecture capable of detecting and describing feature points.<n>We implemented the solution on an FPGA System-on-Chip (SoC) platform, specifically the AMD/Xilinx Zynq UltraScale+.<n>This allowed us to process 640 x 480 pixel images at up to 54 fps, outperforming state-of-the-art solutions in the field.
arXiv Detail & Related papers (2025-07-10T16:37:20Z)
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence [50.417261057533786]
fVDB is a novel framework for deep learning on large-scale 3D data. Our framework is fully integrated with PyTorch enabling interoperability with existing pipelines.
arXiv Detail & Related papers (2024-07-01T20:20:33Z)
Harnessing FPGA Technology for Enhanced Biomedical Computation [0.0]
This research delves into sophisticated neural network frameworks like CNN, Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTMs), and Deep Belief Networks (DBNs) By evaluating performance indicators like latency and throughput, we showcase the efficacy of FPGAs in advanced biomedical computing.
arXiv Detail & Related papers (2023-11-21T08:51:58Z)
Exploiting FPGA Capabilities for Accelerated Biomedical Computing [0.0]
This study presents advanced neural network architectures for enhanced ECG signal analysis using Field Programmable Gate Arrays (FPGAs) We utilize the MIT-BIH Arrhythmia Database for training and validation, introducing Gaussian noise to improve robustness. The study ultimately offers a guide for optimizing neural network performance on FPGAs for various applications.
arXiv Detail & Related papers (2023-07-16T01:20:17Z)
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
Learning with Multigraph Convolutional Filters [153.20329791008095]
We introduce multigraph convolutional neural networks (MGNNs) as stacked and layered structures where information is processed according to an MSP model. We also develop a procedure for tractable computation of filter coefficients in the MGNNs and a low cost method to reduce the dimensionality of the information transferred between layers.
arXiv Detail & Related papers (2022-10-28T17:00:50Z)
Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark [11.575901540758574]
We present our development experience for the Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN perJ, which aim to democratize AI- hardware codesign of optimized neural networks on FPGAs. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms.
arXiv Detail & Related papers (2022-06-23T15:57:17Z)
Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision. We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z)
Deep Structured Instance Graph for Distilling Object Detectors [82.16270736573176]
We present a simple knowledge structure to exploit and encode information inside the detection system to facilitate detector knowledge distillation. We achieve new state-of-the-art results on the challenging COCO object detection task with diverse student-teacher pairs on both one- and two-stage detectors.
arXiv Detail & Related papers (2021-09-27T08:26:00Z)
Performance Analysis of Deep Learning Workloads on a Composable System [0.08388591755871731]
Composable infrastructure is defined as resources, such as compute, storage, accelerators and networking, that are shared in a pool. This paper details the design of an enterprise composable infrastructure that we have implemented and made available to our partners in the IBM Research AI Hardware Center.
arXiv Detail & Related papers (2021-03-19T17:15:42Z)
ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.