Related papers: HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices

HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices

URL: http://arxiv.org/abs/2303.17218v6
Date: Mon, 29 May 2023 11:23:45 GMT
Title: HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices
Authors: Petros Toupas, Alexander Montgomerie-Corcoran, Christos-Savvas Bouganis, Dimitrios Tzovaras
Abstract summary: This study introduces a novel streaming architecture based toolflow for mapping 3D Convolutional Neural Networks onto FPGAs. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics. The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs.
Score: 71.45672882756001
License: http://creativecommons.org/licenses/by/4.0/
Abstract: For Human Action Recognition tasks (HAR), 3D Convolutional Neural Networks have proven to be highly effective, achieving state-of-the-art results. This study introduces a novel streaming architecture based toolflow for mapping such models onto FPGAs considering the model's inherent characteristics and the features of the targeted FPGA device. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics, generating a design that minimizes the latency of the computation. The toolflow is comprised of a number of parts, including i) a 3D CNN parser, ii) a performance and resource model, iii) a scheduling algorithm for executing 3D models on the generated hardware, iv) a resource-aware optimization engine tailored for 3D models, v) an automated mapping to synthesizable code for FPGAs. The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs. Furthermore, the toolflow has produced high-performing results for 3D CNN models that have not been mapped to FPGAs before, demonstrating the potential of FPGA-based systems in this space. Overall, HARFLOW3D has demonstrated its ability to deliver competitive latency compared to a range of state-of-the-art hand-tuned approaches being able to achieve up to 5$\times$ better performance compared to some of the existing works.

Related papers

FastVGGT: Training-Free Acceleration of Visual Geometry Transformer [83.67766078575782]
VGGT is a state-of-the-art feed-forward visual geometry model.<n>We propose FastVGGT, which leverages token merging in the 3D domain through a training-free mechanism for accelerating VGGT.<n>With 1000 input images, FastVGGT achieves a 4x speedup over VGGT while mitigating error accumulation in long-sequence scenarios.
arXiv Detail & Related papers (2025-09-02T17:54:21Z)
3DGabSplat: 3D Gabor Splatting for Frequency-adaptive Radiance Field Rendering [50.04967868036964]
3D Gaussian Splatting (3DGS) has enabled real-time rendering while maintaining high-fidelity novel view synthesis.<n>We propose 3D Gabor Splatting (3DGabSplat) that incorporates a novel 3D Gabor-based primitive with multiple directional 3D frequency responses.<n>We achieve 1.35 dBR gain over 3D with simultaneously reduced number of primitive memory consumption.
arXiv Detail & Related papers (2025-08-07T12:49:44Z)
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images. We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object. In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence [50.417261057533786]
fVDB is a novel framework for deep learning on large-scale 3D data. Our framework is fully integrated with PyTorch enabling interoperability with existing pipelines.
arXiv Detail & Related papers (2024-07-01T20:20:33Z)
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff to address a unified 3D diffusion pipeline. Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space. It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z)
Ultrafast jet classification on FPGAs for the HL-LHC [33.87493147633063]
Three machine learning models are used to perform jet origin classification. These models are optimized for deployment on a field-programmable gate array device.
arXiv Detail & Related papers (2024-02-02T20:02:12Z)
SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices [48.47320494918925]
This work tackles the challenges of deploying stateof-the-art object detection models onto FPGA devices for ultralow latency applications. We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion. We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources.
arXiv Detail & Related papers (2023-09-04T13:15:01Z)
fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs [10.385864925381384]
This study proposes a toolflow that optimises the mapping of 3D CNN models for Human Action Recognition onto FPGA devices. The proposed system employs Synchronous Dataflow (SDF) graphs to model the designs and introduces transformations to expand and explore the design space. A variety of 3D CNN models were evaluated using the proposed toolflow on multiple FPGA devices, demonstrating its potential to deliver competitive performance.
arXiv Detail & Related papers (2023-05-31T14:30:17Z)
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition [10.385864925381384]
This paper addresses the problem of mapping X3D, a state-of-the-art model in Human Action Recognition, onto any FPGA device. The proposed toolflow generates an optimised stream-based hardware system, taking into account the available resources and off-chip memory characteristics of the FPGA device.
arXiv Detail & Related papers (2023-05-29T11:17:51Z)
NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs) has transformed 3D-aware generation from single-view images. We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z)
Optimization of FPGA-based CNN Accelerators Using Metaheuristics [1.854931308524932]
convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields. FPGAs have seen a surge in interest for accelerating CNN inference. Current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs)
arXiv Detail & Related papers (2022-09-22T18:57:49Z)
SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference [0.0]
We propose SECDA, a new hardware/software co-design methodology to reduce design time of optimized Deep Neural Networks (DNN) inference accelerators on edge devices with FPGAs. We use SECDA to efficiently develop two different DNN accelerator designs on a PYNQ-Z1 board, a platform that includes an edge FPGA. We evaluate the two accelerator designs with four common DNN models, achieving an average performance speedup across models of up to 3.5$times$ with a 2.9$times$ reduction in energy consumption over CPU-only inference.
arXiv Detail & Related papers (2021-10-01T15:20:29Z)
HALF: Holistic Auto Machine Learning for FPGAs [1.9146960682777232]
Deep Neural Networks (DNNs) are capable of solving complex problems in domains related to embedded systems, such as image and natural language processing. To efficiently implement DNNs on a specific FPGA platform for a given cost criterion, e.g. energy efficiency, an enormous amount of design parameters has to be considered. An automatic, holistic design approach can improve the quality of DNN implementations on FPGA significantly.
arXiv Detail & Related papers (2021-06-28T14:45:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.