Accelerating ODE-Based Neural Networks on Low-Cost FPGAs
- URL: http://arxiv.org/abs/2012.15465v3
- Date: Mon, 29 Mar 2021 16:42:41 GMT
- Title: Accelerating ODE-Based Neural Networks on Low-Cost FPGAs
- Authors: Hirohisa Watanabe, Hiroki Matsutani
- Abstract summary: ODENet is a deep neural network architecture in which a stacking structure of ResNet is implemented with an ordinary differential equation solver.
It can reduce the number of parameters and strike a balance between accuracy and performance by selecting a proper solver.
It is also possible to improve the accuracy while keeping the same number of parameters on resource-limited edge devices.
- Score: 3.4795226670772745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ODENet is a deep neural network architecture in which a stacking structure of
ResNet is implemented with an ordinary differential equation (ODE) solver. It
can reduce the number of parameters and strike a balance between accuracy and
performance by selecting a proper solver. It is also possible to improve the
accuracy while keeping the same number of parameters on resource-limited edge
devices. In this paper, using the Euler method as an ODE solver, a part of
ODENet is implemented as dedicated logic on a low-cost FPGA (Field-Programmable
Gate Array) board such as the PYNQ-Z2. As ODENet variants, reduced ODENets
(rODENets), each of which heavily reuses a part of the ODENet layers and
reduces or eliminates some layers differently, are proposed and analyzed for
low-cost FPGA implementation. They are evaluated in terms of parameter size,
accuracy, execution time, and resource utilization on the FPGA. The results
show that the overall execution time of an rODENet variant is improved by up to
2.66 times compared to a pure software execution while keeping accuracy
comparable to that of the original ODENet.
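The Euler-method evaluation of an ODE block described in the abstract can be sketched as below. The dynamics function `f`, the step count, and the tensor shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def odenet_block_euler(f, x, t0=0.0, t1=1.0, n_steps=4):
    """Evaluate a Neural ODE block h(t1), given dh/dt = f(h, t),
    with the forward Euler method: h <- h + dt * f(h, t)."""
    h = x
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        h = h + dt * f(h, t)  # one Euler step; f's weights are reused every step
        t += dt
    return h

# Hypothetical dynamics f: a single weight-tied linear map with tanh.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1
f = lambda h, t: np.tanh(h @ W)

x = rng.standard_normal((1, 8))
y = odenet_block_euler(f, x)
print(y.shape)  # (1, 8)
```

Because the same `f` is applied at every step, one set of weights replaces a stack of distinct ResNet blocks, which is the source of the parameter reduction the abstract describes.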
Related papers
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms-spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- Neural Generalized Ordinary Differential Equations with Layer-varying Parameters [1.3691539554014036]
We show that the layer-varying Neural-GODE is more flexible and general than the standard Neural-ODE.
The Neural-GODE enjoys the computational and memory benefits while performing comparably to ResNets in prediction accuracy.
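The layer-varying idea can be illustrated by letting the weights of the dynamics depend on the time variable t. The linear parameterization W(t) = W0 + t * W1 below is an illustrative assumption, not the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
W0 = rng.standard_normal((8, 8)) * 0.1  # base weights
W1 = rng.standard_normal((8, 8)) * 0.1  # time-varying component

def f_layer_varying(h, t):
    """Dynamics with layer-varying (time-dependent) parameters:
    W(t) = W0 + t * W1, so each 'depth' t uses different effective weights."""
    return np.tanh(h @ (W0 + t * W1))

def euler_solve(f, x, t0=0.0, t1=1.0, n_steps=4):
    h, dt, t = x, (t1 - t0) / n_steps, t0
    for _ in range(n_steps):
        h = h + dt * f(h, t)
        t += dt
    return h

x = rng.standard_normal((1, 8))
out = euler_solve(f_layer_varying, x)
print(out.shape)  # (1, 8)
```

With W1 = 0 this reduces to the standard weight-tied Neural ODE, which is why the layer-varying form is strictly more general.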
arXiv Detail & Related papers (2022-09-21T20:02:28Z)
- Neural Basis Functions for Accelerating Solutions to High Mach Euler Equations [63.8376359764052]
We propose an approach to solving partial differential equations (PDEs) using a set of neural networks.
We regress a set of neural networks onto a reduced order Proper Orthogonal Decomposition (POD) basis.
These networks are then used in combination with a branch network that ingests the parameters of the prescribed PDE to compute a reduced order approximation to the PDE.
arXiv Detail & Related papers (2022-08-02T18:27:13Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs [2.620638110026557]
ResNet is a conventional deep neural network model that stacks many layers and parameters for higher accuracy.
In this paper, a combination of Neural ODE and DSC, called dsODENet, is designed and implemented for FPGAs.
The results demonstrate that dsODENet is comparable to or slightly better than our baseline Neural ODE implementation in terms of domain adaptation accuracy.
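The parameter savings from depthwise separable convolution (DSC) can be sketched by comparing weight counts: DSC replaces one standard k x k convolution with a per-channel depthwise stage plus a 1 x 1 pointwise stage. The channel and kernel sizes below are illustrative assumptions:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution."""
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k):
    """Depthwise (one k x k filter per input channel) + pointwise (1 x 1)."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 64, 64, 3
std = conv_params(c_in, c_out, k)  # 36864 weights
dsc = dsc_params(c_in, c_out, k)   # 576 + 4096 = 4672 weights
print(std, dsc, round(std / dsc, 1))  # 36864 4672 7.9
```

The roughly 8x reduction at these sizes is what makes the DSC-based dsODENet attractive for resource-limited FPGAs.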
arXiv Detail & Related papers (2021-07-27T13:44:13Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
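The {-1, +1} decomposition can be illustrated on scalars: the sums of k binary values from {-1, +1} are exactly the levels {-k, -k+2, ..., k}, so a quantized weight on those levels splits into k binary branches. The greedy decomposition below is an illustrative sketch, not the paper's algorithm:

```python
def decompose_pm1(q, k):
    """Write an integer q with |q| <= k and q congruent to k (mod 2)
    as a sum of k values from {-1, +1} (greedy sketch)."""
    assert abs(q) <= k and (q - k) % 2 == 0
    n_pos = (q + k) // 2  # number of +1 branches
    return [1] * n_pos + [-1] * (k - n_pos)

# The four levels of a 2-bit symmetric quantizer, each as three binary branches.
for q in (-3, -1, 1, 3):
    branches = decompose_pm1(q, 3)
    print(q, branches, sum(branches))
```

Each binary branch can then run on cheap XNOR/popcount hardware, which is the point of the decomposition.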
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace GPU- and CPU-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z)
- dNNsolve: an efficient NN-based PDE solver [62.997667081978825]
We introduce dNNsolve, which uses dual Neural Networks to solve ODEs/PDEs.
We show that dNNsolve is capable of solving a broad range of ODEs/PDEs in 1, 2 and 3 spacetime dimensions.
arXiv Detail & Related papers (2021-03-15T19:14:41Z)
- Accuracy and Architecture Studies of Residual Neural Network solving Ordinary Differential Equations [0.0]
We consider utilizing a residual neural network (ResNet) to solve ordinary differential equations.
We apply the forward Euler, second-order Runge-Kutta, and fourth-order Runge-Kutta finite difference methods to generate three sets of targets for training the ResNet.
The well-trained ResNet behaves just like the corresponding one-step finite difference method.
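Generating one-step targets with the three finite difference methods mentioned above can be sketched as follows; the scalar ODE dy/dt = -y and the step size are illustrative assumptions:

```python
def euler_step(f, y, t, dt):
    return y + dt * f(y, t)

def rk2_step(f, y, t, dt):  # midpoint (second-order Runge-Kutta) method
    k1 = f(y, t)
    k2 = f(y + 0.5 * dt * k1, t + 0.5 * dt)
    return y + dt * k2

def rk4_step(f, y, t, dt):  # classical fourth-order Runge-Kutta
    k1 = f(y, t)
    k2 = f(y + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(y + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(y + dt * k3, t + dt)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

f = lambda y, t: -y  # toy ODE dy/dt = -y, exact solution y0 * exp(-t)
y0, dt = 1.0, 0.1
targets = [step(f, y0, 0.0, dt) for step in (euler_step, rk2_step, rk4_step)]
print(targets)  # each approximates exp(-0.1) ~ 0.9048
```

A ResNet trained on one of these target sets should mimic the corresponding one-step integrator, matching the paper's observation.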
arXiv Detail & Related papers (2021-01-10T17:34:10Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML [13.325670094073383]
We present the implementation of binary and ternary neural networks in the hls4ml library.
We discuss the trade-off between model accuracy and resource consumption.
The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
arXiv Detail & Related papers (2020-03-11T10:46:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.