Accelerating ODE-Based Neural Networks on Low-Cost FPGAs
- URL: http://arxiv.org/abs/2012.15465v3
- Date: Mon, 29 Mar 2021 16:42:41 GMT
- Title: Accelerating ODE-Based Neural Networks on Low-Cost FPGAs
- Authors: Hirohisa Watanabe, Hiroki Matsutani
- Abstract summary: ODENet is a deep neural network architecture in which a stacking structure of ResNet is implemented with an ordinary differential equation solver.
It can reduce the number of parameters and strike a balance between accuracy and performance by selecting a proper solver.
It is also possible to improve the accuracy while keeping the same number of parameters on resource-limited edge devices.
- Score: 3.4795226670772745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ODENet is a deep neural network architecture in which a stacking structure of
ResNet is implemented with an ordinary differential equation (ODE) solver. It
can reduce the number of parameters and strike a balance between accuracy and
performance by selecting a proper solver. It is also possible to improve the
accuracy while keeping the same number of parameters on resource-limited edge
devices. In this paper, using the Euler method as an ODE solver, a part of
ODENet is implemented as dedicated logic on a low-cost FPGA (Field-Programmable
Gate Array) board such as the PYNQ-Z2. As ODENet variants, reduced ODENets
(rODENets), each of which heavily reuses a part of the ODENet layers and
reduces or eliminates some layers differently, are proposed and analyzed for
low-cost FPGA implementation. They are evaluated in terms of parameter size,
accuracy, execution time, and resource utilization on the FPGA. The results
show that the overall execution time of an rODENet variant is improved by up to
2.66 times compared to a pure software execution while keeping accuracy
comparable to that of the original ODENet.
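The Euler-method evaluation of an ODE block described in the abstract can be sketched as below. The dynamics function `f`, the step count, and the tensor shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def odenet_block_euler(f, x, t0=0.0, t1=1.0, n_steps=4):
    """Evaluate a Neural ODE block h(t1), given dh/dt = f(h, t),
    with the forward Euler method: h <- h + dt * f(h, t)."""
    h = x
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        h = h + dt * f(h, t)  # one Euler step; f's weights are reused every step
        t += dt
    return h

# Hypothetical dynamics f: a single weight-tied linear map with tanh.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1
f = lambda h, t: np.tanh(h @ W)

x = rng.standard_normal((1, 8))
y = odenet_block_euler(f, x)
print(y.shape)  # (1, 8)
```

Because the same `f` is applied at every step, one set of weights replaces a stack of distinct ResNet blocks, which is the source of the parameter reduction the abstract describes.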
Related papers
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms-spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- Neural Generalized Ordinary Differential Equations with Layer-varying Parameters [1.3691539554014036]
We show that the layer-varying Neural-GODE is more flexible and general than the standard Neural-ODE.
The Neural-GODE enjoys the computational and memory benefits while performing comparably to ResNets in prediction accuracy.
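The layer-varying idea can be illustrated by letting the weights of the dynamics depend on the time variable t. The linear parameterization W(t) = W0 + t * W1 below is an illustrative assumption, not the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
W0 = rng.standard_normal((8, 8)) * 0.1  # base weights
W1 = rng.standard_normal((8, 8)) * 0.1  # time-varying component

def f_layer_varying(h, t):
    """Dynamics with layer-varying (time-dependent) parameters:
    W(t) = W0 + t * W1, so each 'depth' t uses different effective weights."""
    return np.tanh(h @ (W0 + t * W1))

def euler_solve(f, x, t0=0.0, t1=1.0, n_steps=4):
    h, dt, t = x, (t1 - t0) / n_steps, t0
    for _ in range(n_steps):
        h = h + dt * f(h, t)
        t += dt
    return h

x = rng.standard_normal((1, 8))
out = euler_solve(f_layer_varying, x)
print(out.shape)  # (1, 8)
```

With W1 = 0 this reduces to the standard weight-tied Neural ODE, which is why the layer-varying form is strictly more general.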
arXiv Detail & Related papers (2022-09-21T20:02:28Z)
- Neural Basis Functions for Accelerating Solutions to High Mach Euler Equations [63.8376359764052]
We propose an approach to solving partial differential equations (PDEs) using a set of neural networks.
We regress a set of neural networks onto a reduced order Proper Orthogonal Decomposition (POD) basis.
These networks are then used in combination with a branch network that ingests the parameters of the prescribed PDE to compute a reduced order approximation to the PDE.
arXiv Detail & Related papers (2022-08-02T18:27:13Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs [2.620638110026557]
ResNet is a conventional deep neural network model that stacks many layers and parameters for higher accuracy.
In this paper, a combination of Neural ODE and DSC, called dsODENet, is designed and implemented for FPGAs.
The results demonstrate that dsODENet is comparable to or slightly better than our baseline Neural ODE implementation in terms of domain adaptation accuracy.
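The parameter savings from depthwise separable convolution (DSC) can be sketched by comparing weight counts: DSC replaces one standard k x k convolution with a per-channel depthwise stage plus a 1 x 1 pointwise stage. The channel and kernel sizes below are illustrative assumptions:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution."""
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k):
    """Depthwise (one k x k filter per input channel) + pointwise (1 x 1)."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 64, 64, 3
std = conv_params(c_in, c_out, k)  # 36864 weights
dsc = dsc_params(c_in, c_out, k)   # 576 + 4096 = 4672 weights
print(std, dsc, round(std / dsc, 1))  # 36864 4672 7.9
```

The roughly 8x reduction at these sizes is what makes the DSC-based dsODENet attractive for resource-limited FPGAs.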
arXiv Detail & Related papers (2021-07-27T13:44:13Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
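The {-1, +1} decomposition can be illustrated on scalars: the sums of k binary values from {-1, +1} are exactly the levels {-k, -k+2, ..., k}, so a quantized weight on those levels splits into k binary branches. The greedy decomposition below is an illustrative sketch, not the paper's algorithm:

```python
def decompose_pm1(q, k):
    """Write an integer q with |q| <= k and q congruent to k (mod 2)
    as a sum of k values from {-1, +1} (greedy sketch)."""
    assert abs(q) <= k and (q - k) % 2 == 0
    n_pos = (q + k) // 2  # number of +1 branches
    return [1] * n_pos + [-1] * (k - n_pos)

# The four levels of a 2-bit symmetric quantizer, each as three binary branches.
for q in (-3, -1, 1, 3):
    branches = decompose_pm1(q, 3)
    print(q, branches, sum(branches))
```

Each binary branch can then run on cheap XNOR/popcount hardware, which is the point of the decomposition.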
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace GPU- and CPU-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z)
- dNNsolve: an efficient NN-based PDE solver [62.997667081978825]
We introduce dNNsolve, which uses dual Neural Networks to solve ODEs/PDEs.
We show that dNNsolve is capable of solving a broad range of ODEs/PDEs in 1, 2 and 3 spacetime dimensions.
arXiv Detail & Related papers (2021-03-15T19:14:41Z)
- Accuracy and Architecture Studies of Residual Neural Network solving Ordinary Differential Equations [0.0]
We consider utilizing a residual neural network (ResNet) to solve ordinary differential equations.
We apply the forward Euler, second-order Runge-Kutta, and fourth-order Runge-Kutta finite difference methods to generate three sets of targets for training the ResNet.
The well-trained ResNet behaves just like the corresponding one-step finite difference method.
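Generating one-step targets with the three finite difference methods mentioned above can be sketched as follows; the scalar ODE dy/dt = -y and the step size are illustrative assumptions:

```python
def euler_step(f, y, t, dt):
    return y + dt * f(y, t)

def rk2_step(f, y, t, dt):  # midpoint (second-order Runge-Kutta) method
    k1 = f(y, t)
    k2 = f(y + 0.5 * dt * k1, t + 0.5 * dt)
    return y + dt * k2

def rk4_step(f, y, t, dt):  # classical fourth-order Runge-Kutta
    k1 = f(y, t)
    k2 = f(y + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(y + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(y + dt * k3, t + dt)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

f = lambda y, t: -y  # toy ODE dy/dt = -y, exact solution y0 * exp(-t)
y0, dt = 1.0, 0.1
targets = [step(f, y0, 0.0, dt) for step in (euler_step, rk2_step, rk4_step)]
print(targets)  # each approximates exp(-0.1) ~ 0.9048
```

A ResNet trained on one of these target sets should mimic the corresponding one-step integrator, matching the paper's observation.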
arXiv Detail & Related papers (2021-01-10T17:34:10Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML [13.325670094073383]
We present the implementation of binary and ternary neural networks in the hls4ml library.
We discuss the trade-off between model accuracy and resource consumption.
The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
arXiv Detail & Related papers (2020-03-11T10:46:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.