An FPGA-Based On-Device Reinforcement Learning Approach using Online
Sequential Learning
- URL: http://arxiv.org/abs/2005.04646v3
- Date: Tue, 23 Mar 2021 07:09:38 GMT
- Title: An FPGA-Based On-Device Reinforcement Learning Approach using Online
Sequential Learning
- Authors: Hirohisa Watanabe, Mineto Tsukada and Hiroki Matsutani
- Abstract summary: We propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices.
It exploits a recently proposed neural-network based on-device learning approach that does not rely on the backpropagation method.
The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform.
- Score: 2.99321624683618
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DQN (Deep Q-Network) is a method to perform Q-learning for reinforcement
learning using deep neural networks. DQNs require a large buffer and batch
processing for experience replay and rely on backpropagation-based
iterative optimization, making them difficult to implement on
resource-limited edge devices. In this paper, we propose a lightweight
on-device reinforcement learning approach for low-cost FPGA devices. It
exploits a recently proposed neural-network-based on-device learning approach
that does not rely on backpropagation but instead uses an OS-ELM (Online
Sequential Extreme Learning Machine) based training algorithm. In addition, we
propose a combination of L2 regularization and spectral normalization for the
on-device reinforcement learning so that the output values of the neural network
fit within a certain range and the reinforcement learning remains stable.
The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a
low-cost FPGA platform. Evaluation results using OpenAI Gym demonstrate
that the proposed algorithm and its FPGA implementation complete a CartPole-v0
task 29.77x and 89.40x faster, respectively, than a conventional DQN-based
approach when the number of hidden-layer nodes is 64.
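The training rule described above can be sketched compactly. Below is a minimal NumPy sketch of an OS-ELM-style learner with the two stabilizers the abstract names: L2 regularization (entering through the initial inverse Gram matrix) and spectral normalization of the random input weights. All names are illustrative, and the exact placement of the normalization is our reading of the abstract, not the paper's formulation.

```python
import numpy as np

class OSELM:
    """Minimal OS-ELM sketch: one hidden layer whose output weights are
    trained by sequential least squares instead of backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, l2=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((n_in, n_hidden))
        # Spectral normalization: divide the fixed random input weights by
        # their largest singular value so the hidden map is 1-Lipschitz and
        # the network outputs stay within a bounded range.
        self.W = W / np.linalg.norm(W, 2)
        self.b = rng.standard_normal(n_hidden)
        self.beta = np.zeros((n_hidden, n_out))   # trainable output weights
        self.P = np.eye(n_hidden) / l2            # L2-regularized (H'H + l2*I)^-1

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def predict(self, X):
        return self._hidden(X) @ self.beta

    def partial_fit(self, X, T):
        """One sequential update on a mini-batch of inputs X and targets T."""
        H = self._hidden(X)
        # Woodbury-style update keeps P equal to the regularized inverse Gram
        # matrix without ever re-inverting an n_hidden x n_hidden matrix.
        S = np.linalg.inv(np.eye(X.shape[0]) + H @ self.P @ H.T)
        self.P -= self.P @ H.T @ S @ H @ self.P
        self.beta += self.P @ H.T @ (T - H @ self.beta)
```

In the Q-learning setting, each row of T would hold a temporal-difference target r + gamma * max_a' Q(s', a') for the action actually taken, so learning proceeds one small batch at a time with no replay buffer.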
Related papers
- Simplifying Deep Temporal Difference Learning [3.458933902627673]
We investigate whether it is possible to accelerate and simplify TD training while maintaining its stability.
Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms.
Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm.
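A hypothetical PyTorch sketch of that recipe: a Q-network with LayerNorm after each hidden layer, trained by plain online TD(0) updates with no replay buffer and no target network. Layer sizes and hyperparameters are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class NormQNet(nn.Module):
    """Q-network with LayerNorm, the regularizer credited with stability."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def td_step(q_net, opt, s, a, r, s_next, done, gamma=0.99):
    """One online TD(0) update on a batch of transitions."""
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a)
    with torch.no_grad():                                # no target network
        target = r + gamma * (1.0 - done) * q_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```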
arXiv Detail & Related papers (2024-07-05T18:49:07Z)
- Exploiting FPGA Capabilities for Accelerated Biomedical Computing [0.0]
This study presents advanced neural network architectures for enhanced ECG signal analysis using Field Programmable Gate Arrays (FPGAs).
We utilize the MIT-BIH Arrhythmia Database for training and validation, introducing Gaussian noise to improve robustness.
The study ultimately offers a guide for optimizing neural network performance on FPGAs for various applications.
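The noise-injection step mentioned above is easy to reproduce in software; a one-function sketch (the noise level sigma is illustrative, not the study's value):

```python
import numpy as np

_rng = np.random.default_rng(0)

def augment_ecg(trace, sigma=0.05):
    """Add zero-mean Gaussian noise to an ECG trace for robustness training."""
    return trace + _rng.normal(0.0, sigma, size=trace.shape)
```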
arXiv Detail & Related papers (2023-07-16T01:20:17Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
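snnTorch is a public package, so its basic API is easy to show; a minimal example with a leaky integrate-and-fire neuron (this illustrates the standard package, not the IPU-optimized release the paper presents):

```python
import torch
import snntorch as snn

lif = snn.Leaky(beta=0.9)           # leaky integrate-and-fire neuron
mem = lif.init_leaky()              # initial membrane potential
inputs = 0.5 * torch.rand(10, 1)    # 10 timesteps of input current

spikes = []
for t in range(10):
    spk, mem = lif(inputs[t], mem)  # one timestep: emit spike, update membrane
    spikes.append(spk)
print(torch.stack(spikes).sum())    # total spike count over the window
```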
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z)
- Generalized Learning Vector Quantization for Classification in Randomized Neural Networks and Hyperdimensional Computing [4.4886210896619945]
We propose a modified RVFL network that avoids computationally expensive matrix operations during training.
The proposed approach achieved state-of-the-art accuracy on a collection of datasets from the UCI Machine Learning Repository.
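The summary does not spell out the modification, so the following is a speculative sketch of one inversion-free reading: keep the RVFL's fixed random hidden layer, but replace the least-squares readout (which needs a pseudo-inverse) with an LVQ-style nearest-centroid readout maintained by running means. All names here are our assumptions.

```python
import numpy as np

class CentroidRVFL:
    """Random-feature classifier with a centroid readout: no matrix inversion."""
    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_hidden))  # fixed random weights
        self.centroids = np.zeros((n_classes, n_hidden))
        self.counts = np.zeros(n_classes)

    def _features(self, X):
        return np.tanh(X @ self.W)

    def fit(self, X, y):
        for h, c in zip(self._features(X), y):
            # Running mean per class: O(n_hidden) per sample, no inversion.
            self.counts[c] += 1
            self.centroids[c] += (h - self.centroids[c]) / self.counts[c]

    def predict(self, X):
        H = self._features(X)
        d = ((H[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)  # nearest centroid wins
```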
arXiv Detail & Related papers (2021-06-17T21:17:17Z)
- Exploration of Hardware Acceleration Methods for an XNOR Traffic Signs Classifier [0.0]
In this work, we explore the possibility of accelerating XNOR networks for traffic sign classification.
We propose a custom HDL accelerator for XNOR networks, which enables inference at almost 450 fps.
Even better results are obtained with the second method, the Xilinx FINN accelerator, which processes input images at around 550 fps.
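The arithmetic these accelerators implement in hardware is compact enough to show in software: with weights and activations binarized to {-1, +1} and stored as bits, a dot product reduces to an XNOR followed by a popcount. A purely illustrative sketch:

```python
import numpy as np

def binarize(x):
    """Encode a real vector as bits: 1 where x >= 0, else 0 (i.e. +1 / -1)."""
    return (x >= 0).astype(np.uint8)

def xnor_dot(a_bits, w_bits):
    """Dot product of two {-1,+1} vectors stored as 0/1 bits."""
    n = a_bits.size
    matches = np.count_nonzero(a_bits == w_bits)  # popcount of XNOR
    return 2 * matches - n                        # +1 per match, -1 per mismatch

a = binarize(np.random.randn(256))
w = binarize(np.random.randn(256))
print(xnor_dot(a, w))
```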
arXiv Detail & Related papers (2021-04-06T06:01:57Z)
- A Meta-Learning Approach to the Optimal Power Flow Problem Under Topology Reconfigurations [69.73803123972297]
We propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach.
The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems.
arXiv Detail & Related papers (2020-12-21T17:39:51Z)
- An FPGA Accelerated Method for Training Feed-forward Neural Networks Using Alternating Direction Method of Multipliers and LSMR [2.8747398859585376]
We have successfully designed, implemented, deployed and tested a novel FPGA accelerated algorithm for neural network training.
The training method is based on the Alternating Direction Method of Multipliers (ADMM) algorithm, which has strong parallel characteristics.
We devised an FPGA-accelerated version of the algorithm using the Intel FPGA SDK for OpenCL and performed extensive optimization stages, followed by successful deployment of the program on an Intel Arria 10 GX FPGA.
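The layer-wise subproblems that such an ADMM formulation produces are damped linear least-squares solves, which is exactly what LSMR computes. A small CPU-side sketch using SciPy's lsmr (shapes and the damping value are illustrative; the paper's contribution is the OpenCL/FPGA implementation, not this host code):

```python
import numpy as np
from scipy.sparse.linalg import lsmr

H = np.random.randn(512, 64)   # activations of the previous layer (batch x units)
T = np.random.randn(512, 10)   # ADMM auxiliary targets for this layer

# Solve min_W ||H W - T||^2 + damp^2 ||W||^2 one output column at a time;
# LSMR is iterative and uses only matrix-vector products, which maps well
# to hardware pipelines.
W = np.column_stack([lsmr(H, T[:, j], damp=0.1)[0] for j in range(T.shape[1])])
print(W.shape)  # (64, 10)
```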
arXiv Detail & Related papers (2020-09-06T17:33:03Z)
- ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.