StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs,
GPUs and FPGAs
- URL: http://arxiv.org/abs/2106.05373v1
- Date: Wed, 9 Jun 2021 20:28:18 GMT
- Title: StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs,
GPUs and FPGAs
- Authors: Artur Podobas, Martin Svedin, Steven W. D. Chien, Ivy B. Peng, Naresh
Balaji Ravichandran, Pawel Herman, Anders Lansner, Stefano Markidis
- Abstract summary: StreamBrain is a framework that allows neural networks based on BCPNN to be practically deployed in High-Performance Computing systems.
We empirically demonstrate that StreamBrain can train on the well-known MNIST ML benchmark dataset within seconds.
We are the first to demonstrate BCPNN on STL-10-sized networks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The modern deep learning method based on backpropagation has surged in
popularity and has been used in multiple domains and application areas. At the
same time, there are other -- lesser-known -- machine learning algorithms with a
mature and solid theoretical foundation whose performance remains unexplored.
One such example is the brain-like Bayesian Confidence Propagation Neural
Network (BCPNN). In this paper, we introduce StreamBrain -- a framework that
allows neural networks based on BCPNN to be practically deployed in
High-Performance Computing systems. StreamBrain is a domain-specific language
(DSL), similar in concept to existing machine learning (ML) frameworks, and
supports backends for CPUs, GPUs, and even FPGAs. We empirically demonstrate
that StreamBrain can train on the well-known MNIST ML benchmark dataset within
seconds, and we are the first to demonstrate BCPNN on STL-10-sized networks. We
also show how StreamBrain can be used to train with custom floating-point
formats and illustrate the impact of using different bfloat variations on BCPNN
using FPGAs.
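The BCPNN learning rule behind these results is compact: units keep exponentially decaying probability traces, and weights are the log-odds of co-activation. The sketch below is a minimal NumPy illustration of that Hebbian-Bayesian update and of bfloat-style mantissa truncation, based on the published BCPNN formulation and a standard emulation trick rather than StreamBrain's actual API; all names are our own.
```python
import numpy as np

def bcpnn_update(pi, pj, pij, x_pre, x_post, tau=1000.0, eps=1e-6):
    """One Hebbian-Bayesian trace update (dt = 1 step), per the BCPNN
    literature: weights are log-odds of co-activation probabilities."""
    k = 1.0 / tau                                   # trace learning rate
    pi += k * (x_pre - pi)                          # exponential moving averages
    pj += k * (x_post - pj)
    pij += k * (np.outer(x_pre, x_post) - pij)
    w = np.log((pij + eps) / (np.outer(pi, pj) + eps))  # w_ij = log P_ij/(P_i P_j)
    b = np.log(pj + eps)                                # bias = log prior
    return pi, pj, pij, w, b

def to_bfloat_like(x, mantissa_bits=7):
    """Emulate a bfloat-style format by truncating float32 mantissas
    (bfloat16 keeps 7 of float32's 23 mantissa bits)."""
    u = x.astype(np.float32).view(np.uint32)
    drop = 23 - mantissa_bits
    return (u >> drop << drop).view(np.float32)
```
The HPC work lies in making the dense outer-product trace updates fast on each backend; the truncation helper shows why bfloat variants are attractive on FPGAs, where mantissa width directly sets multiplier cost.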
Related papers
- LookupFFN: Making Transformers Compute-lite for CPU inference [23.61144705380663]
GPU clusters are the de facto choice for training large deep neural network (DNN) models today.
Several reasons, including ease of workflow, security, and cost, have led to efforts investigating whether CPUs may be viable for routine inference in many sectors of industry.
We study a workhorse module of modern architectures, the GEMM-based Feed-Forward Network (FFN), and assess the extent to which it can be made compute- (or FLOP-) lite.
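For reference, the FFN being made FLOP-lite is just two dense GEMMs around a nonlinearity; a minimal NumPy version of that standard baseline block (our own illustration, not LookupFFN's replacement):
```python
import numpy as np

def gemm_ffn(x, w1, b1, w2, b2):
    """Standard transformer FFN: two GEMMs around a tanh-approximated GELU."""
    h = x @ w1 + b1                    # GEMM 1: (n, d) @ (d, d_ff)
    h = 0.5 * h * (1.0 + np.tanh(0.7978845608 * (h + 0.044715 * h**3)))
    return h @ w2 + b2                 # GEMM 2: (n, d_ff) @ (d_ff, d)
```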
arXiv Detail & Related papers (2024-03-12T00:26:16Z)
- Basic Binary Convolution Unit for Binarized Image Restoration Network [146.0988597062618]
In this study, we reconsider the components of binary convolution, such as the residual connection, BatchNorm, the activation function, and the overall structure, for image restoration tasks.
Based on our findings and analyses, we design a simple yet efficient basic binary convolution unit (BBCU).
Our BBCU significantly outperforms other BNNs and lightweight models, showing that it can serve as a basic unit for binarized image restoration (IR) networks.
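As a rough picture of what such a unit computes, a generic binarized block (our own simplification, not BBCU's exact design) reduces weights and activations to their signs and lets a full-precision residual carry the information lost to binarization:
```python
import torch
import torch.nn as nn

class BinaryConvBlock(nn.Module):
    """Generic binarized conv block: sign-binarized weights and activations,
    BatchNorm, and a full-precision residual (illustration only)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        xb = torch.sign(x)                          # binarize activations
        wb = torch.sign(self.conv.weight)           # binarize weights
        out = nn.functional.conv2d(xb, wb, padding=1)
        return self.bn(out) + x                     # residual keeps FP detail
```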
arXiv Detail & Related papers (2022-10-02T01:54:40Z)
- CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Networks widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a new variant of the deep convolutional neural network architecture to improve the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
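The storage intuition: a full binary 3x3 kernel costs 9 bits, but if kernels are drawn from a small learned subset, each kernel only needs an index. A toy NumPy sketch of that sub-bit idea (our own simplification, not the paper's kernel-aware optimizer):
```python
import numpy as np

k = 5                                   # index width: k/9 bits per weight
rng = np.random.default_rng(0)
codebook = rng.choice([-1.0, 1.0], size=(2**k, 3, 3))  # 2**k binary kernels

def expand_kernels(indices):
    """Reconstruct full binary 3x3 kernels from their sub-bit indices."""
    return codebook[indices]            # shape: (*indices.shape, 3, 3)

idx = rng.integers(0, 2**k, size=(64, 32))   # out_ch x in_ch kernel ids
kernels = expand_kernels(idx)                # (64, 32, 3, 3), ~0.56 bits/weight
```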
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
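The core trick can be sketched in NumPy (our own reconstruction under assumed names): an M-bit unsigned weight tensor is a weighted sum of its {-1, +1} bit planes, so one quantized matmul becomes M cheap binary matmuls plus a constant branch.
```python
import numpy as np

def quantized_matmul(w_int, x, bits):
    """Evaluate w_int @ x via {-1, +1} branches:
    w = sum_m 2^(m-1) * B_m + (2^bits - 1)/2, with B_m in {-1, +1}."""
    y = 0.0
    for m in range(bits):
        b_m = 2.0 * ((w_int >> m) & 1) - 1.0    # m-th bit plane in {-1, +1}
        y = y + 2.0**(m - 1) * (b_m @ x)        # binary-branch matmul
    return y + (2**bits - 1) / 2.0 * x.sum(axis=0)  # constant branch

rng = np.random.default_rng(0)
w = rng.integers(0, 16, size=(4, 8))            # 4-bit quantized weights
x = rng.standard_normal((8, 3))
assert np.allclose(quantized_matmul(w, x, 4), w @ x)
```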
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors [0.0]
Deep neural networks (DNNs) have the advantage that they can take a large number of parameters into account, which enables them to solve complex tasks.
In computer vision and speech recognition, they achieve better accuracy than conventional algorithms, and in some tasks they even surpass human experts.
With the progress of DNNs in recent years, many other fields of application, such as disease diagnosis and autonomous driving, are taking advantage of them.
arXiv Detail & Related papers (2021-04-19T12:50:27Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that, through careful design of the models and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
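One such strategy, stripped to its essentials (a generic sketch, not the paper's exact models): binarize node features and weights with sign before the usual neighborhood aggregation.
```python
import numpy as np

def binary_gnn_layer(a_hat, h, w):
    """Message passing with binarized features and weights:
    H' = ReLU(A_hat @ sign(H) @ sign(W)) -- a generic illustration."""
    return np.maximum(a_hat @ np.sign(h) @ np.sign(w), 0.0)
```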
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Overview of FPGA deep learning acceleration based on convolutional neural network [0.76146285961466]
In recent years, deep learning has matured considerably, and convolutional neural networks, one of its most commonly used algorithms, have been widely applied to various visual tasks.
This review article introduces the theory and algorithms behind convolution.
It summarizes the application scenarios of several existing FPGA technologies based on convolutional neural networks, focusing on the application of accelerators.
arXiv Detail & Related papers (2020-12-23T12:44:24Z)
- Compiling ONNX Neural Network Models Using MLIR [51.903932262028235]
We present a preliminary report on our onnx-mlir compiler, which generates code for the inference of deep neural network models.
Onnx-mlir relies on the Multi-Level Intermediate Representation (MLIR) infrastructure recently integrated into the LLVM project.
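The compiler's input is a standard ONNX graph; a minimal way to produce one from PyTorch (a generic export snippet using the documented torch.onnx API, independent of onnx-mlir itself):
```python
import torch
import torch.nn as nn

# Toy model exported to the ONNX format that onnx-mlir consumes, after
# which the compiler lowers it through MLIR dialects to native code.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
torch.onnx.export(model, torch.randn(1, 16), "model.onnx",
                  input_names=["x"], output_names=["y"])
```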
arXiv Detail & Related papers (2020-08-19T05:28:08Z)
- Exposing Hardware Building Blocks to Machine Learning Frameworks [4.56877715768796]
We focus on how to design topologies that complement a view of neurons as unique functions.
We develop a library that supports training a neural network with custom sparsity and quantization.
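A common way to express custom sparsity in such a library is a fixed binary mask applied to the weights on every forward pass; a minimal PyTorch sketch (our own illustration, not the paper's library):
```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer trained under a fixed, user-supplied sparsity mask."""
    def __init__(self, in_features, out_features, mask):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.register_buffer("mask", mask)          # 0/1 pattern, not trained

    def forward(self, x):
        return nn.functional.linear(
            x, self.linear.weight * self.mask, self.linear.bias)

layer = MaskedLinear(8, 4, torch.ones(4, 8))        # dense mask as a sanity check
```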
arXiv Detail & Related papers (2020-04-10T14:26:00Z) - The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural
Language Understanding [97.85957811603251]
We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models.
Built upon PyTorch and Transformers, MT-DNN is designed to facilitate rapid customization for a broad spectrum of NLU tasks.
A unique feature of MT-DNN is its built-in support for robust and transferable learning using the adversarial multi-task learning paradigm.
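Architecturally, this means one shared encoder with a lightweight head per task; a skeletal PyTorch version of the pattern (a generic sketch, not MT-DNN's actual classes):
```python
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared encoder with one classification head per NLU task."""
    def __init__(self, encoder, hidden, task_sizes):
        super().__init__()
        self.encoder = encoder                      # e.g. a Transformer encoder
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_sizes.items()})

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))   # route to the task head
```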
arXiv Detail & Related papers (2020-02-19T03:05:28Z)