RoseNNa: A performant, portable library for neural network inference
with application to computational fluid dynamics
- URL: http://arxiv.org/abs/2307.16322v1
- Date: Sun, 30 Jul 2023 21:11:55 GMT
- Title: RoseNNa: A performant, portable library for neural network inference
with application to computational fluid dynamics
- Authors: Ajay Bati, Spencer H. Bryngelson
- Abstract summary: We present the roseNNa library, which bridges the gap between neural network inference and CFD.
RoseNNa is a non-invasive, lightweight (1000 lines) tool for neural network inference.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of neural network-based machine learning ushered in high-level
libraries, including TensorFlow and PyTorch, to support their functionality.
Computational fluid dynamics (CFD) researchers have benefited from this trend
and produced powerful neural networks that promise shorter simulation times.
For example, multilayer perceptrons (MLPs) and Long Short-Term Memory (LSTM)
recurrent neural network (RNN) architectures can represent sub-grid physical effects,
like turbulence. Implementing neural networks in CFD solvers is challenging
because the programming languages used for machine learning and CFD are mostly
non-overlapping. We present the roseNNa library, which bridges the gap between
neural network inference and CFD. RoseNNa is a non-invasive, lightweight (1000
lines), and performant tool for neural network inference, with a focus on the
smaller networks used to augment PDE solvers, like those of CFD, which are
typically written in C/C++ or Fortran. RoseNNa accomplishes this by
automatically converting trained models from typical neural network training
packages into a high-performance Fortran library with C and Fortran APIs. This
reduces the effort needed to access trained neural networks and maintains
performance in the PDE solvers that CFD researchers build and rely upon.
Results show that RoseNNa reliably outperforms PyTorch (Python) and libtorch
(C++) on MLPs and LSTM RNNs with fewer than 100 hidden layers and 100 neurons
per layer, even after removing the overhead cost of API calls. Speedups range
from a factor of about 10 at the smaller end of the tested network sizes to
about 2 at the larger end, relative to these established libraries.
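The conversion workflow begins from a model saved by a standard training framework. As a minimal sketch of that handoff (assuming an ONNX-style export as the interchange point; the model architecture and file name below are illustrative, and roseNNa's exact input format and conversion steps are specified in its repository), a PyTorch MLP of the size discussed above can be exported as follows:

```python
# Minimal sketch: a small, CFD-closure-sized MLP defined in PyTorch and
# exported to ONNX, a common interchange format for trained models.
# The conversion tool then emits Fortran source with C/Fortran entry points
# that a PDE solver can link against. Names and sizes here are illustrative.
import torch
import torch.nn as nn

class SmallMLP(nn.Module):
    """A few dense layers with tens of neurons each."""
    def __init__(self, n_in=8, n_hidden=32, n_out=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_out),
        )

    def forward(self, x):
        return self.net(x)

model = SmallMLP().eval()
dummy_input = torch.randn(1, 8)  # a single feature vector

# Export the trained graph and weights for downstream conversion.
torch.onnx.export(model, dummy_input, "mlp.onnx")
```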
Related papers
- Accelerating SNN Training with Stochastic Parallelizable Spiking Neurons [1.7056768055368383]
Spiking neural networks (SNN) are able to learn features while using less energy, especially on neuromorphic hardware.
The most widely used neuron model in deep learning is the Leaky Integrate and Fire (LIF) neuron.
arXiv Detail & Related papers (2023-06-22T04:25:27Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL)
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z) - Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z) - CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded
Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - ItNet: iterative neural networks with small graphs for accurate and
efficient anytime prediction [1.52292571922932]
In this study, we introduce a class of network models that have a small memory footprint in terms of their computational graphs.
We show state-of-the-art results for semantic segmentation on the CamVid and Cityscapes datasets.
arXiv Detail & Related papers (2021-01-21T15:56:29Z) - Tensor train decompositions on recurrent networks [60.334946204107446]
Matrix product state (MPS) tensor trains have more attractive features than MPOs, in terms of storage reduction and computing time at inference.
We show that MPS tensor trains should be at the forefront of LSTM network compression through a theoretical analysis and practical experiments on NLP task.
arXiv Detail & Related papers (2020-06-09T18:25:39Z) - A Fortran-Keras Deep Learning Bridge for Scientific Computing [6.768544973019004]
We introduce a software library, the Fortran-Keras Bridge (FKB)
The paper describes several unique features offered by FKB, such as customizable layers, loss functions, and network ensembles.
The paper concludes with a case study that applies FKB to address open questions about the robustness of an experimental approach to global climate simulation.
arXiv Detail & Related papers (2020-04-14T15:10:09Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)