Symbolic Regression on FPGAs for Fast Machine Learning Inference
- URL: http://arxiv.org/abs/2305.04099v2
- Date: Wed, 17 Jan 2024 23:18:50 GMT
- Title: Symbolic Regression on FPGAs for Fast Machine Learning Inference
- Authors: Ho Fung Tsoi, Adrian Alan Pol, Vladimir Loncar, Ekaterina Govorkova,
Miles Cranmer, Sridhara Dasu, Peter Elmer, Philip Harris, Isobel Ojalvo,
Maurizio Pierini
- Abstract summary: High-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs)
We introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR)
We show that our approach can approximate a 3-layer neural network using an inference model that achieves up to a 13-fold decrease in execution time, down to 5 ns, while still preserving more than 90% approximation accuracy.
- Score: 2.0920303420933273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The high-energy physics community is investigating the potential of deploying
machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs) to
enhance physics sensitivity while still meeting data processing time
constraints. In this contribution, we introduce a novel end-to-end procedure
that utilizes a machine learning technique called symbolic regression (SR). It
searches the equation space to discover algebraic relations approximating a
dataset. We use PySR (a software to uncover these expressions based on an
evolutionary algorithm) and extend the functionality of hls4ml (a package for
machine learning inference in FPGAs) to support PySR-generated expressions for
resource-constrained production environments. Deep learning models often
optimize the top metric by pinning the network size because the vast
hyperparameter space prevents an extensive search for neural architecture.
Conversely, SR selects a set of models on the Pareto front, which allows for
optimizing the performance-resource trade-off directly. By embedding symbolic
forms, our implementation can dramatically reduce the computational resources
needed to perform critical tasks. We validate our method on a physics
benchmark: the multiclass classification of jets produced in simulated
proton-proton collisions at the CERN Large Hadron Collider. We show that our
approach can approximate a 3-layer neural network using an inference model that
achieves up to a 13-fold decrease in execution time, down to 5 ns, while still
preserving more than 90% approximation accuracy.
Related papers
- Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging [3.502427552446068]
Deep learning models enable real-time inference, but can be computationally demanding due to complex architectures and large matrix operations.
This makes DL models ill-suited for direct implementation on field-programmable gate array (FPGA)-based camera hardware.
In this work, we focus on compressing recurrent neural networks (RNNs), which are well-suited for FLI time-series data processing, to enable deployment on resource-constrained FPGA boards.
arXiv Detail & Related papers (2024-10-01T17:23:26Z) - rule4ml: An Open-Source Tool for Resource Utilization and Latency Estimation for ML Models on FPGA [0.0]
This paper introduces a novel method to predict the resource utilization and inference latency of Neural Networks (NNs) before their synthesis and implementation on FPGA.
We leverage HLS4ML, a tool-flow that helps translate NNs into high-level synthesis (HLS) code.
Our method uses trained regression models for immediate pre-synthesis predictions.
arXiv Detail & Related papers (2024-08-09T19:35:10Z) - Scaling Studies for Efficient Parameter Search and Parallelism for Large
Language Model Pre-training [2.875838666718042]
We focus on parallel and distributed machine learning algorithm development, specifically for optimizing the data processing and pre-training of a set of 5 encoder-decoder LLMs.
We performed a fine-grained study to quantify the relationships between three ML methods, specifically exploring Microsoft DeepSpeed Zero Redundancy stages.
arXiv Detail & Related papers (2023-10-09T02:22:00Z) - Geometry-Informed Neural Operator for Large-Scale 3D PDEs [76.06115572844882]
We propose the geometry-informed neural operator (GINO) to learn the solution operator of large-scale partial differential equations.
We successfully trained GINO to predict the pressure on car surfaces using only five hundred data points.
arXiv Detail & Related papers (2023-09-01T16:59:21Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at
Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - Local approximate Gaussian process regression for data-driven
constitutive laws: Development and comparison with neural networks [0.0]
We show how to use local approximate process regression to predict stress outputs at particular strain space locations.
A modified Newton-Raphson approach is proposed to accommodate for the local nature of the laGPR approximation when solving the global structural problem in a FE setting.
arXiv Detail & Related papers (2021-05-07T14:49:28Z) - PAC-learning gains of Turing machines over circuits and neural networks [1.4502611532302039]
We study the potential gains in sample efficiency that can bring in the principle of minimum description length.
We use Turing machines to represent universal models and circuits.
We highlight close relationships between classical open problems in Circuit Complexity and the tightness of these.
arXiv Detail & Related papers (2021-03-23T17:03:10Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - One-step regression and classification with crosspoint resistive memory
arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.