Performance and Energy Consumption of Parallel Machine Learning
Algorithms
- URL: http://arxiv.org/abs/2305.00798v1
- Date: Mon, 1 May 2023 13:04:39 GMT
- Title: Performance and Energy Consumption of Parallel Machine Learning
Algorithms
- Authors: Xidong Wu, Preston Brazzle, Stephen Cahoon
- Abstract summary: Machine learning models have achieved remarkable success in various real-world applications.
Model training in machine learning requires large-scale data sets and multiple iterations before it can work properly.
Parallelization of training algorithms is a common strategy to speed up the process of training.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models have achieved remarkable success in various
real-world applications such as data science, computer vision, and natural
language processing. However, model training in machine learning requires
large-scale data sets and multiple iterations before it can work properly.
Parallelization of training algorithms is a common strategy to speed up the
process of training. However, many studies on model training and inference
focus only on performance. Power consumption is also an important metric for
any type of computation, especially for high-performance applications. Machine
learning algorithms that can run on low-power platforms such as sensors and
mobile devices have been researched, but less power optimization has been done
for algorithms designed for high-performance computing.
In this paper, we present a C++ implementation of logistic regression and the
genetic algorithm, and a Python implementation of neural networks with the
stochastic gradient descent (SGD) algorithm on classification tasks. We will
show the impact that the complexity of the model and the size of the training
data have on the parallel efficiency of the algorithm in terms of both power
and performance. We also tested these implementations using shared-memory
parallelism, distributed-memory parallelism, and GPU acceleration to speed up
machine learning model training.
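The authors' implementations are not included in this listing; as a rough illustration of the shared-memory approach the abstract describes, the sketch below shows a full-batch logistic-regression gradient step parallelized with OpenMP in C++. All identifiers (Dataset, gradient_step, and so on) are hypothetical and are not taken from the authors' code.

```cpp
// Minimal sketch of a shared-memory parallel logistic-regression step.
// Hypothetical illustration only; not the authors' implementation.
#include <cmath>
#include <cstddef>
#include <vector>
#include <omp.h>

struct Dataset {                          // hypothetical container
    std::vector<std::vector<double>> x;   // features, one row per sample
    std::vector<int> y;                   // binary labels in {0, 1}
};

inline double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One full-batch gradient-descent step; the per-sample gradient loop is
// parallelized over samples, mirroring the data-parallel strategy the
// abstract describes for shared-memory systems.
void gradient_step(const Dataset& d, std::vector<double>& w, double lr) {
    const std::size_t n = d.x.size(), m = w.size();
    std::vector<double> grad(m, 0.0);

    #pragma omp parallel
    {
        std::vector<double> local(m, 0.0);      // thread-private gradient
        #pragma omp for nowait
        for (std::size_t i = 0; i < n; ++i) {
            double z = 0.0;
            for (std::size_t j = 0; j < m; ++j) z += w[j] * d.x[i][j];
            const double err = sigmoid(z) - d.y[i];
            for (std::size_t j = 0; j < m; ++j) local[j] += err * d.x[i][j];
        }
        #pragma omp critical                     // merge per-thread gradients
        for (std::size_t j = 0; j < m; ++j) grad[j] += local[j];
    }
    for (std::size_t j = 0; j < m; ++j) w[j] -= lr * grad[j] / n;
}
```

A performance and energy study along the lines of the abstract would then repeat such a step while varying the OpenMP thread count (for example via the OMP_NUM_THREADS environment variable) and measuring runtime and power externally; the distributed-memory and GPU variants mentioned in the abstract would replace the OpenMP loop with MPI reductions or CUDA kernels, respectively.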
Related papers
- Towards provably efficient quantum algorithms for large-scale
machine-learning models [11.440134080370811]
We show that fault-tolerant quantum computing could possibly provide provably efficient resolutions for generic (stochastic) gradient descent algorithms.
We benchmark instances of large machine learning models from 7 million to 103 million parameters.
arXiv Detail & Related papers (2023-03-06T19:00:27Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Benchmarking Learning Efficiency in Deep Reservoir Computing [23.753943709362794]
We introduce a benchmark of increasingly difficult tasks together with a data efficiency metric to measure how quickly machine learning models learn from training data.
We compare the learning speed of established sequential supervised models, such as RNNs, LSTMs, and Transformers, with lesser-known alternative models based on reservoir computing.
arXiv Detail & Related papers (2022-09-29T08:16:52Z) - Machine Learning Training on a Real Processing-in-Memory System [9.286176889576996]
Training machine learning algorithms is a computationally intensive process, which is frequently memory-bound.
Memory-centric computing systems with processing-in-memory capabilities can alleviate this data movement bottleneck.
Our work is the first one to evaluate training of machine learning algorithms on a real-world general-purpose PIM architecture.
arXiv Detail & Related papers (2022-06-13T10:20:23Z) - Benchmarking Processor Performance by Multi-Threaded Machine Learning
Algorithms [0.0]
In this paper, I will make a performance comparison of multi-threaded machine learning clustering algorithms.
I will be working on Linear Regression, Random Forest, and K-Nearest Neighbors to determine the performance characteristics of the algorithms.
arXiv Detail & Related papers (2021-09-11T13:26:58Z) - Towards Efficient and Scalable Acceleration of Online Decision Tree
Learning on FPGA [20.487660974785943]
In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets.
We introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models.
We present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array.
arXiv Detail & Related papers (2020-09-03T03:23:43Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z) - One-step regression and classification with crosspoint resistive memory
arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z) - Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic
Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs.
At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation.
We show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation.
arXiv Detail & Related papers (2020-04-13T23:09:15Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural
Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z) - On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
arXiv Detail & Related papers (2020-02-15T23:25:12Z)