Performance and Energy Consumption of Parallel Machine Learning
Algorithms
- URL: http://arxiv.org/abs/2305.00798v1
- Date: Mon, 1 May 2023 13:04:39 GMT
- Title: Performance and Energy Consumption of Parallel Machine Learning
Algorithms
- Authors: Xidong Wu, Preston Brazzle, Stephen Cahoon
- Abstract summary: Machine learning models have achieved remarkable success in various real-world applications.
Model training in machine learning requires large-scale data sets and multiple iterations before it can work properly.
Parallelization of training algorithms is a common strategy to speed up the process of training.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models have achieved remarkable success in various
real-world applications such as data science, computer vision, and natural
language processing. However, model training in machine learning requires
large-scale data sets and multiple iterations before it can work properly.
Parallelization of training algorithms is a common strategy to speed up the
process of training. However, many studies on model training and inference
focus only on performance. Power consumption is also an important metric for
any type of computation, especially for high-performance applications. Machine
learning algorithms that can run on low-power platforms such as sensors and
mobile devices have been researched, but less power optimization has been done
for algorithms designed for high-performance computing.
In this paper, we present a C++ implementation of logistic regression and the
genetic algorithm, and a Python implementation of neural networks with the
stochastic gradient descent (SGD) algorithm on classification tasks. We will
show the impact that the complexity of the model and the size of the training
data have on the parallel efficiency of the algorithm in terms of both power
and performance. We also tested these implementations using shared-memory
parallelism, distributed-memory parallelism, and GPU acceleration to speed up
machine learning model training.
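The authors' implementations are not included in this listing; as a rough illustration of the shared-memory approach the abstract describes, the sketch below shows a full-batch logistic-regression gradient step parallelized with OpenMP in C++. All identifiers (Dataset, gradient_step, and so on) are hypothetical and are not taken from the authors' code.

```cpp
// Minimal sketch of a shared-memory parallel logistic-regression step.
// Hypothetical illustration only; not the authors' implementation.
#include <cmath>
#include <cstddef>
#include <vector>
#include <omp.h>

struct Dataset {                          // hypothetical container
    std::vector<std::vector<double>> x;   // features, one row per sample
    std::vector<int> y;                   // binary labels in {0, 1}
};

inline double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One full-batch gradient-descent step; the per-sample gradient loop is
// parallelized over samples, mirroring the data-parallel strategy the
// abstract describes for shared-memory systems.
void gradient_step(const Dataset& d, std::vector<double>& w, double lr) {
    const std::size_t n = d.x.size(), m = w.size();
    std::vector<double> grad(m, 0.0);

    #pragma omp parallel
    {
        std::vector<double> local(m, 0.0);      // thread-private gradient
        #pragma omp for nowait
        for (std::size_t i = 0; i < n; ++i) {
            double z = 0.0;
            for (std::size_t j = 0; j < m; ++j) z += w[j] * d.x[i][j];
            const double err = sigmoid(z) - d.y[i];
            for (std::size_t j = 0; j < m; ++j) local[j] += err * d.x[i][j];
        }
        #pragma omp critical                     // merge per-thread gradients
        for (std::size_t j = 0; j < m; ++j) grad[j] += local[j];
    }
    for (std::size_t j = 0; j < m; ++j) w[j] -= lr * grad[j] / n;
}
```

A performance and energy study along the lines of the abstract would then repeat such a step while varying the OpenMP thread count (for example via the OMP_NUM_THREADS environment variable) and measuring runtime and power externally; the distributed-memory and GPU variants mentioned in the abstract would replace the OpenMP loop with MPI reductions or CUDA kernels, respectively.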
Related papers
- Towards provably efficient quantum algorithms for large-scale
machine-learning models [11.440134080370811]
We show that fault-tolerant quantum computing could possibly provide provably efficient resolutions for generic (stochastic) gradient descent algorithms.
We benchmark instances of large machine learning models from 7 million to 103 million parameters.
arXiv Detail & Related papers (2023-03-06T19:00:27Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Benchmarking Learning Efficiency in Deep Reservoir Computing [23.753943709362794]
We introduce a benchmark of increasingly difficult tasks together with a data efficiency metric to measure how quickly machine learning models learn from training data.
We compare the learning speed of established sequential supervised models, such as RNNs, LSTMs, and Transformers, with lesser-known alternative models based on reservoir computing.
arXiv Detail & Related papers (2022-09-29T08:16:52Z) - Machine Learning Training on a Real Processing-in-Memory System [9.286176889576996]
Training machine learning algorithms is a computationally intensive process, which is frequently memory-bound.
Memory-centric computing systems with processing-in-memory capabilities can alleviate this data movement bottleneck.
Our work is the first one to evaluate training of machine learning algorithms on a real-world general-purpose PIM architecture.
arXiv Detail & Related papers (2022-06-13T10:20:23Z) - Benchmarking Processor Performance by Multi-Threaded Machine Learning
Algorithms [0.0]
In this paper, I will make a performance comparison of multi-threaded machine learning clustering algorithms.
I will be working on Linear Regression, Random Forest, and K-Nearest Neighbors to determine the performance characteristics of the algorithms.
arXiv Detail & Related papers (2021-09-11T13:26:58Z) - Towards Efficient and Scalable Acceleration of Online Decision Tree
Learning on FPGA [20.487660974785943]
In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets.
We introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models.
We present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array.
arXiv Detail & Related papers (2020-09-03T03:23:43Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z) - One-step regression and classification with crosspoint resistive memory
arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z) - Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic
Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs.
At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation.
We show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation.
arXiv Detail & Related papers (2020-04-13T23:09:15Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural
Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z) - On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
arXiv Detail & Related papers (2020-02-15T23:25:12Z)