BB-ML: Basic Block Performance Prediction using Machine Learning
Techniques
- URL: http://arxiv.org/abs/2202.07798v3
- Date: Sun, 12 Nov 2023 04:13:50 GMT
- Title: BB-ML: Basic Block Performance Prediction using Machine Learning
Techniques
- Authors: Hamdy Abdelkhalik, Shamminuj Aktar, Yehia Arafa, Atanu Barai, Gopinath
Chennupati, Nandakishore Santhi, Nishant Panda, Nirmal Prajapati, Nazmul
Haque Turja, Stephan Eidenbenz and Abdel-Hameed Badawy
- Abstract summary: We propose to use Machine Learning (ML) techniques for performance prediction at a much finer granularity, namely at the Basic Block (BB) level.
We extrapolate the basic block execution counts of GPU applications and use them to predict performance for large input sizes from the counts obtained on smaller input sizes.
We achieve an accuracy of 93.5% in extrapolating the basic block counts for large input sets when trained on smaller input sets.
- Score: 0.6020800302423842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have seen the adoption of Machine Learning (ML) techniques to
predict the performance of large-scale applications, mostly at a coarse level.
In contrast, we propose to use ML techniques for performance prediction at a
much finer granularity, namely at the Basic Block (BB) level. Basic blocks are
single-entry, single-exit code blocks that compilers use to break a large
program into manageable pieces for analysis. We extrapolate the basic block
execution counts of GPU applications and use them to predict performance for
large input sizes from the counts obtained on smaller input sizes. We
train a Poisson Neural Network (PNN) model using random input values as well as
the lowest input values of the application to learn the relationship between
inputs and basic block counts. Experimental results show that the model can
accurately predict the basic block execution counts of 16 GPU benchmarks. We
achieve an accuracy of 93.5% in extrapolating the basic block counts for large
input sets when trained on smaller input sets and an accuracy of 97.7% in
predicting basic block counts on random instances. In a case study, we apply
the ML model to CUDA GPU benchmarks for performance prediction across a
spectrum of applications. We use a variety of metrics for evaluation, including
global memory requests and the active cycles of tensor cores, ALU, and FMA
units. Results demonstrate the model's capability of predicting the performance
of large datasets with an average error rate of 0.85% and 0.17% for global and
shared memory requests, respectively. Additionally, to address the utilization
of the main functional units in Ampere architecture GPUs, we calculate the
active cycles for tensor cores, ALU, FMA, and FP64 units and achieve an average
error of 2.3% and 10.66% for the ALU and FMA units, respectively, while the
maximum observed error across all tested applications and units reaches 18.5%.
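The abstract does not include code, but the core modeling idea can be illustrated. The sketch below is a minimal, assumed Poisson-regression network (written here in PyTorch) that maps kernel input sizes to per-basic-block execution counts and is then queried at a larger, unseen input size. The layer sizes, input features, and synthetic data are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Poisson-regression network for basic block counts.
# NOT the authors' implementation: layer sizes, features, and data are assumed.
import torch
import torch.nn as nn

NUM_INPUT_PARAMS = 2    # e.g. two kernel launch parameters (assumed)
NUM_BASIC_BLOCKS = 8    # number of basic blocks in the kernel (assumed)

class PoissonBBModel(nn.Module):
    """Maps normalized input sizes to log-rates, one per basic block."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_INPUT_PARAMS, 64),
            nn.ReLU(),
            nn.Linear(64, NUM_BASIC_BLOCKS),
        )

    def forward(self, x):
        return self.net(x)  # log(lambda) for each basic block

# Synthetic stand-in for profiled training data: small/random input sizes and
# their basic block execution counts (in the paper these come from profiling).
torch.manual_seed(0)
inputs = torch.randint(1, 64, (256, NUM_INPUT_PARAMS)).float()
rates = inputs @ (10.0 * torch.rand(NUM_INPUT_PARAMS, NUM_BASIC_BLOCKS))
counts = torch.poisson(rates)

model = PoissonBBModel()
loss_fn = nn.PoissonNLLLoss(log_input=True)  # Poisson NLL on log-rate outputs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs / 64.0), counts)
    loss.backward()
    optimizer.step()

# Extrapolate: predict counts for an input size larger than anything trained on.
large_input = torch.tensor([[512.0, 512.0]]) / 64.0
predicted_counts = model(large_input).exp()
print(predicted_counts)
```

A Poisson likelihood on log-rates keeps predictions non-negative, which matches count-valued targets such as basic block execution counts better than a plain squared-error regressor; the actual PNN parameterization used in the paper is described in the paper itself.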
Related papers
- Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more efficient metric for performance estimation.
We extend the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources.
We employ a two-layer neural network to model the non-linear relationship between multiple domain-specific losses and downstream performance.
arXiv Detail & Related papers (2024-10-11T04:57:48Z)
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models [50.525259103219256]
Quantization-aware training (QAT) offers a solution by reducing memory consumption through low-bit representations with minimal accuracy loss.
We propose Efficient Quantization-Aware Training (EfficientQAT), a more feasible QAT algorithm.
EfficientQAT involves two consecutive phases: Block-wise training of all parameters (Block-AP) and end-to-end training of quantization parameters (E2E-QP).
arXiv Detail & Related papers (2024-07-10T17:53:30Z)
- Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [56.44025052765861]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks.
We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs.
We show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x.
arXiv Detail & Related papers (2024-05-06T16:03:32Z)
- Investigating Resource-efficient Neutron/Gamma Classification ML Models Targeting eFPGAs [0.0]
Open-source embedded FPGA (eFPGA) frameworks provide an alternate, more flexible pathway for implementing machine learning models in hardware.
We explore the parameter space for eFPGA implementations of fully-connected neural network (fcNN) and boosted decision tree (BDT) models.
The results of the study will be used to aid the specification of an eFPGA fabric, which will be integrated as part of a test chip.
arXiv Detail & Related papers (2024-04-19T20:03:30Z)
- How predictable is language model benchmark performance? [0.07143413923310668]
We show that average benchmark performance, aggregating over many individual tasks, is decently predictable as a function of training compute scale.
Individual task performance remains significantly more predictable than chance.
arXiv Detail & Related papers (2024-01-09T17:34:30Z)
- DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
arXiv Detail & Related papers (2023-04-18T15:13:10Z)
- A Meta-Learning Approach to Predicting Performance and Data Requirements [163.4412093478316]
We propose an approach to estimate the number of samples required for a model to reach a target performance.
We find that the power law, the de facto principle to estimate model performance, leads to large errors when using a small dataset.
We introduce a novel piecewise power law (PPL) that handles the two data regimes differently.
arXiv Detail & Related papers (2023-03-02T21:48:22Z)
- A contextual analysis of multi-layer perceptron models in classifying hand-written digits and letters: limited resources [0.0]
We extensively test an end-to-end vanilla neural network (MLP) approach in pure numpy without any pre-processing or feature extraction done beforehand.
We show that basic data mining operations can significantly improve the performance of the models in terms of computational time.
arXiv Detail & Related papers (2021-07-05T04:30:37Z)
- Semiring Primitives for Sparse Neighborhood Methods on the GPU [16.56995698312561]
We show that a sparse semiring primitive can be flexible enough to support a wide range of critical distance measures.
This primitive is a foundational component for enabling many neighborhood-based information retrieval and machine learning algorithms to accept sparse input.
arXiv Detail & Related papers (2021-04-13T17:05:03Z)
- Real-Time Execution of Large-scale Language Models on Mobile [49.32610509282623]
We find the best model structure of BERT for a given computation size to match specific devices.
Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices.
Specifically, our model is 5.2x faster on CPU and 4.1x faster on GPU with 0.5-2% accuracy loss compared with BERT-base.
arXiv Detail & Related papers (2020-09-15T01:59:17Z)
- A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels [2.9853894456071077]
This model is built based on random forests using 189 individual compute kernels from benchmarks such as Parboil, Rodinia, Polybench-GPU and SHOC.
Evaluation of the model performance using cross-validation yields a median Mean Average Percentage Error (MAPE) of 8.86-52.00% for time and 1.84-2.94% for power prediction across five different GPUs, while latency for a single prediction varies between 15 and 108 milliseconds. A minimal sketch of this kind of cross-validated MAPE evaluation follows this entry.
arXiv Detail & Related papers (2020-01-20T13:40:54Z)
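For the last related paper above, the described evaluation protocol (a random-forest predictor scored by cross-validated MAPE) can be sketched in outline as follows; the kernel features and synthetic timings here are assumptions made for illustration, not the paper's benchmark data.

```python
# Hypothetical sketch: cross-validated MAPE for a random-forest model that
# predicts kernel execution time. Features and data are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Assumed per-kernel features: [threads, flops, bytes_read, bytes_written]
X = rng.uniform(1.0, 1e6, size=(189, 4))
# Synthetic "measured" runtimes in seconds, kept strictly positive.
y = 1e-3 + 1e-6 * X[:, 1] + 5e-7 * (X[:, 2] + X[:, 3])
y *= rng.uniform(0.95, 1.05, size=189)  # mild measurement noise

fold_mapes = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_mapes.append(100.0 * np.mean(np.abs((pred - y[test_idx]) / y[test_idx])))

print(f"median MAPE across folds: {np.median(fold_mapes):.2f}%")
```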
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.