Time-Based Roofline for Deep Learning Performance Analysis
- URL: http://arxiv.org/abs/2009.04598v3
- Date: Tue, 22 Sep 2020 21:51:45 GMT
- Title: Time-Based Roofline for Deep Learning Performance Analysis
- Authors: Yunsong Wang, Charlene Yang, Steven Farrell, Yan Zhang, Thorsten
Kurth, Samuel Williams
- Abstract summary: We propose a Roofline-based approach to performance analysis that facilitates the optimization of deep learning applications.
We take two sets of representative kernels, 2D convolution and long short-term memory, to validate and demonstrate the use of this new approach.
Compared to the common ad-hoc approach, this study helps form a more systematic way to analyze code performance.
- Score: 2.547058931949976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning applications are usually very compute-intensive and require a
long run time for training and inference. This has been tackled by researchers
from both hardware and software sides, and in this paper, we propose a
Roofline-based approach to performance analysis to facilitate the optimization
of these applications. This approach is an extension of the Roofline model
widely used in traditional high-performance computing applications, and it
incorporates both compute/bandwidth complexity and run time in its formulae to
provide insights into deep learning-specific characteristics. We take two sets
of representative kernels, 2D convolution and long short-term memory, to
validate and demonstrate the use of this new approach, and investigate how
arithmetic intensity, cache locality, auto-tuning, kernel launch overhead, and
Tensor Core usage can affect performance. Compared to the common ad-hoc
approach, this study helps form a more systematic way to analyze code
performance and identify optimization opportunities for deep learning
applications.
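To make the approach concrete: the classical Roofline bound, attainable FLOP/s = min(peak FLOP/s, AI × bandwidth), can be recast in time. A kernel performing F FLOPs and moving B bytes on a machine with compute ceiling P and bandwidth ceiling BW cannot finish faster than max(F/P, B/BW), and the arithmetic intensity AI = F/B determines which term dominates. The sketch below illustrates this for a 2D convolution; it is not code from the paper, and the hardware ceilings and the conv2d_costs byte model are illustrative assumptions (roughly V100-class numbers, with each tensor assumed to touch DRAM exactly once).

```python
# Minimal sketch of a time-based Roofline estimate for a 2D convolution.
# The ceilings are illustrative, roughly NVIDIA V100 class (~15.7 TFLOP/s
# FP32, ~900 GB/s HBM2); substitute measured ceilings in practice.

PEAK_FLOPS = 15.7e12  # compute ceiling, FLOP/s (assumed)
PEAK_BW = 900e9       # memory bandwidth ceiling, bytes/s (assumed)

def conv2d_costs(n, c, k, h, w, r, s, bytes_per_elem=4):
    """FLOPs and idealized bytes for an NCHW conv with K filters of size
    CxRxS, stride 1, 'same' padding; assumes each tensor touches DRAM once."""
    flops = 2.0 * n * k * c * h * w * r * s  # each multiply-accumulate = 2 FLOPs
    bytes_moved = bytes_per_elem * (
        n * c * h * w      # input activations
        + k * c * r * s    # filter weights
        + n * k * h * w    # output activations
    )
    return flops, bytes_moved

flops, bytes_moved = conv2d_costs(n=32, c=64, k=128, h=56, w=56, r=3, s=3)

t_compute = flops / PEAK_FLOPS      # run time if perfectly compute-bound
t_memory = bytes_moved / PEAK_BW    # run time if perfectly memory-bound
t_bound = max(t_compute, t_memory)  # time-based Roofline lower bound

ai = flops / bytes_moved            # arithmetic intensity, FLOPs/byte
regime = "compute" if t_compute >= t_memory else "memory"
print(f"AI = {ai:.1f} FLOPs/byte; best-case time = {t_bound * 1e3:.3f} ms "
      f"({regime}-bound)")
```

Replacing the assumed ceilings with measured ones (e.g., from the Empirical Roofline Toolkit) and the idealized byte count with profiled traffic turns this sketch into the kind of analysis the abstract describes.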
Related papers
- Inference Scaling for Long-Context Retrieval Augmented Generation [37.15479223789199]
In this work, we investigate inference scaling for retrieval-augmented generation (RAG).
We focus on two inference scaling strategies: in-context learning and iterative prompting.
We demonstrate that scaling inference compute on long-context large language models achieves up to 58.9% gains on benchmark datasets.
arXiv Detail & Related papers (2024-10-06T03:42:15Z)
- Self-STORM: Deep Unrolled Self-Supervised Learning for Super-Resolution Microscopy [55.2480439325792]
We introduce deep unrolled self-supervised learning, which alleviates the need for ground-truth training data by training a sequence-specific, model-based autoencoder.
Our proposed method exceeds the performance of its supervised counterparts.
arXiv Detail & Related papers (2024-03-25T17:40:32Z)
- Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies [0.0]
The study introduces an analytical model-driven tuning methodology and a Machine Learning (ML)-based tuning methodology.
We evaluate the performance of the two tuning methodologies for different parallel prefix implementations of the BPLG library in an NVIDIA Jetson system.
arXiv Detail & Related papers (2023-10-24T22:09:03Z)
- Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks.
Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios.
A new research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
- Towards Constituting Mathematical Structures for Learning to Optimize [101.80359461134087]
Learning to optimize (L2O), which uses machine learning to learn an optimization algorithm automatically from data, has gained increasing attention in recent years.
A generic L2O approach parameterizes the iterative update rule and learns the update direction as a black-box network.
While the generic approach is widely applicable, the learned model can overfit and may not generalize well to out-of-distribution test sets.
We propose a novel L2O model with a mathematics-inspired structure that is broadly applicable and generalizes well to out-of-distribution problems.
arXiv Detail & Related papers (2023-05-29T19:37:28Z)
- Deep reinforcement learning applied to an assembly sequence planning problem with user preferences [1.0558951653323283]
We propose an approach to the implementation of DRL methods in assembly sequence planning problems.
The proposed approach introduces parametric actions into the RL environment to improve training time and sample efficiency.
The results support the potential for the application of deep reinforcement learning in assembly sequence planning problems with human interaction.
arXiv Detail & Related papers (2023-04-13T14:25:15Z)
- Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization [101.32332941117271]
Decision-making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z)
- Self-Attention Neural Bag-of-Features [103.70855797025689]
We build on the recently introduced 2D-Attention and reformulate the attention learning methodology.
We propose a joint feature-temporal attention mechanism that learns a joint 2D attention mask highlighting relevant information.
arXiv Detail & Related papers (2022-01-26T17:54:14Z)
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper assesses the feasibility of using purely static information about an application (e.g., its abstract syntax tree, or AST) to predict performance change based on changes in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple problems) accuracy in predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
- Hierarchical Roofline Performance Analysis for Deep Learning Applications [0.06999740786886534]
This paper presents a practical methodology for collecting the performance data necessary to conduct hierarchical Roofline analysis on NVIDIA GPUs.
It discusses the extension of the Empirical Roofline Toolkit to support a broader range of data precisions and Tensor Cores, and introduces an Nsight Compute based method to accurately collect application performance information (a sketch of the metric arithmetic follows this list).
arXiv Detail & Related papers (2020-09-11T07:16:55Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale machine learning aims to learn patterns from big data efficiently while maintaining comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
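For the hierarchical Roofline entry above, the counter-to-Roofline arithmetic can be sketched as follows. This is a hedged illustration rather than the papers' own code: the metric names follow the publicly documented NERSC Roofline methodology for Nsight Compute, and all numeric values are hypothetical placeholders.

```python
# Minimal sketch: turning Nsight Compute counters into hierarchical-Roofline
# quantities. Metric names follow the publicly documented NERSC methodology
# for NVIDIA GPUs; verify them against your ncu version. All values below
# are hypothetical placeholders, e.g. as collected with:
#   ncu --metrics sm__cycles_elapsed.avg,dram__bytes.sum,... ./app
metrics = {
    "sm__cycles_elapsed.avg": 1.2e7,              # elapsed SM cycles
    "sm__cycles_elapsed.avg.per_second": 1.38e9,  # SM clock rate (cycles/s)
    "sm__sass_thread_inst_executed_op_fadd_pred_on.sum": 1.0e9,  # FP32 adds
    "sm__sass_thread_inst_executed_op_fmul_pred_on.sum": 1.0e9,  # FP32 muls
    "sm__sass_thread_inst_executed_op_ffma_pred_on.sum": 4.0e9,  # FMAs (2 FLOPs each)
    "dram__bytes.sum": 6.0e8,                     # HBM traffic in bytes
}

time_s = (metrics["sm__cycles_elapsed.avg"]
          / metrics["sm__cycles_elapsed.avg.per_second"])
flops = (metrics["sm__sass_thread_inst_executed_op_fadd_pred_on.sum"]
         + metrics["sm__sass_thread_inst_executed_op_fmul_pred_on.sum"]
         + 2 * metrics["sm__sass_thread_inst_executed_op_ffma_pred_on.sum"])

ai_dram = flops / metrics["dram__bytes.sum"]  # arithmetic intensity at the HBM level
perf = flops / time_s                         # achieved performance, FLOP/s

print(f"HBM-level AI = {ai_dram:.1f} FLOPs/byte, achieved {perf / 1e12:.2f} TFLOP/s")
```

Repeating the same arithmetic with L2- and L1-level byte counters (instead of dram__bytes.sum) yields the other levels of the hierarchical Roofline.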
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.