A Study on the Intersection of GPU Utilization and CNN Inference
- URL: http://arxiv.org/abs/2212.07936v1
- Date: Thu, 15 Dec 2022 16:11:40 GMT
- Title: A Study on the Intersection of GPU Utilization and CNN Inference
- Authors: Jack Kosaian, Amar Phanishayee
- Abstract summary: We show that there is room to improve the inference-time GPU utilization of convolutional neural network (CNN) inference, and that knowledge of GPU utilization has the potential to benefit even applications that do not target utilization itself.
- Score: 8.084016058894779
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There has been significant progress in developing neural network
architectures that both achieve high predictive performance and that also
achieve high application-level inference throughput (e.g., frames per second).
Another metric of increasing importance is GPU utilization during inference:
the measurement of how well a deployed neural network uses the computational
capabilities of the GPU on which it runs. Achieving high GPU utilization is
critical to increasing application-level throughput and ensuring a good return
on investment for deploying GPUs.
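As a concrete reference point, GPU utilization during inference is commonly sampled from NVML counters while the network runs. The following is a minimal measurement sketch, assuming PyTorch, torchvision, and the pynvml bindings are available; note that NVML's counter reports the fraction of time any kernel was active, which is only a coarse proxy for how fully the GPU's compute resources are used:

```python
import threading
import time

import pynvml
import torch
import torchvision.models as models

def sample_utilization(handle, samples, stop_event, interval=0.01):
    # Poll NVML's utilization counter while inference runs on another thread.
    while not stop_event.is_set():
        samples.append(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
        time.sleep(interval)

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # GPU 0; adjust as needed

model = models.resnet50().cuda().eval()         # any CNN under study
x = torch.randn(8, 3, 224, 224, device="cuda")  # example batch

samples, stop = [], threading.Event()
poller = threading.Thread(target=sample_utilization, args=(handle, samples, stop))
poller.start()

with torch.no_grad():
    for _ in range(100):
        model(x)
torch.cuda.synchronize()  # ensure all kernels finish before polling stops

stop.set()
poller.join()
pynvml.nvmlShutdown()
print(f"mean GPU utilization: {sum(samples) / max(len(samples), 1):.1f}%")
```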
This paper analyzes the GPU utilization of convolutional neural network (CNN)
inference. We first survey the GPU utilization of CNNs to show that there is
room to improve the GPU utilization of many of these CNNs. We then investigate
the GPU utilization of networks within a neural architecture search (NAS)
search space, and explore how using GPU utilization as a metric could
potentially be used to accelerate NAS itself. Our study makes the case that
there is room to improve the inference-time GPU utilization of CNNs and that
knowledge of GPU utilization has the potential to benefit even applications
that do not target utilization itself. We hope that the results of this study
will spur future innovation in designing GPU-efficient neural networks.
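The abstract does not prescribe a specific NAS algorithm, but the idea of using utilization as a cheap signal can be sketched as a filter inside a random search. Everything below (sample_architecture, measure_gpu_utilization, train_and_evaluate) is a hypothetical stand-in, not the paper's method:

```python
import random

# Hypothetical stand-ins, not from the paper: a real system would plug in its
# own search space, utilization probe (e.g., the NVML sketch above), and trainer.
def sample_architecture(search_space):
    return random.choice(search_space)

def measure_gpu_utilization(arch):
    # Placeholder: run a few inference batches and read NVML counters.
    return random.uniform(0.0, 100.0)

def train_and_evaluate(arch):
    # Placeholder for full training plus validation accuracy.
    return random.uniform(0.0, 1.0)

def utilization_filtered_search(search_space, budget, util_threshold=50.0):
    """Random search that skips full training for low-utilization candidates."""
    best_arch, best_acc = None, 0.0
    for _ in range(budget):
        arch = sample_architecture(search_space)
        # Cheap pre-check: skip candidates that use the GPU poorly before
        # paying for expensive training.
        if measure_gpu_utilization(arch) < util_threshold:
            continue
        acc = train_and_evaluate(arch)
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc

best, acc = utilization_filtered_search(["net-a", "net-b", "net-c"], budget=20)
```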
Related papers
- Benchmarking GPUs on SVBRDF Extractor Model [0.0]
In this work, we compare the performance of different GPUs on neural network models that operate on larger input images (256x256).
arXiv Detail & Related papers (2023-10-19T17:09:06Z) - Transferability of Convolutional Neural Networks in Stationary Learning Tasks [96.00428692404354]
We introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems.
We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining.
Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten agents.
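This transfer from small to large windows is plausible because convolutional layers are agnostic to spatial input size. A toy illustration (our own, not the paper's model):

```python
import torch
import torch.nn as nn

# A fully convolutional network has no fixed-size layers, so the same weights
# apply to any spatial input size.
fcn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),  # per-location prediction head
)

small = torch.randn(1, 1, 32, 32)    # training-sized window
large = torch.randn(1, 1, 256, 256)  # much larger deployment window
print(fcn(small).shape)  # torch.Size([1, 1, 32, 32])
print(fcn(large).shape)  # torch.Size([1, 1, 256, 256])
```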
arXiv Detail & Related papers (2023-07-21T13:51:45Z) - Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness [4.8412870364335925]
Quiver is a distributed GPU-based GNN serving system that achieves low latency and high throughput.
We show that Quiver achieves up to 35 times lower latency with an 8 times higher throughput compared to state-of-the-art GNN approaches.
arXiv Detail & Related papers (2023-05-18T10:34:23Z) - Architectural Implications of Embedding Dimension during GCN on CPU and GPU [6.650945912906685]
Graph Convolutional Networks (GCNs) are a widely used type of GNN for transductive graph learning problems.
GCN is challenging from an architectural perspective due to its inherent sparsity, low data reuse, and massive memory capacity requirements.
arXiv Detail & Related papers (2022-12-01T19:23:12Z) - Survey on Large Scale Neural Network Training [48.424512364338746]
Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training.
This survey provides a systematic overview of the approaches that enable more efficient DNN training.
arXiv Detail & Related papers (2022-02-21T18:48:02Z) - Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
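Neighborhood-sampled mini-batch training of this kind is available off the shelf; below is a minimal single-GPU sketch using PyTorch Geometric's NeighborLoader (an illustration only, not the paper's performance-engineered sampler or its multi-GPU pipeline):

```python
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

# Small demo graph; the paper targets much larger graphs on multiple GPUs.
data = Planetoid(root="/tmp/Cora", name="Cora")[0]

loader = NeighborLoader(
    data,
    num_neighbors=[25, 10],       # sample 25 first-hop, 10 second-hop neighbors
    batch_size=128,               # seed nodes per mini-batch
    input_nodes=data.train_mask,  # sample batches around training nodes
    shuffle=True,
)

for batch in loader:
    # batch is a sampled subgraph; its first batch.batch_size nodes are seeds.
    out = batch.x[:batch.batch_size]  # placeholder for a GNN forward pass
    break
```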
arXiv Detail & Related papers (2021-10-16T02:41:35Z) - L2PF -- Learning to Prune Faster [57.32153461504626]
We present a multi-task, try-and-learn method that discretely learns which CNN filters are redundant and a continuous action determining how long each layer should be fine-tuned.
For ResNet20, we have achieved a compression ratio of 3.84x with minimal accuracy degradation.
Compared to the state-of-the-art pruning method, we reduced the GPU hours by 1.71x.
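For reference, structured filter pruning itself can be expressed with PyTorch's built-in utilities; the magnitude-based criterion below is a simple stand-in, not L2PF's learned policy:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(16, 32, kernel_size=3)

# Remove 50% of output filters (dim=0) by smallest L2 norm. L2PF instead
# *learns* which filters are redundant and how long to fine-tune each layer.
prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)
prune.remove(conv, "weight")  # bake the pruning mask into the weights
```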
arXiv Detail & Related papers (2021-01-07T18:13:37Z) - At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation [24.824295164938604]
This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020.
Sparse deep neural networks (SpDNN) have shown promise for reining in the memory footprint of large neural networks.
This work presents optimized sparse matrix multiplication kernels fused with the ReLU function.
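Functionally, each such layer computes a sparse matrix product followed by ReLU; the sketch below shows the unfused PyTorch equivalent (the paper's contribution is fusing these steps into a single optimized GPU kernel, which this is not):

```python
import torch

# Functional equivalent of one SpDNN layer: a sparse weight matrix multiplied
# by dense activations, followed by ReLU. Here the two steps are separate
# PyTorch calls shown only for clarity.
W = torch.randn(1024, 1024).relu()  # ~50% zeros: a crude sparse weight matrix
W_sparse = W.to_sparse()            # convert to COO sparse format
x = torch.randn(1024, 64)           # dense activations

y = torch.sparse.mm(W_sparse, x).clamp(min=0)  # sparse matmul, then ReLU
```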
arXiv Detail & Related papers (2020-07-28T12:09:43Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z) - Hybrid Models for Learning to Branch [81.93868699246214]
We propose a new hybrid architecture for efficient branching on CPU machines.
The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLPs) for branching.
arXiv Detail & Related papers (2020-06-26T21:03:45Z) - Neural Architecture Design for GPU-Efficient Networks [27.07089149328155]
We propose a general principle for designing GPU-efficient networks based on extensive empirical studies.
Based on the proposed framework, we design a family of GPU-Efficient Networks, or GENets in short.
While achieving $\geq 81.3\%$ top-1 accuracy on ImageNet, GENet is up to $6.4$ times faster than EfficientNet on GPU.
arXiv Detail & Related papers (2020-06-24T22:42:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.