RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with
Fine-Grain Utilization
- URL: http://arxiv.org/abs/2101.10463v2
- Date: Wed, 27 Jan 2021 02:22:33 GMT
- Title: RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with
Fine-Grain Utilization
- Authors: An Zou, Jing Li, Christopher D. Gill, and Xuan Zhang
- Abstract summary: We propose RTGPU, which can schedule the execution of multiple GPU applications in real-time to meet hard deadlines.
Our approach provides superior schedulability compared with previous work, and gives real-time guarantees to meet hard deadlines for multiple GPU applications.
- Score: 5.02836935036198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many emerging cyber-physical systems, such as autonomous vehicles and robots,
rely heavily on artificial intelligence and machine learning algorithms to
perform important system operations. Since these highly parallel applications
are computationally intensive, they need to be accelerated by graphics
processing units (GPUs) to meet stringent timing constraints. However, despite
the wide adoption of GPUs, efficiently scheduling multiple GPU applications
while providing rigorous real-time guarantees remains a challenge. In this
paper, we propose RTGPU, which can schedule the execution of multiple GPU
applications in real-time to meet hard deadlines. Each GPU application can have
multiple CPU execution and memory copy segments, as well as GPU kernels. We
start with a model to explicitly account for the CPU and memory copy segments
of these applications. We then consider the GPU architecture in the development
of a precise timing model for the GPU kernels and leverage a technique known as
persistent threads to implement fine-grained kernel scheduling with improved
performance through interleaved execution. Next, we propose a general method
for scheduling parallel GPU applications in real time. Finally, to schedule
multiple parallel GPU applications, we propose a practical real-time scheduling
algorithm based on federated scheduling and grid search (for GPU kernel
segments) with uniprocessor fixed priority scheduling (for multiple CPU and
memory copy segments). Our approach provides superior schedulability compared
with previous work, and gives real-time guarantees to meet hard deadlines for
multiple GPU applications according to comprehensive validation and evaluation
on a real NVIDIA GTX1080Ti GPU system.
Related papers
- Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading [2.8231000588510757]
Transformers and large language models(LLMs) have seen rapid adoption in all domains.
Training of transformers is very expensive and often hits a memory wall''
We propose a novel technique to split the LLM into subgroups, whose update phase is scheduled on either the CPU or the GPU.
arXiv Detail & Related papers (2024-10-26T00:43:59Z) - Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach [1.076745840431781]
We propose a method for comprehensively co-optimizing the setup of hierarchical partitioning and the selection of co-scheduling groups from a given set of jobs.
This results in a maximum throughput improvement by a factor of 1.87 compared to the time-sharing scheduling.
arXiv Detail & Related papers (2024-05-14T16:40:06Z) - FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel
Identification [2.9271819018953162]
In a cloud computing cluster, serving a GPU's computation power through multi-tasks sharing is highly demanded.
Existing GPU sharing solutions focus on reducing task-level waiting time or task-level switching costs when multiple jobs competing for a single GPU.
We present a novel kernel-level scheduling strategy called FIKIT: Filling Inter- Kernel Idle Time.
arXiv Detail & Related papers (2023-11-17T07:25:18Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose the kernel development in two steps: 1) Expressing the computational core using Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z) - PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning.
However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware.
PLSSVM can be used as a drop-in replacement for an LVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z) - Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous
Multi-GPU Servers [65.60007071024629]
We show that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z) - Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning [7.43260596107574]
We propose Nimble, a deep learning (DL) execution engine that runs tasks in parallel with minimal scheduling overhead.
Nable automatically parallelizes the execution of GPU tasks by exploiting multiple GPU streams in a single GPU.
evaluation on a variety of neural networks shows that compared to PyTorch, Nimble speeds up inference and training by up to 22.34$times$ and 3.61$times$, respectively.
arXiv Detail & Related papers (2020-12-04T17:25:46Z) - Systolic Computing on GPUs for Productive Performance [2.8064596842326575]
We propose a language and compiler to productively build high-performance systolic arrays that run on GPUs.
A programmer it' specifies a projection of a dataflow compute onto a linear systolic array, while leaving the detailed implementation of the projection to a compiler.
The compiler implements the specified projection and maps the linear systolic array to the SIMD execution units and vector registers of GPUs.
arXiv Detail & Related papers (2020-10-29T18:49:54Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z) - MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical
Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z) - Efficient Video Semantic Segmentation with Labels Propagation and
Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach.
We propose an Efficient Video(EVS) pipeline that combines: (i) On the CPU, a very fast optical flow method, that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.