DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads
- URL: http://arxiv.org/abs/2212.03414v2
- Date: Thu, 21 Sep 2023 00:24:09 GMT
- Authors: Seah Kim, Hyoukjun Kwon, Jinook Song, Jihyuck Jo, Yu-Hsin Chen,
Liangzhen Lai, Vikas Chandra
- Abstract summary: We propose a new scheduler, DREAM, which effectively handles the various forms of dynamicity in RTMM workloads.
DREAM quantifies the unique requirements of RTMM workloads and uses the quantified scores to drive scheduling decisions.
In our evaluation of five scenarios of RTMM workload, DREAM reduces the overall UXCost by 32.2% and 50.0% in the geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines.
- Score: 8.266680870089997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone
control involve dynamic behaviors at various granularities: task, model, and
layers within a model. Such dynamic behaviors introduce new challenges to the
system software in an ML system since the overall system load is not completely
predictable, unlike traditional ML workloads. In addition, RTMM workloads
require real-time processing, involve highly heterogeneous models, and target
resource-constrained devices. Under such circumstances, an effective scheduler
becomes more important for making better use of the underlying hardware
given the unique characteristics of RTMM workloads. Therefore, we propose
a new scheduler, DREAM, which effectively handles the various forms of dynamicity in RTMM
workloads targeting multi-accelerator systems. DREAM quantifies the unique
requirements for RTMM workloads and utilizes the quantified scores to drive
scheduling decisions, considering the current system load and other inference
jobs on different models and input frames. DREAM utilizes tunable parameters
that provide fast and effective adaptivity to dynamic workload changes. In our
evaluation of five scenarios of RTMM workload, DREAM reduces the overall
UXCost, a metric defined in the paper as the RTMM equivalent of the
energy-delay product (EDP), by 32.2% and 50.0% in the geometric mean (up to
80.8% and 97.6%) compared to state-of-the-art baselines, which shows the
efficacy of our scheduling methodology.
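The headline numbers above are geometric-mean reductions across evaluation scenarios. As a minimal sketch of how such a figure can be computed, the snippet below takes per-scenario costs for a baseline and a candidate scheduler and returns one minus the geometric mean of the cost ratios. The function name `geomean_reduction` and the input values are hypothetical and are not taken from the paper. The precise UXCost definition and aggregation procedure are given in the paper and may differ from this sketch.

```python
import math


def geomean_reduction(baseline_costs, new_costs):
    """Return 1 - geomean(new / baseline) over paired per-scenario costs.

    Assumption: all costs are strictly positive and the lists are paired
    by scenario. This is an illustrative helper, not the paper's code.
    """
    if not baseline_costs or len(baseline_costs) != len(new_costs):
        raise ValueError("need two non-empty lists of equal length")
    ratios = [new / base for base, new in zip(baseline_costs, new_costs)]
    # Average the logs, then exponentiate: this is the geometric mean.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return 1.0 - math.exp(log_mean)


# Hypothetical costs for two scenarios (NOT values from the paper).
baseline = [10.0, 5.0]
candidate = [9.0, 2.0]  # cost ratios 0.9 and 0.4
print(f"geometric-mean reduction: {geomean_reduction(baseline, candidate):.1%}")
```

With these made-up inputs the ratios are 0.9 and 0.4, so the geometric mean is 0.6 and the reported reduction is 40%. The arithmetic mean of the two per-scenario reductions (10% and 60%) would instead be 35%, which is why the aggregation method matters when comparing such figures.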
Related papers
- SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators [12.416683044819955]
Multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware.
To address such increasing demands, designing a scalable hardware architecture became a key problem.
We develop a set of schedulers to navigate the huge scheduling space and codify them into a scheduler with advanced techniques such as inter-chiplet pipelining.
arXiv Detail & Related papers (2024-05-01T18:02:25Z)
- DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images [0.8213829427624407]
Continual learning, the ability to acquire knowledge from new data while retaining previously learned information, is a fundamental challenge in machine learning.
We propose Dynamic Model Merging, DynaMMo, a method that merges multiple networks at different stages of model training to achieve better computational efficiency.
We evaluate DynaMMo on three publicly available datasets, demonstrating its effectiveness compared to existing approaches.
arXiv Detail & Related papers (2024-04-22T11:37:35Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- RED: A Systematic Real-Time Scheduling Approach for Robotic Environmental Dynamics [11.38746414146899]
We introduce RED, a systematic real-time scheduling approach designed to support multi-task deep neural network workloads in resource-limited robotic systems.
It is designed to adaptively manage the Robotic Environmental Dynamics (RED) while adhering to real-time constraints.
arXiv Detail & Related papers (2023-08-29T15:04:08Z)
- Asynchronous Multi-Model Dynamic Federated Learning over Wireless Networks: Theory, Modeling, and Optimization [20.741776617129208]
Federated learning (FL) has emerged as a key technique for distributed machine learning (ML).
We first formulate rectangular scheduling steps and functions to capture the impact of system parameters on learning performance.
Our analysis sheds light on the joint impact of device training variables and asynchronous scheduling decisions.
arXiv Detail & Related papers (2023-05-22T21:39:38Z)
- XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse [18.12263246913058]
Real-time multi-task multi-model (MTMM) workloads are emerging for application areas like extended reality (XR) to support metaverse use cases.
These workloads combine user interactivity with computationally complex machine learning (ML) activities.
These workloads present unique difficulties and constraints.
arXiv Detail & Related papers (2022-11-16T05:08:42Z)
- M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to just execute a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z)
- Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing [53.399742232323895]
ATTEMPT is a new modular, multi-task, and parameter-efficient language model (LM) tuning approach.
It combines knowledge transferred across different tasks via a mixture of soft prompts while keeping the original LM unchanged.
It is parameter-efficient (e.g., updates 1,600 times fewer parameters than fine-tuning) and enables multi-task learning and flexible extensions.
arXiv Detail & Related papers (2022-05-24T10:48:33Z)
- Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
- Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
- Controllable Pareto Multi-Task Learning [55.945680594691076]
A multi-task learning system aims at solving multiple related tasks at the same time.
With a fixed model capacity, the tasks conflict with each other, and the system usually has to make a trade-off when learning all of them together.
This work proposes a novel controllable multi-task learning framework, to enable the system to make real-time trade-off control among different tasks with a single model.
arXiv Detail & Related papers (2020-10-13T11:53:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.