DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML
Workloads
- URL: http://arxiv.org/abs/2212.03414v2
- Date: Thu, 21 Sep 2023 00:24:09 GMT
- Title: DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML
Workloads
- Authors: Seah Kim, Hyoukjun Kwon, Jinook Song, Jihyuck Jo, Yu-Hsin Chen,
Liangzhen Lai, Vikas Chandra
- Abstract summary: We propose a new scheduler, DREAM, which effectively handles the various forms of dynamicity in RTMM workloads.
DREAM quantifies the unique requirements of RTMM workloads and uses the quantified scores to drive scheduling decisions.
In our evaluation on five RTMM workload scenarios, DREAM reduces the overall UXCost by 32.2% and 50.0% in the geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines.
- Score: 8.266680870089997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone
control involve dynamic behaviors at multiple granularities: task, model, and
layers within a model. Such dynamic behaviors pose new challenges for the
system software of an ML system because, unlike traditional ML workloads, the
overall system load is not completely predictable. In addition, RTMM workloads
require real-time processing, involve highly heterogeneous models, and target
resource-constrained devices. Under such circumstances, an effective scheduler
that accounts for the unique characteristics of RTMM workloads becomes
increasingly important for making good use of the underlying hardware.
Therefore, we propose a new scheduler, DREAM, which effectively handles the
various forms of dynamicity in RTMM workloads on multi-accelerator systems.
DREAM quantifies the unique requirements of RTMM workloads and uses the
quantified scores to drive scheduling decisions, considering the current
system load and the other inference jobs running on different models and input
frames. DREAM exposes tunable parameters that provide fast and effective
adaptivity to dynamic workload changes. In our evaluation on five RTMM
workload scenarios, DREAM reduces the overall UXCost, an RTMM analogue of the
energy-delay product (EDP) defined in the paper, by 32.2% and 50.0% in the
geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines,
demonstrating the efficacy of our scheduling methodology.
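To make the score-driven scheduling idea concrete, the following is a minimal
sketch of how such a scheduler could rank pending inference jobs. It is not
DREAM's actual algorithm: the Job fields, the urgency_weight knob, and the
uxcost() helper are illustrative stand-ins, and only the classic EDP product
is used in place of the paper's UXCost definition.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    deadline_ms: float     # real-time deadline for the current frame
    est_latency_ms: float  # profiled latency on the chosen accelerator
    est_energy_mj: float   # profiled energy in millijoules

def uxcost(latency_ms: float, energy_mj: float) -> float:
    # Classic energy-delay product (EDP); the paper's UXCost is an
    # RTMM-specific analogue whose exact definition lives in the paper.
    return energy_mj * latency_ms

def score(job: Job, now_ms: float, urgency_weight: float = 2.0) -> float:
    # Less slack before the deadline -> higher score -> run sooner.
    # urgency_weight stands in for DREAM's tunable parameters (value made up).
    slack_ms = max(job.deadline_ms - now_ms - job.est_latency_ms, 1e-3)
    return urgency_weight / slack_ms + 1.0 / (1.0 + uxcost(job.est_latency_ms,
                                                           job.est_energy_mj))

def pick_next(ready_jobs: list[Job], now_ms: float) -> Job:
    # Dispatch the highest-scoring ready job to the next free accelerator.
    return max(ready_jobs, key=lambda j: score(j, now_ms))

jobs = [Job("hand_tracking", deadline_ms=16.0, est_latency_ms=4.0, est_energy_mj=3.0),
        Job("depth_estimation", deadline_ms=33.0, est_latency_ms=9.0, est_energy_mj=8.0)]
print(pick_next(jobs, now_ms=0.0).name)  # -> hand_tracking (tighter deadline)
```

The deadline-slack term models real-time pressure, while the EDP-style term
steers decisions toward energy- and latency-efficient jobs; in a real
multi-accelerator system both estimates would come from per-accelerator
profiles.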
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Models (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR reduces the computational cost of the LLM by 5.2-6.5x and its GPU memory footprint by 2-6x without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
- Smart energy management: process structure-based hybrid neural networks for optimal scheduling and economic predictive control in integrated systems [2.723192806018494]
Integrated energy systems (IESs) are complex systems consisting of diverse operating units spanning multiple domains.
We propose a physics-informed hybrid time-series neural network (NN) surrogate to predict the dynamic performance of IESs across multiple time scales.
arXiv Detail & Related papers (2024-10-07T04:24:39Z)
- SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators [12.416683044819955]
Multi-model workloads with heavy models, such as recent large language models, have significantly increased the compute and memory demands on hardware.
To address such increasing demands, designing a scalable hardware architecture has become a key problem.
We develop a set of schedulers to navigate the huge scheduling space and codify them into a scheduler, SCAR, with advanced techniques such as inter-chiplet pipelining.
arXiv Detail & Related papers (2024-05-01T18:02:25Z)
- DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images [0.8213829427624407]
Continual learning, the ability to acquire knowledge from new data while retaining previously learned information, is a fundamental challenge in machine learning.
We propose Dynamic Model Merging, DynaMMo, a method that merges multiple networks at different stages of model training to achieve better computational efficiency.
We evaluate DynaMMo on three publicly available datasets, demonstrating its effectiveness compared to existing approaches.
arXiv Detail & Related papers (2024-04-22T11:37:35Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling; a hedged sketch of this idea appears after this list.
Our proposed approach outperforms the state-of-the-art methods with up to a 10% decrease in latency constraint violation rate and a nearly 4x reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- RED: A Systematic Real-time Scheduling Approach for Robotic Environmental Dynamics [11.38746414146899]
We introduce RED, a systematic real-time scheduling approach designed to support multi-task deep neural network workloads in resource-limited robotic systems.
It adaptively manages robotic environmental dynamics while adhering to real-time constraints.
arXiv Detail & Related papers (2023-08-29T15:04:08Z)
- Asynchronous Multi-Model Dynamic Federated Learning over Wireless Networks: Theory, Modeling, and Optimization [20.741776617129208]
Federated learning (FL) has emerged as a key technique for distributed machine learning (ML).
We first formulate rectangular scheduling steps and functions to capture the impact of system parameters on learning performance.
Our analysis sheds light on the joint impact of device training variables and asynchronous scheduling decisions.
arXiv Detail & Related papers (2023-05-22T21:39:38Z)
- M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to execute a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z)
- Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing [53.399742232323895]
ATTEMPT is a new modular, multi-task, and parameter-efficient language model (LM) tuning approach.
It combines knowledge transferred across different tasks via a mixture of soft prompts while keeping the original LM unchanged.
It is parameter-efficient (e.g., updates 1,600 times fewer parameters than fine-tuning) and enables multi-task learning and flexible extensions.
arXiv Detail & Related papers (2022-05-24T10:48:33Z)
- Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
- Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
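To unpack the sparsity idea in the Sparse-DySta entry above (referenced
there), here is a hedged sketch of sparsity-aware latency estimation feeding a
scheduling decision. It is not the paper's Dysta algorithm: the linear latency
model, the field names, and the numbers are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class DnnJob:
    name: str
    dense_latency_ms: float  # profiled latency assuming fully dense tensors
    static_sparsity: float   # weight sparsity known before deployment
    dynamic_sparsity: float  # activation sparsity measured on the current input

def estimated_latency(job: DnnJob) -> float:
    # Blend profile-time (static) and run-time (dynamic) sparsity to refine
    # the latency estimate; the linear model is a simplification, since real
    # speedups from sparsity are hardware-specific.
    effective_density = (1.0 - job.static_sparsity) * (1.0 - job.dynamic_sparsity)
    return job.dense_latency_ms * effective_density

def next_job(ready: list[DnnJob]) -> DnnJob:
    # Shortest-estimated-job-first keeps average normalized turnaround low.
    return min(ready, key=estimated_latency)

ready = [DnnJob("detector", 12.0, static_sparsity=0.5, dynamic_sparsity=0.3),
         DnnJob("segmenter", 8.0, static_sparsity=0.2, dynamic_sparsity=0.1)]
print(next_job(ready).name)  # -> detector (4.2 ms estimated vs. 5.76 ms)
```

Static sparsity can be profiled once before deployment, whereas dynamic
(activation) sparsity is only known after an input arrives, which is why the
entry above distinguishes the two.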
This list is automatically generated from the titles and abstracts of the papers on this site.