DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML
Workloads
- URL: http://arxiv.org/abs/2212.03414v2
- Date: Thu, 21 Sep 2023 00:24:09 GMT
- Title: DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML
Workloads
- Authors: Seah Kim, Hyoukjun Kwon, Jinook Song, Jihyuck Jo, Yu-Hsin Chen,
Liangzhen Lai, Vikas Chandra
- Abstract summary: We propose a new scheduler, DREAM, which effectively handles the various forms of dynamicity in RTMM workloads.
DREAM quantifies the unique requirements of RTMM workloads and uses the quantified scores to drive scheduling decisions.
In our evaluation on five RTMM workload scenarios, DREAM reduces the overall UXCost by 32.2% and 50.0% in the geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines.
- Score: 8.266680870089997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone
control involve dynamic behaviors at multiple granularities: task, model, and
layers within a model. Such dynamic behaviors pose new challenges for the
system software of an ML system because, unlike traditional ML workloads, the
overall system load is not completely predictable. In addition, RTMM workloads
require real-time processing, involve highly heterogeneous models, and target
resource-constrained devices. Under such circumstances, an effective scheduler
that accounts for the unique characteristics of RTMM workloads becomes
increasingly important for making good use of the underlying hardware.
Therefore, we propose a new scheduler, DREAM, which effectively handles the
various forms of dynamicity in RTMM workloads on multi-accelerator systems.
DREAM quantifies the unique requirements of RTMM workloads and uses the
quantified scores to drive scheduling decisions, considering the current
system load and the other inference jobs running on different models and input
frames. DREAM exposes tunable parameters that provide fast and effective
adaptivity to dynamic workload changes. In our evaluation on five RTMM
workload scenarios, DREAM reduces the overall UXCost, an RTMM analogue of the
energy-delay product (EDP) defined in the paper, by 32.2% and 50.0% in the
geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines,
demonstrating the efficacy of our scheduling methodology.
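To make the score-driven scheduling idea concrete, the following is a minimal
sketch of how such a scheduler could rank pending inference jobs. It is not
DREAM's actual algorithm: the Job fields, the urgency_weight knob, and the
uxcost() helper are illustrative stand-ins, and only the classic EDP product
is used in place of the paper's UXCost definition.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    deadline_ms: float     # real-time deadline for the current frame
    est_latency_ms: float  # profiled latency on the chosen accelerator
    est_energy_mj: float   # profiled energy in millijoules

def uxcost(latency_ms: float, energy_mj: float) -> float:
    # Classic energy-delay product (EDP); the paper's UXCost is an
    # RTMM-specific analogue whose exact definition lives in the paper.
    return energy_mj * latency_ms

def score(job: Job, now_ms: float, urgency_weight: float = 2.0) -> float:
    # Less slack before the deadline -> higher score -> run sooner.
    # urgency_weight stands in for DREAM's tunable parameters (value made up).
    slack_ms = max(job.deadline_ms - now_ms - job.est_latency_ms, 1e-3)
    return urgency_weight / slack_ms + 1.0 / (1.0 + uxcost(job.est_latency_ms,
                                                           job.est_energy_mj))

def pick_next(ready_jobs: list[Job], now_ms: float) -> Job:
    # Dispatch the highest-scoring ready job to the next free accelerator.
    return max(ready_jobs, key=lambda j: score(j, now_ms))

jobs = [Job("hand_tracking", deadline_ms=16.0, est_latency_ms=4.0, est_energy_mj=3.0),
        Job("depth_estimation", deadline_ms=33.0, est_latency_ms=9.0, est_energy_mj=8.0)]
print(pick_next(jobs, now_ms=0.0).name)  # -> hand_tracking (tighter deadline)
```

The deadline-slack term models real-time pressure, while the EDP-style term
steers decisions toward energy- and latency-efficient jobs; in a real
multi-accelerator system both estimates would come from per-accelerator
profiles.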
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Models (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR reduces the computational cost of the LLM by 5.2-6.5x and its GPU memory footprint by 2-6x without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
- Smart energy management: process structure-based hybrid neural networks for optimal scheduling and economic predictive control in integrated systems [2.723192806018494]
Integrated energy systems (IESs) are complex systems consisting of diverse operating units spanning multiple domains.
We propose a physics-informed hybrid time-series neural network (NN) surrogate to predict the dynamic performance of IESs across multiple time scales.
arXiv Detail & Related papers (2024-10-07T04:24:39Z)
- SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators [12.416683044819955]
Multi-model workloads with heavy models, such as recent large language models, have significantly increased the compute and memory demands on hardware.
To address such increasing demands, designing a scalable hardware architecture has become a key problem.
We develop a set of schedulers to navigate the huge scheduling space and codify them into a scheduler, SCAR, with advanced techniques such as inter-chiplet pipelining.
arXiv Detail & Related papers (2024-05-01T18:02:25Z)
- DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images [0.8213829427624407]
Continual learning, the ability to acquire knowledge from new data while retaining previously learned information, is a fundamental challenge in machine learning.
We propose Dynamic Model Merging, DynaMMo, a method that merges multiple networks at different stages of model training to achieve better computational efficiency.
We evaluate DynaMMo on three publicly available datasets, demonstrating its effectiveness compared to existing approaches.
arXiv Detail & Related papers (2024-04-22T11:37:35Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling; a hedged sketch of this idea appears after this list.
Our proposed approach outperforms the state-of-the-art methods with up to a 10% decrease in latency constraint violation rate and a nearly 4x reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- RED: A Systematic Real-time Scheduling Approach for Robotic Environmental Dynamics [11.38746414146899]
We introduce RED, a systematic real-time scheduling approach designed to support multi-task deep neural network workloads in resource-limited robotic systems.
It adaptively manages robotic environmental dynamics while adhering to real-time constraints.
arXiv Detail & Related papers (2023-08-29T15:04:08Z)
- Asynchronous Multi-Model Dynamic Federated Learning over Wireless Networks: Theory, Modeling, and Optimization [20.741776617129208]
Federated learning (FL) has emerged as a key technique for distributed machine learning (ML).
We first formulate rectangular scheduling steps and functions to capture the impact of system parameters on learning performance.
Our analysis sheds light on the joint impact of device training variables and asynchronous scheduling decisions.
arXiv Detail & Related papers (2023-05-22T21:39:38Z)
- M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to execute a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z)
- Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing [53.399742232323895]
ATTEMPT is a new modular, multi-task, and parameter-efficient language model (LM) tuning approach.
It combines knowledge transferred across different tasks via a mixture of soft prompts while keeping the original LM unchanged.
It is parameter-efficient (e.g., updates 1,600 times fewer parameters than fine-tuning) and enables multi-task learning and flexible extensions.
arXiv Detail & Related papers (2022-05-24T10:48:33Z)
- Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
- Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
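To unpack the sparsity idea in the Sparse-DySta entry above (referenced
there), here is a hedged sketch of sparsity-aware latency estimation feeding a
scheduling decision. It is not the paper's Dysta algorithm: the linear latency
model, the field names, and the numbers are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class DnnJob:
    name: str
    dense_latency_ms: float  # profiled latency assuming fully dense tensors
    static_sparsity: float   # weight sparsity known before deployment
    dynamic_sparsity: float  # activation sparsity measured on the current input

def estimated_latency(job: DnnJob) -> float:
    # Blend profile-time (static) and run-time (dynamic) sparsity to refine
    # the latency estimate; the linear model is a simplification, since real
    # speedups from sparsity are hardware-specific.
    effective_density = (1.0 - job.static_sparsity) * (1.0 - job.dynamic_sparsity)
    return job.dense_latency_ms * effective_density

def next_job(ready: list[DnnJob]) -> DnnJob:
    # Shortest-estimated-job-first keeps average normalized turnaround low.
    return min(ready, key=estimated_latency)

ready = [DnnJob("detector", 12.0, static_sparsity=0.5, dynamic_sparsity=0.3),
         DnnJob("segmenter", 8.0, static_sparsity=0.2, dynamic_sparsity=0.1)]
print(next_job(ready).name)  # -> detector (4.2 ms estimated vs. 5.76 ms)
```

Static sparsity can be profiled once before deployment, whereas dynamic
(activation) sparsity is only known after an input arrives, which is why the
entry above distinguishes the two.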
This list is automatically generated from the titles and abstracts of the papers on this site.