DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
- URL: http://arxiv.org/abs/2505.23131v1
- Date: Thu, 29 May 2025 06:04:32 GMT
- Title: DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
- Authors: Xinyu Yao, Daniel Bourgeois, Abhinav Jain, Yuxin Tang, Jiawen Yao, Zhimin Ding, Arlei Silva, Chris Jermaine
- Abstract summary: We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system. Our experiments show that \textsc{Doppler} outperforms all baseline methods across tasks.
- Score: 11.966335602618933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to three key limitations: (1) reliance on bulk-synchronous systems like TensorFlow, which under-utilize devices due to barrier synchronization; (2) lack of awareness of the scheduling mechanism of underlying systems when designing learning-based methods; and (3) exclusive dependence on reinforcement learning, ignoring the structure of effective heuristics designed by experts. In this paper, we propose \textsc{Doppler}, a three-stage framework for training dual-policy networks consisting of 1) a $\mathsf{SEL}$ policy for selecting operations and 2) a $\mathsf{PLC}$ policy for placing chosen operations on devices. Our experiments show that \textsc{Doppler} outperforms all baseline methods across tasks by reducing system execution time and additionally demonstrates sampling efficiency by reducing per-episode training time.
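To make the dual-policy idea concrete, below is a minimal, hypothetical sketch (not the authors' code) of how a SEL/PLC assignment loop might be driven at inference time: a SEL policy scores which schedulable operation to handle next, and a PLC policy scores which device to place it on. The names `assign_devices`, `sel_policy`, and `plc_policy` are illustrative assumptions; the paper's network architectures, input features, and three-stage training procedure are not reproduced here.

```python
# Sketch of a dual-policy device-assignment loop (illustrative, not the
# authors' implementation). A SEL policy selects the next operation among
# those whose predecessors are already placed; a PLC policy places it.
import random
from typing import Callable, Dict, List, Set

def assign_devices(ops: List[str],
                   deps: Dict[str, Set[str]],
                   devices: List[int],
                   sel_policy: Callable,   # hypothetical: scores candidate operations
                   plc_policy: Callable)   # hypothetical: scores devices for an operation
                   -> Dict[str, int]:
    placed: Dict[str, int] = {}
    remaining = set(ops)
    while remaining:
        # Operations whose dependencies are already placed are schedulable.
        ready = [op for op in remaining if deps.get(op, set()).issubset(placed)]
        # SEL policy: choose the next operation to place.
        op = max(ready, key=lambda o: sel_policy(o, placed))
        # PLC policy: choose the device for the chosen operation.
        dev = max(devices, key=lambda d: plc_policy(op, d, placed))
        placed[op] = dev
        remaining.remove(op)
    return placed

# Toy usage with random scores standing in for the learned SEL/PLC networks.
if __name__ == "__main__":
    graph = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    assignment = assign_devices(["a", "b", "c", "d"], graph, devices=[0, 1],
                                sel_policy=lambda o, p: random.random(),
                                plc_policy=lambda o, d, p: random.random())
    print(assignment)
```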
Related papers
- Resource Utilization Optimized Federated Learning [19.564340315424413]
Federated learning (FL) systems facilitate distributed machine learning across a server and multiple devices. This paper introduces FedOptima, a resource-optimized FL system designed to simultaneously minimize both types of idle time.
arXiv Detail & Related papers (2025-03-10T20:23:39Z) - Digital Twin-Assisted Federated Learning with Blockchain in Multi-tier Computing Systems [67.14406100332671]
In Industry 4.0 systems, resource-constrained edge devices engage in frequent data interactions.
This paper proposes a digital twin (DT)-assisted federated learning (FL) scheme.
The efficacy of our proposed cooperative interference-based FL process has been verified through numerical analysis.
arXiv Detail & Related papers (2024-11-04T17:48:02Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Energy-Efficient Computation with DVFS using Deep Reinforcement Learning for Multi-Task Systems in Edge Computing [6.447135136911933]
This research studies generalized multi-task, multi-deadline systems with reinforcement learning-based DVFS. The method encodes time series data in the Linux kernel into information that is easy to interpret for reinforcement learning. Based on the test results, our method could save 3%-10% power compared to Linux built-in governors.
arXiv Detail & Related papers (2024-09-28T18:44:39Z) - Scheduling and Aggregation Design for Asynchronous Federated Learning over Wireless Networks [56.91063444859008]
Federated Learning (FL) is a collaborative machine learning framework that combines on-device training and server-based aggregation.
We propose an asynchronous FL design with periodic aggregation to tackle the straggler issue in FL systems.
We show that an "age-aware" aggregation weighting design can significantly improve the learning performance in an asynchronous FL setting.
arXiv Detail & Related papers (2022-12-14T17:33:01Z) - Semi-supervised Learning of Partial Differential Operators and Dynamical Flows [68.77595310155365]
We present a novel method that combines a hyper-network solver with a Fourier Neural Operator architecture.
We test our method on various time evolution PDEs, including nonlinear fluid flows in one, two, and three spatial dimensions.
The results show that the new method improves the learning accuracy at the time point of supervision, and is able to interpolate the solutions to any intermediate time.
arXiv Detail & Related papers (2022-07-28T19:59:14Z) - Efficient Device Scheduling with Multi-Job Federated Learning [64.21733164243781]
We propose a novel multi-job Federated Learning framework to enable the parallel training process of multiple jobs.
We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost.
Our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).
arXiv Detail & Related papers (2021-12-11T08:05:11Z) - Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z) - A Scalable and Reproducible System-on-Chip Simulation for Reinforcement Learning [0.0]
This paper proffers gym-ds3, a scalable and reproducible open environment tailored for a high-fidelity Domain-Specific System-on-Chip (DSSoC) application.
The simulator schedules hierarchical jobs onto heterogeneous System-on-Chip (SoC) processors and bridges the system to reinforcement learning research.
arXiv Detail & Related papers (2021-04-27T13:46:57Z) - Network Support for High-performance Distributed Machine Learning [17.919773898228716]
We propose a system model that captures both learning nodes (that perform computations) and information nodes (that provide data).
We then formulate the problem of selecting (i) which learning and information nodes should cooperate to complete the learning task, and (ii) the number of iterations to perform.
We devise an algorithm, named DoubleClimb, that can find a 1+1/|I|-competitive solution with cubic worst-case complexity.
arXiv Detail & Related papers (2021-02-05T19:38:57Z) - Optimal Task Assignment to Heterogeneous Federated Learning Devices [0.0]
We investigate the problem of minimizing the duration of Federated Learning rounds by controlling how much data each device uses for training.
We propose a scheduling algorithm named OLAR and prove that it provides optimal schedules.
Our results indicate that OLAR provides optimal solutions with a small execution time.
arXiv Detail & Related papers (2020-10-01T07:58:48Z)