Domain-specific Genetic Algorithm for Multi-tenant DNN Accelerator Scheduling
- URL: http://arxiv.org/abs/2104.13997v2
- Date: Fri, 30 Apr 2021 14:41:36 GMT
- Title: Domain-specific Genetic Algorithm for Multi-tenant DNN Accelerator Scheduling
- Authors: Sheng-Chun Kao, Tushar Krishna
- Abstract summary: There is a growing trend towards building large accelerators with several sub-accelerator cores/chiplets.
This work looks at the problem of supporting multi-tenancy on such accelerators.
We develop a specialized genetic algorithm called G# with custom operators to enable structured sample-efficient exploration.
- Score: 3.8530020696501794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Deep Learning continues to drive a variety of applications in datacenters
and HPC, there is a growing trend towards building large accelerators with
several sub-accelerator cores/chiplets. This work looks at the problem of
supporting multi-tenancy on such accelerators. In particular, we focus on the
problem of mapping layers from several DNNs simultaneously on an accelerator.
Given the extremely large search space, we formulate the search as an
optimization problem and develop a specialized genetic algorithm called G#
with custom operators to enable structured sample-efficient exploration. We
quantitatively compare G# with several common heuristics, state-of-the-art
optimization methods, and reinforcement learning methods across different
accelerator settings (large/small accelerators) and different sub-accelerator
configurations (homogeneous/heterogeneous), and observe G# can consistently find
better solutions. Further, to enable real-time scheduling, we also demonstrate
a method to generalize the learnt schedules and transfer them to the next batch
of jobs, reducing schedule compute time to near zero.
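The search described above treats multi-tenant scheduling as black-box optimization over an assignment of DNN layers to sub-accelerator cores. As a rough illustration of that style of search (not the paper's G# algorithm or its custom operators), the following minimal Python sketch uses an invented per-layer latency model, a simple assignment encoding, and generic crossover/mutation operators; G# additionally specializes the encoding and operators to the accelerator-scheduling domain.

```python
# Hedged sketch: a toy genetic algorithm assigning DNN layers from several
# tenants to sub-accelerator cores. The latency model, encoding, and operators
# are illustrative assumptions, not the paper's G#.
import random

NUM_LAYERS = 12  # layers pooled from all tenant DNNs (assumed)
NUM_CORES = 4    # sub-accelerator cores/chiplets (assumed)
LAYER_COST = [random.uniform(1.0, 5.0) for _ in range(NUM_LAYERS)]  # toy per-layer latency

def makespan(assignment):
    """Cost of a schedule: latency of the most loaded core (toy model)."""
    load = [0.0] * NUM_CORES
    for layer, core in enumerate(assignment):
        load[core] += LAYER_COST[layer]
    return max(load)

def crossover(a, b):
    """Single-point crossover over the layer-to-core encoding."""
    cut = random.randrange(1, NUM_LAYERS)
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.1):
    """Reassign a few layers to random cores."""
    return [random.randrange(NUM_CORES) if random.random() < rate else g
            for g in genome]

def evolve(pop_size=50, generations=100):
    pop = [[random.randrange(NUM_CORES) for _ in range(NUM_LAYERS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)
        elite = pop[: pop_size // 4]  # keep the best quarter
        children = [mutate(crossover(*random.sample(elite, 2)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=makespan)

if __name__ == "__main__":
    best = evolve()
    print("best makespan:", round(makespan(best), 2), "assignment:", best)
```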
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs [57.12929098407975]
We show that by efficiently parallelizing existing causal discovery methods, we can scale them to thousands of dimensions.
Specifically, we focus on the causal ordering subprocedure in DirectLiNGAM and implement GPU kernels to accelerate it.
This allows us to apply DirectLiNGAM to causal inference on large-scale gene expression data with genetic interventions yielding competitive results.
arXiv Detail & Related papers (2024-03-06T15:06:11Z)
- Teal: Learning-Accelerated Optimization of WAN Traffic Engineering [68.7863363109948]
We present Teal, a learning-based TE algorithm that leverages the parallel processing power of GPUs to accelerate TE control.
To reduce the problem scale and make learning tractable, Teal employs a multi-agent reinforcement learning (RL) algorithm to independently allocate each traffic demand.
Compared with other TE acceleration schemes, Teal satisfies 6--32% more traffic demand and yields 197--625x speedups.
arXiv Detail & Related papers (2022-10-25T04:46:30Z)
- Demystifying Map Space Exploration for NPUs [4.817475305740601]
Map Space Exploration is the problem of finding optimized mappings of a Deep Neural Network (DNN) model.
We do a first-of-its-kind apples-to-apples comparison of search techniques leveraged by different mappers.
Next, we propose two new techniques that can augment existing mappers.
arXiv Detail & Related papers (2022-10-07T17:58:45Z)
- Flipping the switch on local exploration: Genetic Algorithms with Reversals [0.0]
Authors show that gradient-free search techniques are suitable for providing an optimal solution in the discrete domain.
They also show that the use of multiple local searches can improve the performance of local search.
The proposed GA variants are observed to have the lowest average cost across all benchmarks, including the proposed problem, and IC performs better than its constituents.
arXiv Detail & Related papers (2022-02-02T08:27:11Z)
- Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators [4.055002321981825]
We present a HW-SW co-design ecosystem for spatial accelerators called Union.
Our framework allows exploring different algorithms and their mappings on several accelerator cost models.
We demonstrate the value of Union for the community with several case studies.
arXiv Detail & Related papers (2021-09-15T16:42:18Z)
- Multi-task Over-the-Air Federated Learning: A Non-Orthogonal Transmission Approach [52.85647632037537]
We propose a multi-task over-the-air federated learning (MOAFL) framework, where multiple learning tasks share edge devices for data collection and learning models under the coordination of an edge server (ES).
Both the convergence analysis and numerical results demonstrate that the MOAFL framework can significantly reduce the uplink bandwidth consumption of multiple tasks without causing substantial learning performance degradation.
arXiv Detail & Related papers (2021-06-27T13:09:32Z)
- CoSA: Scheduling by Constrained Optimization for Spatial Accelerators [1.9149970150912705]
We present CoSA, a constrained-optimization-based approach for scheduling Deep Neural Network (DNN) accelerators.
As opposed to existing approaches that either rely on designers' heuristics or iterative methods to navigate the search space, CoSA expresses scheduling decisions as a constrained-optimization problem (a minimal, hypothetical sketch of this style of formulation appears after this list).
We demonstrate that CoSA-generated schedules significantly outperform state-of-the-art approaches by a geometric mean of up to 2.5x.
arXiv Detail & Related papers (2021-05-05T07:17:25Z)
- The Programming of Deep Learning Accelerators as a Constraint Satisfaction Problem [0.0]
We propose a new approach to implementing operators efficiently with complex instructions such as matrix multiply.
By formulating the embedding as a constraint satisfaction problem over the scalar dataflow, every possible embedding solution is contained in the search space.
A detailed evaluation using the VTA hardware accelerator with the Baidu DeepBench inference benchmark suite shows that our approach can automatically generate code competitive to reference implementations.
arXiv Detail & Related papers (2021-04-10T10:39:47Z)
- Gradient Coding with Dynamic Clustering for Straggler-Tolerant Distributed Learning [55.052517095437]
Gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers.
A significant performance bottleneck for the per-iteration completion time in distributed synchronous GD is straggling workers.
Coded distributed techniques have been introduced recently to mitigate stragglers and to speed up GD iterations by assigning redundant computations to workers.
We propose a novel dynamic gradient coding (GC) scheme, which assigns redundant data to workers to acquire the flexibility to choose from among a set of possible codes depending on the past straggling behavior.
arXiv Detail & Related papers (2021-03-01T18:51:29Z)
- CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search [102.67142711824748]
CATCH is a novel Context-bAsed meTa reinforcement learning algorithm for transferrable arChitecture searcH.
The combination of meta-learning and RL allows CATCH to efficiently adapt to new tasks while being agnostic to search spaces.
It is also capable of handling cross-domain architecture search, identifying competitive networks on ImageNet, COCO, and Cityscapes.
arXiv Detail & Related papers (2020-07-18T09:35:53Z)
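As a point of contrast with genetic-algorithm search, the CoSA entry above expresses scheduling as a constrained-optimization problem. Below is a minimal, hypothetical sketch of that style of formulation, written with the PuLP package as an assumed dependency; the cost table, constraints, and objective are invented for illustration and do not reproduce CoSA's actual model.

```python
# Hedged sketch (assumes the PuLP package is installed): a toy layer-to-core
# scheduling decision posed as a constrained-optimization problem, in the
# spirit of CoSA's formulation but not its actual MIP model.
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum, PULP_CBC_CMD

layers, cores = range(6), range(2)
# Toy latency of running layer l on core c (invented numbers).
cost = {(l, c): (l + 1) * (1.0 if c == 0 else 1.5) for l in layers for c in cores}

prob = LpProblem("layer_scheduling", LpMinimize)
x = {(l, c): LpVariable(f"x_{l}_{c}", cat=LpBinary) for l in layers for c in cores}
span = LpVariable("makespan", lowBound=0)

for l in layers:  # each layer runs on exactly one core
    prob += lpSum(x[l, c] for c in cores) == 1
for c in cores:   # the makespan bounds every core's total load
    prob += lpSum(cost[l, c] * x[l, c] for l in layers) <= span
prob += lpSum([span])  # objective: minimize the makespan

prob.solve(PULP_CBC_CMD(msg=0))
schedule = {l: next(c for c in cores if x[l, c].value() > 0.5) for l in layers}
print("makespan:", span.value(), "schedule:", schedule)
```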
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.