Sequence-to-sequence models for workload interference
- URL: http://arxiv.org/abs/2006.14429v2
- Date: Mon, 6 Jul 2020 14:23:49 GMT
- Title: Sequence-to-sequence models for workload interference
- Authors: David Buchaca Prats, Joan Marcual, Josep Lluís Berral, David Carrera
- Abstract summary: Co-scheduling of jobs in data-centers is a challenging scenario, where jobs can compete for resources, leading to severe slowdowns or failed executions.
Current techniques, most of them already involving machine learning and job modeling, are based on workload behavior summarization across time.
We propose a methodology for modeling co-scheduling of jobs on data-centers, based on their behavior towards resources and execution time.
- Score: 1.988145627448243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Co-scheduling of jobs in data-centers is a challenging scenario, where jobs
can compete for resources, leading to severe slowdowns or failed executions.
Efficient job placement on environments where resources are shared requires
awareness of how jobs interfere during execution, to go far beyond ineffective
resource overbooking techniques. Current techniques, most of them already
involving machine learning and job modeling, are based on workload behavior
summarization across time, instead of focusing on effective job requirements at
each instant of the execution. In this work we propose a methodology for
modeling co-scheduling of jobs on data-centers, based on their behavior towards
resources and execution time, using sequence-to-sequence models based on
recurrent neural networks. The goal is to forecast co-executed jobs' footprint
on resources along their execution time, from the profile shown by the
individual jobs, to enhance resource managers' and schedulers' placement
decisions. The methods presented here are validated using High Performance
Computing benchmarks based on different frameworks (like Hadoop and Spark) and
applications (CPU bound, IO bound, machine learning, SQL queries...).
Experiments show that the model can correctly identify the resource usage
trends from previously seen and even unseen co-scheduled jobs.
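The abstract's contrast between summarizing workload behavior across time and looking at requirements at each instant can be illustrated with a small sketch. The traces, the capacity value, and the function names below are hypothetical and invented for illustration; they are not data or code from the paper, and the sketch does not implement the paper's sequence-to-sequence model. It only shows why a per-instant forecast carries information that a time-averaged summary loses.

```python
# Hypothetical CPU demand per time step, in percent of one node.
# These traces are invented for illustration; they are not from the paper.
job_a = [20, 20, 90, 90, 20, 20]
job_b = [20, 20, 90, 90, 20, 20]   # peaks at the same time as job_a
job_c = [70, 70, 10, 10, 70, 70]   # peaks while job_a is idle
CAPACITY = 100                      # assumed node capacity in percent


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def fits_by_summary(x, y, capacity=CAPACITY):
    """Time-summarized check: sum of average demands versus capacity."""
    return mean(x) + mean(y) <= capacity


def overloaded_steps(x, y, capacity=CAPACITY):
    """Per-instant check: time steps where combined demand exceeds capacity."""
    return [t for t, (a, b) in enumerate(zip(x, y)) if a + b > capacity]


# Both pairings pass the averaged check...
print(fits_by_summary(job_a, job_b), fits_by_summary(job_a, job_c))
# ...but only the per-instant view separates the colliding pair from the safe one.
print(overloaded_steps(job_a, job_b))  # [2, 3]
print(overloaded_steps(job_a, job_c))  # []
```

In this toy setting the averaged view accepts both pairings, while the per-instant view shows that only `job_a` with `job_c` stays within capacity at every step. A model that forecasts the footprint along execution time, as the abstract proposes, targets that second kind of signal.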
Related papers
- Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.
We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.
We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
- A Memetic Algorithm with Reinforcement Learning for Sociotechnical Production Scheduling [0.0]
This article presents a memetic algorithm that applies deep reinforcement learning (DRL) to flexible job shop scheduling problems (DRC-FJSSP).
From research projects in industry, we recognize the need to consider flexible machines, flexible human workers, worker capabilities, setup and processing operations, material arrival times, complex job paths with parallel tasks for bill of material manufacturing, sequence-dependent setup times and (partially) automated tasks in human-machine-collaboration.
arXiv Detail & Related papers (2022-12-21T11:24:32Z)
- Asynchronous Execution of Heterogeneous Tasks in ML-driven HPC Workflows [1.376408511310322]
Asynchronous execution is crucial to improve resource utilization and task throughput and to reduce makespan.
We investigate the requirements and properties of the asynchronous task execution of machine learning (ML)-driven high performance computing.
Our experiments represent relevant scientific drivers; we perform them at scale on Summit and show that the performance enhancements due to asynchronous execution are consistent with our model.
arXiv Detail & Related papers (2022-08-23T16:25:48Z) - Concepts and Algorithms for Agent-based Decentralized and Integrated
Scheduling of Production and Auxiliary Processes [78.120734120667]
This paper describes an agent-based decentralized and integrated scheduling approach.
Part of the requirements is to develop a linearly scaling communication architecture.
The approach is explained using an example based on industrial requirements.
arXiv Detail & Related papers (2022-05-06T18:44:29Z) - Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
arXiv Detail & Related papers (2022-02-02T08:23:24Z) - On the Potential of Execution Traces for Batch Processing Workload
Optimization in Public Clouds [0.0]
We propose a collaborative approach for sharing anonymized workload execution traces among users, mining them for general patterns, and exploiting clusters of historical workloads for future optimizations.
arXiv Detail & Related papers (2021-11-16T20:11:36Z) - Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using
Graph Propagation [52.9168275057997]
This paper presents Enel, a novel dynamic scaling approach that uses message propagation on an attributed graph to model dataflow jobs.
We show that Enel is able to identify effective rescaling actions, reacting for instance to node failures, and can be reused across different execution contexts.
arXiv Detail & Related papers (2021-08-27T10:21:08Z) - Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across
Contexts [52.9168275057997]
This paper presents Bellamy, a novel modeling approach that combines scale-outs, dataset sizes, and runtimes with additional descriptive properties of a dataflow job.
We evaluate our approach on two publicly available datasets consisting of execution data from various dataflow jobs carried out in different environments.
arXiv Detail & Related papers (2021-07-29T11:57:38Z) - Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.