On the Potential of Execution Traces for Batch Processing Workload
Optimization in Public Clouds
- URL: http://arxiv.org/abs/2111.08759v1
- Date: Tue, 16 Nov 2021 20:11:36 GMT
- Title: On the Potential of Execution Traces for Batch Processing Workload
Optimization in Public Clouds
- Authors: Dominik Scheinert, Alireza Alamgiralem, Jonathan Bader, Jonathan Will,
Thorsten Wittkopp, Lauritz Thamsen
- Abstract summary: We propose a collaborative approach for sharing anonymized workload execution traces among users.
We mine them for general patterns and exploit clusters of historical workloads for future optimizations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing amount of data, data processing workloads and the management
of their resource usage become increasingly important. Since managing a
dedicated infrastructure is in many situations infeasible or uneconomical,
users progressively execute their respective workloads in the cloud. As the
configuration of workloads and resources is often challenging, various methods
have been proposed that either quickly profile towards a good configuration or
determine one based on data from previous runs. Still, performance data to
train such methods is often lacking and costly to collect.
In this paper, we propose a collaborative approach for sharing anonymized
workload execution traces among users, mining them for general patterns, and
exploiting clusters of historical workloads for future optimizations. We
evaluate our prototype implementation for mining workload execution graphs on a
publicly available trace dataset and demonstrate the predictive value of
workload clusters determined through traces only.
Related papers
- Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.
We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.
We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
- Karasu: A Collaborative Approach to Efficient Cluster Configuration for
Big Data Analytics [3.779250782197386]
Karasu is an approach to more efficient resource configuration profiling.
It promotes data sharing among users working with similar infrastructures, frameworks, algorithms, or datasets.
We show that Karasu is able to significantly boost existing methods in terms of performance, search time, and cost.
arXiv Detail & Related papers (2023-08-22T21:14:57Z)
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
- Optimizing Data Collection for Machine Learning [87.37252958806856]
Modern deep learning systems require huge data sets to achieve impressive performance.
Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay.
We propose a new paradigm for modeling the data collection as a formal optimal data collection problem.
arXiv Detail & Related papers (2022-10-03T21:19:05Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data
Programming [77.38174112525168]
We present Nemo, an end-to-end interactive system that improves the overall productivity of the weak supervision (WS) learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using
Graph Propagation [52.9168275057997]
This paper presents Enel, a novel dynamic scaling approach that uses message propagation on an attributed graph to model dataflow jobs.
We show that Enel is able to identify effective rescaling actions, reacting for instance to node failures, and can be reused across different execution contexts.
arXiv Detail & Related papers (2021-08-27T10:21:08Z)
- Evaluation of Load Prediction Techniques for Distributed Stream
Processing [0.0]
Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near real time.
The rate at which events arrive at DSP systems can vary considerably over time.
A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization.
arXiv Detail & Related papers (2021-08-10T15:25:32Z)
- Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across
Contexts [52.9168275057997]
This paper presents Bellamy, a novel modeling approach that combines scale-outs, dataset sizes, and runtimes with additional descriptive properties of a dataflow job.
We evaluate our approach on two publicly available datasets consisting of execution data from various dataflow jobs carried out in different environments.
arXiv Detail & Related papers (2021-07-29T11:57:38Z)
- Optimal Resource Allocation for Serverless Queries [8.59568779761598]
Prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource allocation and run-time.
We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and past observed queries.
arXiv Detail & Related papers (2021-07-19T02:55:48Z)
- Sequence-to-sequence models for workload interference [1.988145627448243]
Co-scheduling of jobs in data centers is a challenging scenario, where jobs can compete for resources, leading to severe slowdowns or failed executions.
Current techniques, most of them already involving machine learning and job modeling, are based on workload behavior summarization across time.
We propose a methodology for modeling co-scheduling of jobs in data centers, based on their behavior towards resources and execution time.
arXiv Detail & Related papers (2020-06-25T14:11:46Z)