Neural-based Modeling for Performance Tuning of Spark Data Analytics
- URL: http://arxiv.org/abs/2101.08167v1
- Date: Wed, 20 Jan 2021 14:58:55 GMT
- Title: Neural-based Modeling for Performance Tuning of Spark Data Analytics
- Authors: Khaled Zaouk, Fei Song, Chenghao Lyu and Yanlei Diao
- Abstract summary: Performance modeling of cloud data analytics is crucial for performance tuning and other critical operations in the cloud.
We bring recent Deep Learning techniques to bear on automated performance modeling of cloud data analytics.
Our work provides an in-depth study of different modeling choices that suit our requirements.
- Score: 1.2251128138369254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cloud data analytics has become an integral part of enterprise business
operations for data-driven insight discovery. Performance modeling of cloud
data analytics is crucial for performance tuning and other critical operations
in the cloud. Traditional modeling techniques fail to adapt to the high degree
of diversity in workloads and system behaviors in this domain. In this paper,
we bring recent Deep Learning techniques to bear on the process of automated
performance modeling of cloud data analytics, with a focus on Spark data
analytics as representative workloads. At the core of our work is the notion of
learning workload embeddings (with a set of desired properties) to represent
fundamental computational characteristics of different jobs, which enable
performance prediction when used together with job configurations that control
resource allocation and other system knobs. Our work provides an in-depth study
of different modeling choices that suit our requirements. Results of extensive
experiments reveal the strengths and limitations of different modeling methods,
as well as superior performance of our best performing method over a
state-of-the-art modeling tool for cloud analytics.
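The abstract describes predicting performance from a learned workload embedding combined with the job configuration. A minimal sketch of that idea is below; the architecture, dimensions, and knob names are illustrative assumptions, not the paper's actual model, and the weights stand in for a trained regressor.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 8   # size of the learned workload embedding (assumed)
CFG_DIM = 4   # number of configuration knobs, e.g. cores, memory (assumed)
HID_DIM = 16

# Randomly initialised weights stand in for a trained predictor.
W1 = rng.normal(size=(EMB_DIM + CFG_DIM, HID_DIM)) * 0.1
b1 = np.zeros(HID_DIM)
W2 = rng.normal(size=(HID_DIM, 1)) * 0.1
b2 = np.zeros(1)

def predict_latency(embedding: np.ndarray, config: np.ndarray) -> float:
    """One forward pass: concat(embedding, config) -> hidden -> latency."""
    x = np.concatenate([embedding, config])
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return float(h @ W2 + b2)

# The same workload embedding can be evaluated under different candidate
# configurations, which is what enables tuning of resource allocation.
emb = rng.normal(size=EMB_DIM)
for cfg in (rng.normal(size=CFG_DIM), rng.normal(size=CFG_DIM)):
    print(predict_latency(emb, cfg))
```

Keeping the workload embedding fixed while varying the configuration vector mirrors the tuning loop the abstract motivates: search over system knobs for one job's computational profile.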
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- iNNspector: Visual, Interactive Deep Model Debugging [8.997568393450768]
We propose a conceptual framework structuring the data space of deep learning experiments.
Our framework captures design dimensions and proposes mechanisms to make this data explorable and tractable.
We present the iNNspector system, which enables tracking of deep learning experiments and provides interactive visualizations of the data.
arXiv Detail & Related papers (2024-07-25T12:48:41Z)
- The Importance of Model Inspection for Better Understanding Performance Characteristics of Graph Neural Networks [15.569758991934934]
We investigate the effect of modelling choices on the feature learning characteristics of graph neural networks applied to a brain shape classification task.
We find substantial differences in the feature embeddings at different layers of the models.
arXiv Detail & Related papers (2024-05-02T13:26:18Z)
- Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models.
We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models.
Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
arXiv Detail & Related papers (2024-01-02T17:08:26Z)
- Variational Exploration Module VEM: A Cloud-Native Optimization and Validation Tool for Geospatial Modeling and AI Workflows [0.0]
Cloud-based deployments help to scale up these modeling and AI workflows.
We have developed the Variational Exploration Module which facilitates the optimization and validation of modeling deployed in the cloud.
The flexibility and robustness of the model-agnostic module is demonstrated using real-world applications.
arXiv Detail & Related papers (2023-11-26T23:07:00Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [72.75421975804132]
Learning Active Learning (LAL) proposes learning the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z)
- Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review [62.997667081978825]
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z)
- Learning Dynamics Models for Model Predictive Agents [28.063080817465934]
Model-Based Reinforcement Learning involves learning a dynamics model from data, and then using this model to optimise behaviour.
This paper sets out to disambiguate the role of different design choices for learning dynamics models, by comparing their performance to planning with a ground-truth model.
arXiv Detail & Related papers (2021-09-29T09:50:25Z)
- Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts [52.9168275057997]
This paper presents Bellamy, a novel modeling approach that combines scale-outs, dataset sizes, and runtimes with additional descriptive properties of a dataflow job.
We evaluate our approach on two publicly available datasets consisting of execution data from various dataflow jobs carried out in different environments.
arXiv Detail & Related papers (2021-07-29T11:57:38Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
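The last entry relates the density of the training data to control performance. As a hedged illustration only (this is not the paper's $\rho$-gap definition), a simple Gaussian kernel density estimate can score how well a query state is covered by the training set; low density flags regions where a learned controller may behave poorly. The bandwidth and data below are made-up assumptions.

```python
import numpy as np

def kde_density(train: np.ndarray, query: np.ndarray, bandwidth: float = 0.5) -> float:
    """Gaussian kernel density estimate of the training data at a query point."""
    d = train.shape[1]
    diffs = (train - query) / bandwidth
    sq = np.sum(diffs**2, axis=1)
    norm = (2 * np.pi) ** (d / 2) * bandwidth**d
    return float(np.mean(np.exp(-0.5 * sq)) / norm)

rng = np.random.default_rng(1)
train = rng.normal(size=(200, 2))   # training states clustered around the origin

well_covered = kde_density(train, np.zeros(2))       # inside the data cloud
poorly_covered = kde_density(train, np.full(2, 5.0)) # far from any sample
print(well_covered, poorly_covered)
```

Density drops far from the data, so such a score could gate where predictions of a learned control law are trusted.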
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.