Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big
Data Analytics
- URL: http://arxiv.org/abs/2211.08227v1
- Date: Tue, 15 Nov 2022 15:48:09 GMT
- Title: Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big
Data Analytics
- Authors: Dominik Scheinert, Soeren Becker, Jonathan Bader, Lauritz Thamsen,
Jonathan Will, Odej Kao
- Abstract summary: We present Perona, a novel approach to robust infrastructure fingerprinting for use in big data analytics.
Perona employs common sets and configurations of benchmarking tools for target resources, so that the resulting benchmark metrics are directly comparable and can be ranked.
We evaluate our approach both on data gathered from our own experiments and on data from related work on resource configuration optimization.
- Score: 0.06524460254566904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Choosing a good resource configuration for big data analytics applications
can be challenging, especially in cloud environments. Automated approaches are
desirable as poor decisions can reduce performance and raise costs. The
majority of existing automated approaches either build performance models from
previous workload executions or conduct iterative resource configuration
profiling until a near-optimal solution has been found. In doing so, they only
obtain an implicit understanding of the underlying infrastructure, which is
difficult to transfer to alternative infrastructures; as a result, profiling and
modeling insights are not sustained beyond very specific situations.
We present Perona, a novel approach to robust infrastructure fingerprinting
for use in the context of big data analytics. Perona employs common sets and
configurations of benchmarking tools for target resources, so that the resulting
benchmark metrics are directly comparable and can be ranked. Insignificant
benchmark metrics are discarded by learning a low-dimensional representation of
the input metric vector, and previous benchmark executions are also taken into
account for context-awareness, enabling the detection of resource degradation.
We evaluate our approach both on data gathered from our own experiments and on
data from related work on resource configuration optimization, demonstrating
that Perona captures the characteristics of benchmark runs in a compact manner
and produces representations that can be used directly.
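
To make the described pipeline concrete, below is a minimal Python sketch of the three steps named in the abstract: collecting directly comparable benchmark metrics, learning a low-dimensional representation that drops insignificant metrics, and comparing a new run against previous executions to detect resource degradation. PCA and the simple 3-sigma deviation test are assumptions made here for illustration only; they stand in for whatever encoder and degradation check Perona actually employs.

# Illustrative sketch only; function names, PCA, and the 3-sigma rule are
# assumptions, not the method described in the paper.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def fingerprint(history: np.ndarray, latest: np.ndarray, n_components: int = 4):
    """Compress raw benchmark metrics into a low-dimensional fingerprint.

    history: (n_runs, n_metrics) metrics from previous benchmark executions on
             the same resource, gathered with identical tool configurations so
             that the columns are directly comparable.
    latest:  (n_metrics,) metrics from the most recent execution.
    """
    scaler = StandardScaler().fit(history)
    pca = PCA(n_components=n_components).fit(scaler.transform(history))

    # Low-dimensional representation: insignificant metric directions are
    # dropped by keeping only the leading principal components.
    z_hist = pca.transform(scaler.transform(history))
    z_new = pca.transform(scaler.transform(latest.reshape(1, -1)))[0]

    # Context-awareness: compare the new fingerprint against the spread of
    # previous executions; a large deviation hints at resource degradation.
    center = z_hist.mean(axis=0)
    spread = z_hist.std(axis=0) + 1e-9
    deviation = np.abs((z_new - center) / spread).max()
    return z_new, deviation > 3.0  # flag if any component is > 3 sigma off


# Usage with synthetic data: 20 past runs, 12 benchmark metrics each.
rng = np.random.default_rng(0)
past_runs = rng.normal(loc=100.0, scale=5.0, size=(20, 12))
new_run = past_runs.mean(axis=0) * 0.7  # e.g. a run with degraded throughput
vec, degraded = fingerprint(past_runs, new_run)
print(vec, degraded)
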
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - Building Interpretable and Reliable Open Information Retriever for New
Domains Overnight [67.03842581848299]
Information retrieval is a critical component for many downstream tasks such as open-domain question answering (QA).
We propose an information retrieval pipeline that uses entity/event linking model and query decomposition model to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z) - OPTION: OPTImization Algorithm Benchmarking ONtology [4.060078409841919]
OPTION (OPTImization algorithm benchmarking ONtology) is a semantically rich, machine-readable data model for benchmarking platforms.
Our ontology provides the vocabulary needed for semantic annotation of the core entities involved in the benchmarking process.
It also provides means for automatic data integration, improved interoperability, and powerful querying capabilities.
arXiv Detail & Related papers (2022-11-21T10:34:43Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Optimal Resource Allocation for Serverless Queries [8.59568779761598]
Prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource allocation and run-time.
We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and past observed queries.
arXiv Detail & Related papers (2021-07-19T02:55:48Z) - Comparative Code Structure Analysis using Deep Learning for Performance
Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations and achieve up to 84% accuracy (individual problems) and 73% accuracy (combined dataset with multiple problems) in predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z) - DAGA: Data Augmentation with a Generation Approach for Low-resource
Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z) - A critical analysis of metrics used for measuring progress in artificial
intelligence [9.387811897655016]
We analyse the current landscape of performance metrics based on data covering 3867 machine learning model performance results.
Results suggest that the large majority of metrics currently used have properties that may result in an inadequate reflection of a model's performance.
We describe ambiguities in reported metrics, which may lead to difficulties in interpreting and comparing model performances.
arXiv Detail & Related papers (2020-08-06T11:14:37Z) - IOHanalyzer: Detailed Performance Analyses for Iterative Optimization
Heuristics [3.967483941966979]
IOHanalyzer is a new user-friendly tool for the analysis, comparison, and visualization of performance data of IOHs.
IOHanalyzer provides detailed statistics about fixed-target running times and about fixed-budget performance of the benchmarked algorithms.
IOHanalyzer can directly process performance data from the main benchmarking platforms.
arXiv Detail & Related papers (2020-07-08T08:20:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.