Unicorn: Reasoning about Configurable System Performance through the
lens of Causality
- URL: http://arxiv.org/abs/2201.08413v1
- Date: Thu, 20 Jan 2022 19:16:50 GMT
- Title: Unicorn: Reasoning about Configurable System Performance through the
lens of Causality
- Authors: Md Shahriar Iqbal, Rahul Krishna, Mohammad Ali Javidian, Baishakhi
Ray, Pooyan Jamshidi
- Abstract summary: We propose a new method, called Unicorn, which captures intricate interactions between configuration options across the software-hardware stack.
Experiments indicate that Unicorn outperforms state-of-the-art performance optimization and debugging methods.
Unlike the existing methods, the learned causal performance models reliably predict performance for new environments.
- Score: 12.877523121932114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern computer systems are highly configurable, with the variability space
sometimes larger than the number of atoms in the universe. Due to this vast
variability space, understanding and reasoning about the performance behavior
of highly configurable systems is challenging. State-of-the-art methods for
performance modeling and analyses rely on predictive machine learning models;
therefore, they (i) become unreliable in unseen environments (e.g., different
hardware, workloads) and (ii) produce incorrect explanations. To this end, we
propose a new method, called Unicorn, which (a) captures intricate interactions
between configuration options across the software-hardware stack and (b)
describes how such interactions impact performance variations via causal
inference. We evaluated Unicorn on six highly configurable systems, including
three on-device machine learning systems, a video encoder, a database
management system, and a data analytics pipeline. The experimental results
indicate that Unicorn outperforms state-of-the-art performance optimization and
debugging methods. Furthermore, unlike the existing methods, the learned causal
performance models reliably predict performance for new environments.
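The core idea, estimating causal rather than merely correlational effects of configuration options on performance, can be illustrated with a minimal sketch. All variables and data below are synthetic and hypothetical; this is not Unicorn's implementation, only an illustration of backdoor adjustment over a tiny causal graph:

```python
# Causal graph (hypothetical): workload -> option, workload -> latency,
# option -> latency. The naive correlation of option with latency is
# confounded by workload; stratifying on workload (backdoor adjustment)
# recovers the true causal effect of the option.
import random

random.seed(0)
TRUE_EFFECT = 5.0  # latency (ms) added when the option is enabled

samples = []
for _ in range(20000):
    workload = random.choice([0, 1])  # light / heavy workload
    # Heavy workloads tend to enable the option (confounding).
    option = 1 if random.random() < (0.8 if workload else 0.2) else 0
    latency = 10.0 + 20.0 * workload + TRUE_EFFECT * option + random.gauss(0, 1)
    samples.append((workload, option, latency))

def mean(xs):
    return sum(xs) / len(xs)

# Naive (confounded) estimate: E[latency | option=1] - E[latency | option=0]
naive = mean([l for w, o, l in samples if o == 1]) - \
        mean([l for w, o, l in samples if o == 0])

# Backdoor adjustment: average per-workload effects, weighted by P(workload)
adjusted = 0.0
for w in (0, 1):
    stratum = [(o, l) for w2, o, l in samples if w2 == w]
    p_w = len(stratum) / len(samples)
    eff = mean([l for o, l in stratum if o == 1]) - \
          mean([l for o, l in stratum if o == 0])
    adjusted += p_w * eff

print(f"naive estimate:    {naive:.2f} ms")   # inflated by confounding
print(f"adjusted estimate: {adjusted:.2f} ms")  # close to the true 5.0 ms
```

The adjusted estimate lands near the true effect while the naive difference of means is badly inflated, which is the kind of incorrect explanation the abstract attributes to purely predictive models.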
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Third-Party Language Model Performance Prediction from Instruction [59.574169249307054]
Language model-based instruction-following systems have lately shown increasing performance on many benchmark tasks.
A user may easily prompt a model with an instruction without any idea of whether the responses should be expected to be accurate.
We propose a third party performance prediction framework, where a separate model is trained to predict the metric resulting from evaluating an instruction-following system on a task.
arXiv Detail & Related papers (2024-03-19T03:53:47Z)
- Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models [0.9023847175654603]
Interpretability is quantified via three measures: feature sparsity, interaction sparsity of features, and sparsity of non-monotone feature effects.
We show that our framework is capable of finding diverse models that are highly competitive or outperform state-of-the-art XGBoost or Explainable Boosting Machine models.
arXiv Detail & Related papers (2023-07-17T00:07:52Z)
- CAMEO: A Causal Transfer Learning Approach for Performance Optimization of Configurable Computer Systems [16.75106122540052]
We propose CAMEO, a method that identifies invariant causal predictors under environmental changes.
We demonstrate significant performance improvements over state-of-the-art optimization methods in MLperf deep learning systems, a video analytics pipeline, and a database system.
arXiv Detail & Related papers (2023-06-13T16:28:37Z)
- HINNPerf: Hierarchical Interaction Neural Network for Performance Prediction of Configurable Systems [22.380061796355616]
HINNPerf is a novel hierarchical interaction neural network for performance prediction.
HINNPerf employs the embedding method and hierarchic network blocks to model the complicated interplay between configuration options.
Empirical results on 10 real-world systems show that our method statistically significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-04-08T08:52:23Z)
- An Empirical Analysis of Backward Compatibility in Machine Learning Systems [47.04803977692586]
We consider how updates, intended to improve ML models, can introduce new errors that can significantly affect downstream systems and users.
For example, updates in models used in cloud-based classification services, such as image recognition, can cause unexpected erroneous behavior.
arXiv Detail & Related papers (2020-08-11T08:10:58Z)
- Learning to Simulate Complex Physics with Graph Networks [68.43901833812448]
We present a machine learning framework and model implementation that can learn to simulate a wide variety of challenging physical domains.
Our framework, which we term "Graph Network-based Simulators" (GNS), represents the state of a physical system with particles, expressed as nodes in a graph, and computes dynamics via learned message-passing.
Our results show that our model can generalize from single-timestep predictions with thousands of particles during training, to different initial conditions, thousands of timesteps, and at least an order of magnitude more particles at test time.
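The particles-as-nodes idea can be sketched in a toy 1-D example. The edge and node update functions below are hand-written placeholders standing in for the learned functions GNS uses; the particle positions, radius, and spring-like message are all hypothetical:

```python
# Toy sketch of message-passing dynamics over particles (hypothetical;
# not the GNS implementation). Particles are graph nodes, edges connect
# particles within a connectivity radius, and each node updates its
# velocity from aggregated incoming messages.
positions = [0.0, 0.1, 0.25, 1.0]   # 1-D particle positions
velocities = [0.0, 0.0, 0.0, 0.0]
RADIUS = 0.2                          # connectivity radius
DT = 0.01                             # integration timestep

# Build directed edges between particles within the connectivity radius.
edges = [(s, r) for s in range(len(positions))
                for r in range(len(positions))
         if s != r and abs(positions[s] - positions[r]) <= RADIUS]

# One message-passing step: each edge carries a message (here a simple
# unit repulsion away from the sender, a placeholder for a learned
# edge function).
messages = {}
for sender, receiver in edges:
    d = positions[receiver] - positions[sender]
    messages[(sender, receiver)] = d / (abs(d) + 1e-6)

# Each node aggregates incoming messages (placeholder for a learned
# node function) and integrates its state forward.
for i in range(len(positions)):
    incoming = [m for (s, r), m in messages.items() if r == i]
    accel = sum(incoming)
    velocities[i] += DT * accel
    positions[i] += DT * velocities[i]
```

The isolated particle at 1.0 receives no messages and stays put, while the clustered particles repel each other slightly; in GNS the analogous update functions are learned from data rather than hand-written.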
arXiv Detail & Related papers (2020-02-21T16:44:28Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
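The iteration the summary describes, a random subset of agents running local updates that the server then averages, can be sketched on a one-dimensional toy problem. The quadratic losses, agent count, and learning rate are all hypothetical choices for illustration, not the paper's setup:

```python
# Minimal sketch of federated averaging with partial participation
# (hypothetical 1-D quadratic losses; not the paper's analysis).
# Agent a has local loss (w - m_a)^2; the aggregate optimum is the
# mean of the local minimizers m_a.
import random

random.seed(1)
NUM_AGENTS = 10
SUBSET = 5        # random subset of available agents per iteration
LOCAL_STEPS = 5
LR = 0.1

local_minimizers = [random.uniform(-1, 1) for _ in range(NUM_AGENTS)]
w_global = 5.0    # initial model, far from the optimum

for _ in range(200):
    participants = random.sample(range(NUM_AGENTS), SUBSET)
    local_models = []
    for a in participants:
        w = w_global
        for _ in range(LOCAL_STEPS):   # local gradient steps on (w - m_a)^2
            w -= LR * 2 * (w - local_minimizers[a])
        local_models.append(w)
    w_global = sum(local_models) / len(local_models)  # server averaging

target = sum(local_minimizers) / NUM_AGENTS
print(f"final model {w_global:.3f}, aggregate optimum ~ {target:.3f}")
```

Because only a random subset participates each round, the iterate fluctuates around the aggregate optimum rather than settling on it exactly, which is the data-variability and model-variability trade-off the summary's tracking result formalizes.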
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.