Application Performance Modeling via Tensor Completion
- URL: http://arxiv.org/abs/2210.10184v3
- Date: Tue, 29 Aug 2023 14:36:29 GMT
- Title: Application Performance Modeling via Tensor Completion
- Authors: Edward Hutter and Edgar Solomonik
- Abstract summary: We show that low-rank canonical-polyadic (CP) tensor decomposition is effective in approximating these tensors.
We then employ tensor completion to optimize a CP decomposition given a sparse set of observed execution times.
- Score: 6.399089940376445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performance tuning, software/hardware co-design, and job scheduling are among
the many tasks that rely on models to predict application performance. We
propose and evaluate low-rank tensor decomposition for modeling application
performance. We discretize the input and configuration domains of an
application using regular grids. Execution times that fall within the same
grid cell are averaged and stored as a single tensor element. We show that
low-rank canonical-polyadic (CP) tensor decomposition is effective in
approximating these tensors. We further show that this decomposition enables
accurate extrapolation of unobserved regions of an application's parameter
space. We then employ tensor completion to optimize a CP decomposition given a
sparse set of observed execution times. We consider alternative
piecewise/grid-based models and supervised learning models for six applications
and demonstrate that CP decomposition optimized using tensor completion offers
higher prediction accuracy and greater memory efficiency for high-dimensional
performance modeling.
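To make the discretization step concrete, the following is a minimal sketch (in Python/NumPy) of how sampled execution times could be binned onto a regular grid and averaged into tensor elements, as the abstract describes. The function and parameter names (build_performance_tensor, bins_per_mode) are illustrative assumptions, not code from the paper.
```python
import numpy as np

def build_performance_tensor(samples, times, lower, upper, bins_per_mode):
    """Bin benchmark configurations onto a regular grid and average the
    execution times landing in each cell (one tensor mode per parameter).
    Illustrative sketch only; names and defaults are not from the paper."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    shape = tuple(int(b) for b in bins_per_mode)

    # Multi-index of the grid cell enclosing each sampled configuration.
    widths = (upper - lower) / np.asarray(shape)
    idx = ((np.asarray(samples) - lower) / widths).astype(int)
    idx = np.clip(idx, 0, np.asarray(shape) - 1)  # keep boundary samples in range

    # Accumulate per-cell sums and counts, then average.
    sums = np.zeros(shape)
    counts = np.zeros(shape)
    np.add.at(sums, tuple(idx.T), times)
    np.add.at(counts, tuple(idx.T), 1)

    observed = counts > 0  # sparsity pattern the completion step must handle
    tensor = np.where(observed, sums / np.maximum(counts, 1), 0.0)
    return tensor, observed
```
For a three-parameter application one might call build_performance_tensor(X, t, X.min(0), X.max(0), (16, 16, 16)); the observed mask records which grid cells received at least one measurement, which is exactly the sparse observation set that tensor completion operates on.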
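The completion step, fitting a CP decomposition to only the observed tensor entries, can likewise be sketched. The paper's actual optimizer is not reproduced here; the version below uses plain full-batch gradient descent on the masked squared error, with illustrative hyperparameters (rank, iters, lr), as a simplified stand-in.
```python
import numpy as np

def cp_complete(tensor, observed, rank, iters=1000, lr=0.1, seed=0):
    """Fit CP factor matrices A^(1)..A^(d) to the observed entries of
    `tensor` by gradient descent on the masked squared error.
    Simplified stand-in for the paper's tensor-completion solver."""
    rng = np.random.default_rng(seed)
    factors = [0.1 * rng.standard_normal((n, rank)) for n in tensor.shape]
    obs_idx = np.argwhere(observed)   # (m, d) multi-indices of observed cells
    y = tensor[observed]              # (m,) averaged execution times

    for _ in range(iters):
        # Factor rows selected by each observed multi-index: d arrays of shape (m, rank).
        rows = [A[obs_idx[:, k]] for k, A in enumerate(factors)]
        pred = np.prod(rows, axis=0).sum(axis=1)   # CP model at observed cells
        resid = pred - y

        # Full gradient step; all mode gradients use the current iterate's rows.
        for k in range(len(factors)):
            others = np.prod([R for j, R in enumerate(rows) if j != k], axis=0)
            grad = np.zeros_like(factors[k])
            np.add.at(grad, obs_idx[:, k], resid[:, None] * others)
            factors[k] -= lr * grad / len(y)
    return factors

def cp_predict(factors, cell):
    """CP model value at any grid-cell multi-index, observed or not."""
    rows = [A[i] for A, i in zip(factors, cell)]
    return np.prod(rows, axis=0).sum()
```
A prediction at an unobserved cell is the product of the corresponding factor rows summed over the rank, so the model stores only on the order of rank times the sum of mode sizes in parameters rather than the full dense tensor, which is the memory advantage the abstract claims for high-dimensional modeling.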
Related papers
- When Bayesian Tensor Completion Meets Multioutput Gaussian Processes: Functional Universality and Rank Learning [53.17227599983122]
Functional tensor decomposition can analyze multi-dimensional data with real-valued indices.
We propose a rank-revealing functional low-rank tensor completion (RR-F) method.
We establish the universal approximation property of the model for continuous multi-dimensional signals.
arXiv Detail & Related papers (2025-12-25T03:15:52Z) - Score-Based Model for Low-Rank Tensor Recovery [49.158601255093416]
Low-rank tensor decompositions (TDs) provide an effective framework for multiway data analysis.
Traditional TD methods rely on predefined structural assumptions, such as CP or Tucker decompositions.
We propose a score-based model that eliminates the need for predefined structural or distributional assumptions.
arXiv Detail & Related papers (2025-06-27T15:05:37Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance.
This paper addresses the question of how to optimally combine the model's predictions and the provided labels.
Our main contribution is the derivation of the Bayes-optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Combining Local Symmetry Exploitation and Reinforcement Learning for Optimised Probabilistic Inference -- A Work In Progress [2.2164989053903805]
Efficient probabilistic inference by variable elimination in graphical models requires an optimal elimination order.
We adapt a reinforcement learning approach to find efficient contraction orders in tensor networks.
We show that leveraging specific structures during inference allows for introducing compact encodings of intermediate results.
arXiv Detail & Related papers (2025-03-11T18:00:23Z) - Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think [53.2706196341054]
We show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed.
We perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models.
arXiv Detail & Related papers (2024-09-17T16:58:52Z) - Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique, UNCURL, that reduces the number of experts per MoE layer offline, post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z) - Efficient Nonparametric Tensor Decomposition for Binary and Count Data [27.02813234958821]
We propose ENTED, an Efficient Nonparametric TEnsor Decomposition for binary and count tensors.
arXiv Detail & Related papers (2024-01-15T14:27:03Z) - Streaming Generalized Canonical Polyadic Tensor Decompositions [0.0]
We develop a method which we call OnlineGCP for computing the Generalized Canonical Polyadic (GCP) tensor decomposition of streaming data.
In the streaming case, tensor data is gradually observed over time and the algorithm must incrementally update a GCP factorization with limited access to prior data.
arXiv Detail & Related papers (2021-10-27T15:26:24Z) - Using Graph Neural Networks to model the performance of Deep Neural
Networks [2.1151356984322307]
We develop a novel performance model that adopts a graph representation.
Experimental evaluation shows 7.75x and 12x reductions in prediction error compared to the Halide and TVM models, respectively.
arXiv Detail & Related papers (2021-08-27T20:20:17Z) - Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance across a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z) - Graph Prolongation Convolutional Networks: Explicitly Multiscale Machine
Learning on Graphs with Applications to Modeling of Cytoskeleton [0.0]
We define a novel type of ensemble Graph Convolutional Network (GCN) model.
Using optimized linear projection operators to map between spatial scales of the graph, this ensemble model learns to aggregate information from each scale for its final prediction.
arXiv Detail & Related papers (2020-02-14T01:56:17Z) - Supervised Learning for Non-Sequential Data: A Canonical Polyadic
Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate the resulting explosion in the number of interaction parameters, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
arXiv Detail & Related papers (2020-01-27T22:38:40Z)