Performance Embeddings: A Similarity-based Approach to Automatic
Performance Optimization
- URL: http://arxiv.org/abs/2303.08142v1
- Date: Tue, 14 Mar 2023 15:51:35 GMT
- Title: Performance Embeddings: A Similarity-based Approach to Automatic
Performance Optimization
- Authors: Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler
- Abstract summary: Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
- Score: 71.69092462147292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performance optimization is an increasingly challenging but often repetitive
task. While each platform has its quirks, the underlying code transformations
rely on data movement and computational characteristics that recur across
applications. This paper proposes to leverage those similarities by
constructing an embedding space for subprograms. The continuous space captures
both static and dynamic properties of loop nests via symbolic code analysis and
performance profiling, respectively. Performance embeddings enable direct
knowledge transfer of performance tuning between applications, which can result
from autotuning or tailored improvements. We demonstrate this transfer tuning
approach on case studies in deep neural networks, dense and sparse linear
algebra compositions, and numerical weather prediction stencils. Transfer
tuning reduces the search complexity by up to four orders of magnitude and
outperforms the MKL library in sparse-dense matrix multiplication. The results
exhibit clear correspondences between program characteristics and
optimizations, outperforming prior specialized state-of-the-art approaches and
generalizing beyond their capabilities.
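To make the transfer-tuning idea concrete, below is a minimal sketch of the workflow the abstract describes: embed each loop nest from its static (symbolic) and dynamic (profiled) features, then reuse the transformation recipes of the most similar, already-tuned subprograms. The feature extraction, similarity measure, and names used here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Illustrative only: a nearest-neighbor "transfer tuning" loop, not the
# paper's embedding model. Feature extraction is stubbed out.

def embed(static_features: np.ndarray, dynamic_features: np.ndarray) -> np.ndarray:
    """Concatenate symbolic (static) and profiled (dynamic) features into one
    vector and L2-normalize it so cosine similarity becomes a dot product."""
    v = np.concatenate([static_features, dynamic_features])
    return v / (np.linalg.norm(v) + 1e-12)

class TransferTuner:
    """Database of embeddings for already-tuned subprograms, each stored with
    the transformation sequence (recipe) that worked for it."""

    def __init__(self):
        self.embeddings = []   # list of np.ndarray
        self.recipes = []      # e.g. ["tile(32)", "vectorize", "fuse"]

    def record(self, embedding, recipe):
        self.embeddings.append(embedding)
        self.recipes.append(recipe)

    def suggest(self, query_embedding, k=3):
        """Return the recipes of the k most similar tuned subprograms;
        only these candidates need to be re-evaluated on the new loop nest."""
        sims = np.array([e @ query_embedding for e in self.embeddings])
        best = np.argsort(-sims)[:k]
        return [self.recipes[i] for i in best]
```

Only the handful of suggested recipes would then be benchmarked on the new subprogram, which is where the reported reduction in search complexity would come from.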
Related papers
- OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations [12.696136981847438]
We introduce first-order optimization (FOO) expedited with approximately parallelized iterations (OptEx).
OptEx is the first framework that enhances the efficiency of FOO by leveraging parallel computing to mitigate its iterative bottleneck.
We provide theoretical guarantees for the reliability of our kernelized gradient estimation and the complexity of SGD-based OptEx.
arXiv Detail & Related papers (2024-02-18T02:19:02Z)
- AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix [9.629238108795013]
We propose a novel approach to designing the preconditioning matrix by utilizing the gradient difference between two successive steps as the diagonal elements.
We evaluate AGD on public benchmarks from Natural Language Processing (NLP), Computer Vision (CV), and Recommendation Systems (RecSys).
Our experimental results demonstrate that AGD outperforms the state-of-the-art (SOTA) techniques, achieving highly competitive or significantly better predictive performance.
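A minimal sketch of the stated idea follows, assuming an exponential-moving-average smoothing of the squared gradient differences; AGD's actual update rule and its auto-switching mechanism may differ.

```python
import numpy as np

# Hedged sketch: build a diagonal preconditioner from the difference of two
# successive gradients. This illustrates the principle named in the summary,
# not AGD's exact algorithm.

def agd_like_step(w, grad, prev_grad, state=None, lr=1e-3, eps=1e-8):
    if state is None:
        state = np.zeros_like(w)            # running estimate of the diagonal
    delta = grad - prev_grad                # stepwise gradient difference
    state = 0.9 * state + 0.1 * delta**2    # EMA of squared differences (assumed)
    precond = 1.0 / (np.sqrt(state) + eps)  # diagonal preconditioning matrix
    return w - lr * precond * grad, state
```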
arXiv Detail & Related papers (2023-12-04T06:20:14Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
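The implicit-differentiation trick named in the summary can be sketched for a generic fixed point z* = f(z*, x): instead of backpropagating through every refinement iteration, solve the linear system implied by the implicit function theorem at the converged point. The Jacobians are assumed to be supplied by the caller, and the paper's specific refinement procedure is not reproduced here.

```python
import numpy as np

def fixed_point(f, x, z0, iters=100):
    """Forward pass: iterate the refinement map to (approximate) convergence."""
    z = z0
    for _ in range(iters):
        z = f(z, x)
    return z

def grad_wrt_x(dL_dz, J_z, J_x):
    """Backward pass via the implicit function theorem.
    J_z = df/dz and J_x = df/dx are evaluated at the fixed point (zstar, x).
    Since dz*/dx = (I - J_z)^{-1} J_x, we get dL/dx = J_x^T (I - J_z)^{-T} dL/dz*.
    """
    n = dL_dz.shape[0]
    v = np.linalg.solve((np.eye(n) - J_z).T, dL_dz)
    return J_x.T @ v
```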
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- High-performance symbolic-numerics via multiple dispatch [52.77024349608834]
Symbolics.jl is an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs.
We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system.
We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers.
arXiv Detail & Related papers (2021-05-09T14:22:43Z)
- Analytical Characterization and Design Space Exploration for Optimization of CNNs [10.15406080228806]
Loop-level optimizations, including loop tiling and loop permutation, are fundamental transformations to reduce data movement.
This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs.
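As a concrete example of such a transformation, here is a tiled matrix-multiplication loop nest; the tile sizes are placeholder values standing in for the configuration parameters an analytical model would select.

```python
import numpy as np

def matmul_tiled(A, B, Ti=32, Tj=32, Tk=32):
    """Tiled i-j-k loop nest: each (Ti x Tk) block of A and (Tk x Tj) block of B
    is reused while it is still cache-resident, reducing data movement."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i0 in range(0, n, Ti):
        for j0 in range(0, m, Tj):
            for k0 in range(0, k, Tk):
                # Intra-tile loops: compute a partial product for this block.
                for i in range(i0, min(i0 + Ti, n)):
                    for j in range(j0, min(j0 + Tj, m)):
                        s = 0.0
                        for kk in range(k0, min(k0 + Tk, k)):
                            s += A[i, kk] * B[kk, j]
                        C[i, j] += s
    return C
```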
arXiv Detail & Related papers (2021-01-24T21:36:52Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivatives with finite differences.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
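The core approximation can be illustrated with central differences: a first-order directional derivative needs only two function evaluations, which can run in parallel, and higher orders follow the same pattern.

```python
def directional_derivative(f, x, v, eps=1e-4):
    """Approximate d/dt f(x + t*v) at t = 0 with a central difference."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)

def second_directional_derivative(f, x, v, eps=1e-3):
    """Approximate d^2/dt^2 f(x + t*v) at t = 0; again, only function evaluations."""
    return (f(x + eps * v) - 2.0 * f(x) + f(x - eps * v)) / (eps ** 2)
```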
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-End Learning and Control [22.120942106939122]
We propose the use of adaptive stochastic search as a building block for general, non-convex optimization operations.
We benchmark it against two existing alternatives on a synthetic energy-based structured task, and showcase its use in optimal control applications.
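A hedged sketch of an adaptive stochastic search loop in the cross-entropy-method style, illustrating the kind of building block the summary describes; NOVAS additionally differentiates through such a procedure inside a network, which is not shown here.

```python
import numpy as np

def adaptive_search(objective, mu, sigma, iters=20, samples=64, elite_frac=0.1):
    """Minimize `objective` by iteratively refitting a Gaussian sampling
    distribution to the best-performing samples (assumed CEM-style scheme)."""
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iters):
        cand = np.random.normal(mu, sigma, size=(samples, mu.size))
        scores = np.array([objective(c) for c in cand])
        elite = cand[np.argsort(scores)[:n_elite]]        # keep lowest-objective samples
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu
```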
arXiv Detail & Related papers (2020-06-22T03:40:36Z)