Performance Embeddings: A Similarity-based Approach to Automatic
Performance Optimization
- URL: http://arxiv.org/abs/2303.08142v1
- Date: Tue, 14 Mar 2023 15:51:35 GMT
- Title: Performance Embeddings: A Similarity-based Approach to Automatic
Performance Optimization
- Authors: Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler
- Abstract summary: Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
- Score: 71.69092462147292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performance optimization is an increasingly challenging but often repetitive
task. While each platform has its quirks, the underlying code transformations
rely on data movement and computational characteristics that recur across
applications. This paper proposes to leverage those similarities by
constructing an embedding space for subprograms. The continuous space captures
both static and dynamic properties of loop nests via symbolic code analysis and
performance profiling, respectively. Performance embeddings enable direct
knowledge transfer of performance tuning between applications, which can result
from autotuning or tailored improvements. We demonstrate this transfer tuning
approach on case studies in deep neural networks, dense and sparse linear
algebra compositions, and numerical weather prediction stencils. Transfer
tuning reduces the search complexity by up to four orders of magnitude and
outperforms the MKL library in sparse-dense matrix multiplication. The results
exhibit clear correspondences between program characteristics and
optimizations, outperforming prior specialized state-of-the-art approaches and
generalizing beyond their capabilities.
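To make the transfer-tuning idea concrete, below is a minimal sketch of the workflow the abstract describes: embed each loop nest from its static (symbolic) and dynamic (profiled) features, then reuse the transformation recipes of the most similar, already-tuned subprograms. The feature extraction, similarity measure, and names used here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Illustrative only: a nearest-neighbor "transfer tuning" loop, not the
# paper's embedding model. Feature extraction is stubbed out.

def embed(static_features: np.ndarray, dynamic_features: np.ndarray) -> np.ndarray:
    """Concatenate symbolic (static) and profiled (dynamic) features into one
    vector and L2-normalize it so cosine similarity becomes a dot product."""
    v = np.concatenate([static_features, dynamic_features])
    return v / (np.linalg.norm(v) + 1e-12)

class TransferTuner:
    """Database of embeddings for already-tuned subprograms, each stored with
    the transformation sequence (recipe) that worked for it."""

    def __init__(self):
        self.embeddings = []   # list of np.ndarray
        self.recipes = []      # e.g. ["tile(32)", "vectorize", "fuse"]

    def record(self, embedding, recipe):
        self.embeddings.append(embedding)
        self.recipes.append(recipe)

    def suggest(self, query_embedding, k=3):
        """Return the recipes of the k most similar tuned subprograms;
        only these candidates need to be re-evaluated on the new loop nest."""
        sims = np.array([e @ query_embedding for e in self.embeddings])
        best = np.argsort(-sims)[:k]
        return [self.recipes[i] for i in best]
```

Only the handful of suggested recipes would then be benchmarked on the new subprogram, which is where the reported reduction in search complexity would come from.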
Related papers
- OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations [12.696136981847438]
We introduce first-order optimization (FOO) expedited with approximately parallelized iterations (OptEx).
OptEx is the first framework that enhances the efficiency of FOO by leveraging parallel computing to mitigate its iterative bottleneck.
We provide theoretical guarantees for the reliability of our kernelized gradient estimation and the complexity of SGD-based OptEx.
arXiv Detail & Related papers (2024-02-18T02:19:02Z)
- AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix [9.629238108795013]
We propose a novel approach to designing the preconditioning matrix by utilizing the gradient difference between two successive steps as the diagonal elements.
We evaluate AGD on public benchmarks from Natural Language Processing (NLP), Computer Vision (CV), and Recommendation Systems (RecSys).
Our experimental results demonstrate that AGD outperforms the state-of-the-art (SOTA) techniques, achieving highly competitive or significantly better predictive performance.
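A minimal sketch of the stated idea follows, assuming an exponential-moving-average smoothing of the squared gradient differences; AGD's actual update rule and its auto-switching mechanism may differ.

```python
import numpy as np

# Hedged sketch: build a diagonal preconditioner from the difference of two
# successive gradients. This illustrates the principle named in the summary,
# not AGD's exact algorithm.

def agd_like_step(w, grad, prev_grad, state=None, lr=1e-3, eps=1e-8):
    if state is None:
        state = np.zeros_like(w)            # running estimate of the diagonal
    delta = grad - prev_grad                # stepwise gradient difference
    state = 0.9 * state + 0.1 * delta**2    # EMA of squared differences (assumed)
    precond = 1.0 / (np.sqrt(state) + eps)  # diagonal preconditioning matrix
    return w - lr * precond * grad, state
```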
arXiv Detail & Related papers (2023-12-04T06:20:14Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
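The implicit-differentiation trick named in the summary can be sketched for a generic fixed point z* = f(z*, x): instead of backpropagating through every refinement iteration, solve the linear system implied by the implicit function theorem at the converged point. The Jacobians are assumed to be supplied by the caller, and the paper's specific refinement procedure is not reproduced here.

```python
import numpy as np

def fixed_point(f, x, z0, iters=100):
    """Forward pass: iterate the refinement map to (approximate) convergence."""
    z = z0
    for _ in range(iters):
        z = f(z, x)
    return z

def grad_wrt_x(dL_dz, J_z, J_x):
    """Backward pass via the implicit function theorem.
    J_z = df/dz and J_x = df/dx are evaluated at the fixed point (zstar, x).
    Since dz*/dx = (I - J_z)^{-1} J_x, we get dL/dx = J_x^T (I - J_z)^{-T} dL/dz*.
    """
    n = dL_dz.shape[0]
    v = np.linalg.solve((np.eye(n) - J_z).T, dL_dz)
    return J_x.T @ v
```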
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- High-performance symbolic-numerics via multiple dispatch [52.77024349608834]
Symbolics.jl is an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs.
We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system.
We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers.
arXiv Detail & Related papers (2021-05-09T14:22:43Z)
- Analytical Characterization and Design Space Exploration for Optimization of CNNs [10.15406080228806]
Loop-level optimizations, including loop tiling and loop permutation, are fundamental transformations to reduce data movement.
This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs.
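As a concrete example of such a transformation, here is a tiled matrix-multiplication loop nest; the tile sizes are placeholder values standing in for the configuration parameters an analytical model would select.

```python
import numpy as np

def matmul_tiled(A, B, Ti=32, Tj=32, Tk=32):
    """Tiled i-j-k loop nest: each (Ti x Tk) block of A and (Tk x Tj) block of B
    is reused while it is still cache-resident, reducing data movement."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i0 in range(0, n, Ti):
        for j0 in range(0, m, Tj):
            for k0 in range(0, k, Tk):
                # Intra-tile loops: compute a partial product for this block.
                for i in range(i0, min(i0 + Ti, n)):
                    for j in range(j0, min(j0 + Tj, m)):
                        s = 0.0
                        for kk in range(k0, min(k0 + Tk, k)):
                            s += A[i, kk] * B[kk, j]
                        C[i, j] += s
    return C
```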
arXiv Detail & Related papers (2021-01-24T21:36:52Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivatives with finite differences.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
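The core approximation can be illustrated with central differences: a first-order directional derivative needs only two function evaluations, which can run in parallel, and higher orders follow the same pattern.

```python
def directional_derivative(f, x, v, eps=1e-4):
    """Approximate d/dt f(x + t*v) at t = 0 with a central difference."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)

def second_directional_derivative(f, x, v, eps=1e-3):
    """Approximate d^2/dt^2 f(x + t*v) at t = 0; again, only function evaluations."""
    return (f(x + eps * v) - 2.0 * f(x) + f(x - eps * v)) / (eps ** 2)
```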
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-End Learning and Control [22.120942106939122]
We propose the use of adaptive stochastic search as a building block for general, non-convex optimization operations.
We benchmark it against two existing alternatives on a synthetic energy-based structured task, and showcase its use in optimal control applications.
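A hedged sketch of an adaptive stochastic search loop in the cross-entropy-method style, illustrating the kind of building block the summary describes; NOVAS additionally differentiates through such a procedure inside a network, which is not shown here.

```python
import numpy as np

def adaptive_search(objective, mu, sigma, iters=20, samples=64, elite_frac=0.1):
    """Minimize `objective` by iteratively refitting a Gaussian sampling
    distribution to the best-performing samples (assumed CEM-style scheme)."""
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iters):
        cand = np.random.normal(mu, sigma, size=(samples, mu.size))
        scores = np.array([objective(c) for c in cand])
        elite = cand[np.argsort(scores)[:n_elite]]        # keep lowest-objective samples
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu
```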
arXiv Detail & Related papers (2020-06-22T03:40:36Z)