Related papers: Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search

Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search

URL: http://arxiv.org/abs/2410.05534v1
Date: Mon, 7 Oct 2024 22:22:02 GMT
Title: Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search
Authors: Jakob Hartmann, Guoliang He, Eiko Yoneki,
Abstract summary: We present a tensor graph rewriting approach that uses Monte Carlo tree search to build superior representation. Our approach improves the inference speedup of neural networks by up to 11% compared to existing methods.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The real-world effectiveness of deep neural networks often depends on their latency, thereby necessitating optimization techniques that can reduce a model's inference time while preserving its performance. One popular approach is to sequentially rewrite the input computation graph into an equivalent but faster one by replacing individual subgraphs. This approach gives rise to the so-called phase-ordering problem in which the application of one rewrite rule can eliminate the possibility to apply an even better one later on. Recent work has shown that equality saturation, a technique from compiler optimization, can mitigate this issue by first building an intermediate representation (IR) that efficiently stores multiple optimized versions of the input program before extracting the best solution in a second step. In practice, however, memory constraints prevent the IR from capturing all optimized versions and thus reintroduce the phase-ordering problem in the construction phase. In this paper, we present a tensor graph rewriting approach that uses Monte Carlo tree search to build superior IRs by identifying the most promising rewrite rules. We also introduce a novel extraction algorithm that can provide fast and accurate runtime estimates of tensor programs represented in an IR. Our approach improves the inference speedup of neural networks by up to 11% compared to existing methods.

Related papers

Resetting the Optimizer in Deep RL: An Empirical Study [10.907980864371213]
We focus on the task of approximating the optimal value function in deep reinforcement learning. We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark.
arXiv Detail & Related papers (2023-06-30T17:53:50Z)
Ordering for Non-Replacement SGD [7.11967773739707]
We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm. We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions. In addition, we are able to combine the ordering with mini-batch and further apply it to more complex neural networks.
arXiv Detail & Related papers (2023-06-28T00:46:58Z)
Loop Unrolled Shallow Equilibrium Regularizer (LUSER) -- A Memory-Efficient Inverse Problem Solver [26.87738024952936]
In inverse problems we aim to reconstruct some underlying signal of interest from potentially corrupted and often ill-posed measurements. We propose an LU algorithm with shallow equilibrium regularizers (L) These implicit models are as expressive as deeper convolutional networks, but far more memory efficient during training.
arXiv Detail & Related papers (2022-10-10T19:50:37Z)
A Unified Framework for Implicit Sinkhorn Differentiation [58.56866763433335]
We propose an algorithm that obtains analytical gradients of a Sinkhorn layer via implicit differentiation. We show that it is computationally more efficient, particularly when resources like GPU memory are scarce.
arXiv Detail & Related papers (2022-05-13T14:45:31Z)
Joint inference and input optimization in equilibrium networks [68.63726855991052]
deep equilibrium model is a class of models that foregoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer. We show that there is a natural synergy between these two settings. We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm for First-order and Zeroth-order Optimization [111.24899593052851]
Conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community. ARCS is the first zeroth-order conditional gradient sliding type algorithms solving convex problems in zeroth-order optimization. In first-order optimization, the convergence results of ARCS substantially outperform previous algorithms in terms of the number of gradient query oracle.
arXiv Detail & Related papers (2021-09-18T07:08:11Z)
Graph Signal Restoration Using Nested Deep Algorithm Unrolling [85.53158261016331]
Graph signal processing is a ubiquitous task in many applications such as sensor, social transportation brain networks, point cloud processing, and graph networks. We propose two restoration methods based on convexindependent deep ADMM (ADMM) parameters in the proposed restoration methods are trainable in an end-to-end manner.
arXiv Detail & Related papers (2021-06-30T08:57:01Z)
SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks. The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z)
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step. Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network. Our model requires a much less number of communication rounds and still a number of communication rounds in theory. Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.