Parallel Scheduling Self-attention Mechanism: Generalization and
Optimization
- URL: http://arxiv.org/abs/2012.01114v1
- Date: Wed, 2 Dec 2020 12:04:16 GMT
- Title: Parallel Scheduling Self-attention Mechanism: Generalization and
Optimization
- Authors: Mingfei Yu and Masahiro Fujita
- Abstract summary: We propose a general scheduling algorithm, which is derived from the optimum scheduling for small instances solved by a satisfiability checking(SAT) solver.
Strategies for further optimization on skipping redundant computations are put forward as well, with which reductions of almost 25% and 50% of the original computations are respectively achieved.
The proposed algorithms are applicable regardless of problem sizes, as long as the number of input vectors is divisible to the number of computing units available in the architecture.
- Score: 0.76146285961466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past few years, self-attention is shining in the field of deep
learning, especially in the domain of natural language processing(NLP). Its
impressive effectiveness, along with ubiquitous implementations, have aroused
our interest in efficiently scheduling the data-flow of corresponding
computations onto architectures with many computing units to realize parallel
computing. In this paper, based on the theory of self-attention mechanism and
state-of-the-art realization of self-attention in language models, we propose a
general scheduling algorithm, which is derived from the optimum scheduling for
small instances solved by a satisfiability checking(SAT) solver, to parallelize
typical computations of self-attention. Strategies for further optimization on
skipping redundant computations are put forward as well, with which reductions
of almost 25% and 50% of the original computations are respectively achieved
for two widely-adopted application schemes of self-attention. With the proposed
optimization adopted, we have correspondingly come up with another two
scheduling algorithms. The proposed algorithms are applicable regardless of
problem sizes, as long as the number of input vectors is divisible to the
number of computing units available in the architecture. Due to the complexity
of proving the correctness of the algorithms mathematically for general cases,
we have conducted experiments to reveal their validity, together with the
superior quality of the solutions provided by which, by solving SAT problems
for particular instances.
Related papers
- AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using
Mathematical Programming [2.898055875927704]
We propose a data analysis-driven mathematical programming-based approach to synthesizing approximate operators for FPGAs.
Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data.
Compared to traditional evolutionary algorithms-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV.
arXiv Detail & Related papers (2023-09-23T18:23:54Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Fast Optimal Locally Private Mean Estimation via Random Projections [58.603579803010796]
We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball.
We propose a new algorithmic framework, ProjUnit, for private mean estimation.
Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm.
arXiv Detail & Related papers (2023-06-07T14:07:35Z) - Efficient distributed representations with linear-time attention scores normalization [3.8673630752805437]
We propose a linear-time approximation of the attention score normalization constants for embedding vectors with bounded norms.
The accuracy of our estimation formula surpasses competing kernel methods by even orders of magnitude.
The proposed algorithm is highly interpretable and easily adapted to an arbitrary embedding problem.
arXiv Detail & Related papers (2023-03-30T15:48:26Z) - An Efficient Merge Search Matheuristic for Maximising the Net Present
Value of Project Schedules [5.10800491975164]
Resource constrained project scheduling is an important optimisation problem with many practical applications.
We propose a new math-heuristic algorithm based on Merge Search and parallel computing to solve the resource constrained project scheduling.
arXiv Detail & Related papers (2022-10-20T13:30:23Z) - A Two-stage Framework and Reinforcement Learning-based Optimization
Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework, in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined together.
The scheduling problem is solved in two stages, including a finite Markov decision process (MDP) and a mixed-integer programming process, respectively.
Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
arXiv Detail & Related papers (2021-03-10T03:16:12Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Efficient Computation of Expectations under Spanning Tree Distributions [67.71280539312536]
We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.
Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms.
arXiv Detail & Related papers (2020-08-29T14:58:26Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.