POSET-RL: Phase ordering for Optimizing Size and Execution Time using
Reinforcement Learning
- URL: http://arxiv.org/abs/2208.04238v1
- Date: Wed, 27 Jul 2022 08:32:23 GMT
- Title: POSET-RL: Phase ordering for Optimizing Size and Execution Time using
Reinforcement Learning
- Authors: Shalini Jain, Yashas Andaluri, S. VenkataKeerthy, Ramakrishna
Upadrasta
- Abstract summary: We present a reinforcement learning based solution to the phase ordering problem.
We propose two approaches to model the sequences: one based on manual ordering, and the other based on a graph called the Oz Dependence Graph (ODG).
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The ever-increasing memory requirements of several applications have led to
demands that embedded devices might not be able to meet. Constraining memory usage in
such cases is of paramount importance, and it is equally important that such code-size
improvements do not have a negative impact on runtime. Improving execution time while
optimizing for code size is a non-trivial but significant task. The standard
optimization sequences in modern compilers are fixed, and are heuristically created by
compiler domain experts based on their expertise. However, such an ordering is
sub-optimal and does not generalize well across all cases. We present a reinforcement
learning based solution to the phase ordering problem, where the ordering improves both
execution time and code size. We propose two different approaches to model the
sequences: one based on manual ordering, and the other based on a graph called the Oz
Dependence Graph (ODG). Our approach uses minimal data as the training set and is
integrated with LLVM. We show results on the x86 and AArch64 architectures on
benchmarks from SPEC-CPU 2006, SPEC-CPU 2017, and MiBench. We observe that the proposed
ODG-based model outperforms the current Oz sequence in both size and execution time,
by 6.19% and 11.99% respectively, on the SPEC 2017 benchmarks, on average.
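To make the setting concrete, here is a minimal sketch of such an optimization loop, assuming a hypothetical action space of pass sub-sequences and a simulated evaluator (`ACTIONS`, `evaluate`, and `reward` are illustrative stand-ins, not the paper's implementation, which trains an agent against real LLVM builds):

```python
import random

rng = random.Random(0)

# Hypothetical action space: short sub-sequences of LLVM passes. These
# groupings are illustrative, not the paper's ODG-derived sub-sequences.
ACTIONS = [
    ("sroa", "instcombine"),
    ("gvn", "simplifycfg"),
    ("inline", "dce"),
]

def evaluate(sequence):
    """Stand-in for compiling a module with this pass order and measuring
    (code_size, exec_time); here we just derive deterministic pseudo-metrics."""
    local = random.Random(",".join(sequence))
    return local.uniform(0.8, 1.2), local.uniform(0.8, 1.2)

def reward(size, time, base_size=1.0, base_time=1.0):
    # Joint objective: relative size reduction plus relative speedup vs. a baseline.
    return (base_size - size) / base_size + (base_time - time) / base_time

# Random-search episode; an RL agent would instead learn a policy over ACTIONS.
best_seq, best_r = None, float("-inf")
for _ in range(50):
    seq = [p for _ in range(4) for p in rng.choice(ACTIONS)]
    r = reward(*evaluate(seq))
    if r > best_r:
        best_seq, best_r = seq, r
print(best_seq, round(best_r, 3))
```

The reward couples both objectives, which is what distinguishes this setup from size-only phase ordering.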
Related papers
- TREE: Tree Regularization for Efficient Execution [4.205565040528205]
We present a method to reduce path lengths by rewarding uneven probability distributions during the training of decision trees.
Specifically, we regularize the impurity of the CART algorithm in order to favor not only low impurity, but also highly asymmetric distributions for the evaluation of split criteria.
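As a rough illustration of the idea (the penalty form below is a hypothetical choice, not the paper's exact regularizer), a split criterion can trade impurity against asymmetry:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def regularized_split_score(y_left, y_right, lam=0.1):
    """Weighted child impurity minus a bonus for asymmetric (short-path) splits."""
    n_l, n_r = len(y_left), len(y_right)
    n = n_l + n_r
    weighted_impurity = (n_l * gini(y_left) + n_r * gini(y_right)) / n
    asymmetry = abs(n_l - n_r) / n  # 0 for balanced splits, near 1 for one-sided ones
    return weighted_impurity - lam * asymmetry  # lower is better
```

A tree builder minimizing this score would steer the bulk of the samples toward shorter paths, reducing expected path length at a small impurity cost.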
arXiv Detail & Related papers (2024-06-18T12:01:06Z)
- LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
Self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ that compresses the global abstraction into a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps address the lack of long-range dependency modeling.
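A minimal sketch of the core quantization step, assuming a fixed random codebook (LongVQ's codebook is learned and integrated with the attention stack):

```python
import numpy as np

def quantize_to_codebook(x, codebook):
    """Map each timestep vector in x (T, d) to its nearest codeword (K, d),
    so downstream layers attend over a length-fixed memory instead of T steps."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K) distances
    codes = d2.argmin(axis=1)                                   # (T,) assignments
    return codebook[codes], codes

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 16))       # a long input sequence
codebook = rng.normal(size=(32, 16))  # K = 32 codes: cost no longer grows with T
quantized, codes = quantize_to_codebook(x, codebook)
```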
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
- AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs [57.12929098407975]
We show that by efficiently parallelizing existing causal discovery methods, we can scale them to thousands of dimensions.
Specifically, we focus on the causal ordering subprocedure in DirectLiNGAM and implement GPU kernels to accelerate it.
This allows us to apply DirectLiNGAM to causal inference on large-scale gene expression data with genetic interventions, yielding competitive results.
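As a hint of what that subprocedure computes, below is a vectorized NumPy stand-in for the all-pairs regression residuals that DirectLiNGAM's ordering step scores for independence; the paper's contribution is implementing such kernels in CUDA:

```python
import numpy as np

def pairwise_residuals(X):
    """For data X of shape (n, d), return R with R[:, i, j] = residual of
    regressing (standardized) x_i on x_j, for every ordered pair at once."""
    Xs = (X - X.mean(0)) / X.std(0)
    C = Xs.T @ Xs / len(Xs)  # (d, d) correlations = regression coefficients
    return Xs[:, :, None] - C[None, :, :] * Xs[:, None, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
R = pairwise_residuals(X)  # shape (1000, 8, 8)
```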
arXiv Detail & Related papers (2024-03-06T15:06:11Z)
- Decreasing the Computing Time of Bayesian Optimization using Generalizable Memory Pruning [56.334116591082896]
Running BO on high-dimensional or massive data sets becomes intractable due to the surrogate model's time complexity in the number of observations.
We show a wrapper of memory pruning and bounded optimization capable of being used with any surrogate model and acquisition function.
All model implementations are run on the MIT Supercloud state-of-the-art computing hardware.
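A minimal sketch of the wrapper idea, assuming a naive keep-the-best pruning rule (the paper's generalizable pruning criterion is more involved):

```python
def pruned_fit(surrogate_fit, X, y, budget=512):
    """Cap the surrogate's training set before fitting: with at most `budget`
    observations, the per-iteration fitting cost (e.g. cubic for an exact GP)
    stays bounded no matter how long the BO run gets."""
    if len(y) > budget:
        keep = sorted(range(len(y)), key=lambda i: y[i])[:budget]  # assume minimization
        X = [X[i] for i in keep]
        y = [y[i] for i in keep]
    return surrogate_fit(X, y)
```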
arXiv Detail & Related papers (2023-09-08T14:05:56Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results demonstrating 1.8-4.8x and 1.5-3.6x speedups over CPU and GPU baselines, respectively.
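To see why higher-order gradients strain conventional architectures, note that naive nesting multiplies the work at every order; a toy finite-difference sketch of that blow-up (not the paper's dataflow approach):

```python
def nth_derivative(f, n, h=1e-3):
    """Nest first-order central differences n times: each extra order doubles
    the number of evaluations of f, for 2**n calls in total."""
    if n == 0:
        return f
    g = nth_derivative(f, n - 1, h)
    return lambda x: (g(x + h) - g(x - h)) / (2 * h)

d2 = nth_derivative(lambda x: x ** 3, 2)  # second derivative of x^3 is 6x
print(d2(2.0))                            # ~12.0, via 4 evaluations of f
```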
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- CHERI Performance Enhancement for a Bytecode Interpreter [0.0]
We show that it is possible to eliminate certain kinds of software-induced runtime overhead that occur due to the larger size of CHERI capabilities (128 bits) relative to native pointers (generally 64 bits).
The worst-case slowdowns are greatly improved, from 100x (before optimization) to 2x (after optimization).
arXiv Detail & Related papers (2023-08-09T17:12:23Z)
- Memory Safe Computations with XLA Compiler [14.510796427699459]
An XLA compiler extension adjusts the representation of an algorithm according to a user-specified memory limit.
We show that k-nearest neighbour and sparse Gaussian process regression methods can be run at a much larger scale on a single device.
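The flavor of the transformation can be sketched with a memory-aware k-NN in plain NumPy, assuming a simple row-blocking rule (the actual extension rewrites the XLA program automatically under the user's limit):

```python
import numpy as np

def knn_blocked(queries, points, k, row_budget=256):
    """Compute k nearest neighbours while holding at most a
    (row_budget x len(points)) distance block in memory at a time."""
    out = []
    for start in range(0, len(queries), row_budget):
        q = queries[start:start + row_budget]
        # ||a - b||^2 expansion keeps the intermediate at (block, N).
        d2 = (q ** 2).sum(1)[:, None] + (points ** 2).sum(1)[None, :] - 2.0 * q @ points.T
        out.append(np.argsort(d2, axis=1)[:, :k])
    return np.vstack(out)

rng = np.random.default_rng(0)
idx = knn_blocked(rng.normal(size=(1000, 4)), rng.normal(size=(5000, 4)), k=5)
```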
arXiv Detail & Related papers (2022-06-28T16:59:28Z)
- Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
- Runtime Performances Benchmark for Knowledge Graph Embedding Methods [0.0]
This paper focuses on providing a characterization of the runtime performance of state-of-the-art implementations of KGE algorithms.
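Such a characterization ultimately rests on a careful timing harness; a minimal sketch (warm-up, repeats, median), omitting the dataset and hardware controls a full study needs:

```python
import statistics
import time

def time_it(fn, *args, repeats=5, warmup=1, **kwargs):
    """Time fn: discard warm-up runs (caches, JIT), then report the median
    of several repeats to damp scheduler noise."""
    for _ in range(warmup):
        fn(*args, **kwargs)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args, **kwargs)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)
```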
arXiv Detail & Related papers (2020-11-05T21:58:11Z)
- Static Neural Compiler Optimization via Deep Reinforcement Learning [1.458855293397494]
In this paper, we employ a deep reinforcement learning approach to the phase-ordering problem.
Provided with sub-sequences constituting LLVM's O3 sequence, our agent learns to outperform the O3 sequence on the set of source codes used for training.
We believe that the models trained using our approach can be integrated into modern compilers as neural optimization agents.
arXiv Detail & Related papers (2020-08-20T13:16:29Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
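The kind of schedule such compiler algorithms derive can be illustrated with loop tiling, shown here as a toy NumPy matmul (a real polyhedral pipeline generates far more aggressive, hardware-specific code):

```python
import numpy as np

def matmul_tiled(A, B, tile=64):
    """Blocked matrix multiply: each (tile x tile) sub-problem reuses the same
    slices of A and B while they are cache-hot, the data-reuse structure a
    polyhedral model derives automatically."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, k0:k0 + tile] @ B[k0:k0 + tile, j0:j0 + tile]
                )
    return C
```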
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.