PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler
- URL: http://arxiv.org/abs/2401.06665v1
- Date: Fri, 12 Jan 2024 16:11:27 GMT
- Title: PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler
- Authors: Gianpietro Consolaro, Zhen Zhang, Harenome Razanajato, Nelson Lossing,
Nassim Tchoulak, Adilla Susungi, Artur Cesar Araujo Alves, Renwei Zhang,
Denis Barthou, Corinne Ancourt, Cedric Bastoul
- Abstract summary: We introduce a new polyhedral scheduler, PolyTOPS, that can be adjusted to various scenarios with straightforward, high-level configurations.
PolyTOPS has been used with isl and CLooG as code generators and has been integrated into the MindSpore AKG deep learning compiler.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Polyhedral techniques have been widely used for automatic code optimization
in low-level compilers and higher-level processes. Loop optimization is central
to this technique, and several polyhedral schedulers like Feautrier, Pluto, isl
and Tensor Scheduler have been proposed, each of them targeting a different
architecture, parallelism model, or application scenario. The need for
scenario-specific optimization is growing due to the heterogeneity of
architectures. One of the most critical cases is represented by NPUs (Neural
Processing Units) used for AI, which may require loop optimization with
different objectives. Another factor to be considered is the framework or
compiler in which polyhedral optimization takes place. Different scenarios,
depending on the target architecture, compilation environment, and application
domain, may require different kinds of optimization to best exploit the
architecture feature set.
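To make the background concrete: a polyhedral scheduler takes the iteration domain of a loop nest plus its dependences and computes a new affine schedule. The sketch below drives isl (one of the schedulers named above) through its Python bindings, islpy; the loop nest and the single flow dependence are invented for illustration, and this shows the general technique rather than PolyTOPS itself.

```python
import islpy as isl

# Iteration domain of a 2D loop nest: statement S runs for 0 <= i, j < N.
domain = isl.UnionSet("[N] -> { S[i,j] : 0 <= i < N and 0 <= j < N }")

# One flow dependence, as in A[i][j+1] = f(A[i][j]):
# iteration S[i,j] must execute before S[i,j+1].
validity = isl.UnionMap(
    "[N] -> { S[i,j] -> S[i,j+1] : 0 <= i < N and 0 <= j < N - 1 }")

# Ask isl's scheduler for an affine schedule that respects the dependence.
# The i dimension carries no dependence, so a code generator may mark the
# corresponding loop parallel.
constraints = isl.ScheduleConstraints.on_domain(domain).set_validity(validity)
schedule = constraints.compute_schedule()
print(schedule.get_map())  # affine mapping from S[i,j] to logical time
```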
We introduce a new configurable polyhedral scheduler, PolyTOPS, that can be
adjusted to various scenarios with straightforward, high-level configurations.
This scheduler allows the creation of diverse scheduling strategies that can be
both scenario-specific (like state-of-the-art schedulers) and kernel-specific,
breaking with the one-size-fits-all scheduler approach. PolyTOPS has been
used with isl and CLooG as code generators and has been integrated into the
MindSpore AKG deep learning compiler. Experimental results in different
scenarios show good performance: a geomean speedup of 7.66x over isl
scheduling on MindSpore hybrid custom operators (targeting the Ascend NPU
architecture), and a geomean speedup of up to 1.80x over Pluto scheduling on
PolyBench across different multicore architectures. Finally, comparisons with
several state-of-the-art tools are presented in the PolyMage scenario.
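The paper's configuration format is not reproduced in this abstract, so the sketch below only illustrates the idea of scenario- and kernel-specific scheduling strategies as a hypothetical Python structure; every key and value here is an assumption made for illustration, not PolyTOPS's actual schema.

```python
# Hypothetical sketch of a scenario- and kernel-specific scheduler
# configuration. All names are invented; PolyTOPS's real schema may differ.
polytops_config = {
    "scenario": "ascend-npu",          # target: NPU vs. multicore CPU
    "objective": "outer_parallelism",  # alternatives: "locality", "vectorization"
    "fusion": "aggressive",            # how eagerly to fuse loop nests
    "tiling": {"enabled": True, "sizes": [32, 32]},
    # Per-kernel overrides, breaking the one-size-fits-all approach:
    "kernels": {
        "conv2d_relu": {"objective": "locality", "fusion": "conservative"},
    },
}
```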
Related papers
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
We introduce LOOPer, the first polyhedral autoscheduler that uses a deep-learning-based cost model.
It supports the exploration of a large set of affine transformations, allowing the application of complex sequences of polyhedral transformations.
It also supports the optimization of programs with multiple loop nests and with rectangular and non-rectangular iteration domains.
arXiv Detail & Related papers (2024-03-18T07:22:31Z)
- Machine Learning Optimized Orthogonal Basis Piecewise Polynomial Approximation
Piecewise Polynomials (PPs) are utilized in several engineering disciplines, like trajectory planning, to approximate position profiles given in the form of a set of points.
arXiv Detail & Related papers (2024-03-13T14:34:34Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- Performance Optimization using Multimodal Modeling and Heterogeneous GNN
We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks.
In this paper, we analyze IR-based programming models to make task-specific performance optimizations.
Our experiments show that this multimodal learning-based approach outperforms the state of the art in all cases.
arXiv Detail & Related papers (2023-04-25T04:27:43Z)
- Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems
Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters.
This paper proposes extensions to OpenMP for autonomous, machine learning-driven adaptation.
Our solution includes a set of novel language constructs, compiler transformations, and runtime support.
arXiv Detail & Related papers (2023-03-15T18:37:18Z)
- Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high performance and energy efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
- Learning to Superoptimize Real-world Programs
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
- A Reinforcement Learning Environment for Polyhedral Optimizations
We propose a shape-agnostic formulation for the space of legal transformations in the polyhedral model as a Markov Decision Process (MDP).
Instead of using transformations, the formulation is based on an abstract space of possible schedules.
Our generic MDP formulation enables using reinforcement learning to learn optimization policies over a wide range of loops.
arXiv Detail & Related papers (2021-04-28T12:41:52Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)