PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler
- URL: http://arxiv.org/abs/2401.06665v1
- Date: Fri, 12 Jan 2024 16:11:27 GMT
- Title: PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler
- Authors: Gianpietro Consolaro, Zhen Zhang, Harenome Razanajato, Nelson Lossing,
Nassim Tchoulak, Adilla Susungi, Artur Cesar Araujo Alves, Renwei Zhang,
Denis Barthou, Corinne Ancourt, Cedric Bastoul
- Abstract summary: We introduce a new polyhedral scheduler, PolyTOPS, that can be adjusted to various scenarios with straightforward, high-level configurations.
PolyTOPS has been used with isl and CLooG as code generators and has been integrated into the MindSpore AKG deep learning compiler.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Polyhedral techniques have been widely used for automatic code optimization
in low-level compilers and higher-level processes. Loop optimization is central
to this technique, and several polyhedral schedulers like Feautrier, Pluto, isl
and Tensor Scheduler have been proposed, each of them targeting a different
architecture, parallelism model, or application scenario. The need for
scenario-specific optimization is growing due to the heterogeneity of
architectures. One of the most critical cases is represented by NPUs (Neural
Processing Units) used for AI, which may require loop optimization with
different objectives. Another factor to be considered is the framework or
compiler in which polyhedral optimization takes place. Different scenarios,
depending on the target architecture, compilation environment, and application
domain, may require different kinds of optimization to best exploit the
architecture feature set.
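To make the background concrete: a polyhedral scheduler takes the iteration domain of a loop nest plus its dependences and computes a new affine schedule. The sketch below drives isl (one of the schedulers named above) through its Python bindings, islpy; the loop nest and the single flow dependence are invented for illustration, and this shows the general technique rather than PolyTOPS itself.

```python
import islpy as isl

# Iteration domain of a 2D loop nest: statement S runs for 0 <= i, j < N.
domain = isl.UnionSet("[N] -> { S[i,j] : 0 <= i < N and 0 <= j < N }")

# One flow dependence, as in A[i][j+1] = f(A[i][j]):
# iteration S[i,j] must execute before S[i,j+1].
validity = isl.UnionMap(
    "[N] -> { S[i,j] -> S[i,j+1] : 0 <= i < N and 0 <= j < N - 1 }")

# Ask isl's scheduler for an affine schedule that respects the dependence.
# The i dimension carries no dependence, so a code generator may mark the
# corresponding loop parallel.
constraints = isl.ScheduleConstraints.on_domain(domain).set_validity(validity)
schedule = constraints.compute_schedule()
print(schedule.get_map())  # affine mapping from S[i,j] to logical time
```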
We introduce a new configurable polyhedral scheduler, PolyTOPS, that can be
adjusted to various scenarios with straightforward, high-level configurations.
This scheduler allows the creation of diverse scheduling strategies that can be
both scenario-specific (like state-of-the-art schedulers) and kernel-specific,
breaking with the one-size-fits-all scheduler approach. PolyTOPS has been
used with isl and CLooG as code generators and has been integrated into the
MindSpore AKG deep learning compiler. Experimental results in different
scenarios show good performance: a geomean speedup of 7.66x over isl
scheduling on MindSpore hybrid custom operators (targeting the Ascend NPU
architecture), and a geomean speedup of up to 1.80x over Pluto scheduling on
PolyBench across different multicore architectures. Finally, comparisons with
several state-of-the-art tools are presented in the PolyMage scenario.
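The paper's configuration format is not reproduced in this abstract, so the sketch below only illustrates the idea of scenario- and kernel-specific scheduling strategies as a hypothetical Python structure; every key and value here is an assumption made for illustration, not PolyTOPS's actual schema.

```python
# Hypothetical sketch of a scenario- and kernel-specific scheduler
# configuration. All names are invented; PolyTOPS's real schema may differ.
polytops_config = {
    "scenario": "ascend-npu",          # target: NPU vs. multicore CPU
    "objective": "outer_parallelism",  # alternatives: "locality", "vectorization"
    "fusion": "aggressive",            # how eagerly to fuse loop nests
    "tiling": {"enabled": True, "sizes": [32, 32]},
    # Per-kernel overrides, breaking the one-size-fits-all approach:
    "kernels": {
        "conv2d_relu": {"objective": "locality", "fusion": "conservative"},
    },
}
```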
Related papers
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
We introduce LOOPer, the first polyhedral autoscheduler that uses a deep-learning-based cost model.
It supports the exploration of a large set of affine transformations, allowing the application of complex sequences of polyhedral transformations.
It also supports the optimization of programs with multiple loop nests and with rectangular and non-rectangular iteration domains.
arXiv Detail & Related papers (2024-03-18T07:22:31Z)
- Machine Learning Optimized Orthogonal Basis Piecewise Polynomial Approximation
Piecewise Polynomials (PPs) are utilized in several engineering disciplines, like trajectory planning, to approximate position profiles given in the form of a set of points.
arXiv Detail & Related papers (2024-03-13T14:34:34Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- Performance Optimization using Multimodal Modeling and Heterogeneous GNN
We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks.
In this paper, we analyze IR-based programming models to make task-specific performance optimizations.
Our experiments show that this multimodal learning-based approach outperforms the state of the art in all cases.
arXiv Detail & Related papers (2023-04-25T04:27:43Z)
- Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems
Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters.
This paper proposes extensions to OpenMP for autonomous, machine learning-driven adaptation.
Our solution includes a set of novel language constructs, compiler transformations, and runtime support.
arXiv Detail & Related papers (2023-03-15T18:37:18Z)
- Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high performance and energy efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
- Learning to Superoptimize Real-world Programs
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
- A Reinforcement Learning Environment for Polyhedral Optimizations
We propose a shape-agnostic formulation for the space of legal transformations in the polyhedral model as a Markov Decision Process (MDP).
Instead of using transformations, the formulation is based on an abstract space of possible schedules.
Our generic MDP formulation enables using reinforcement learning to learn optimization policies over a wide range of loops.
arXiv Detail & Related papers (2021-04-28T12:41:52Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)