Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
- URL: http://arxiv.org/abs/2311.02103v1
- Date: Wed, 1 Nov 2023 23:03:59 GMT
- Title: Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
- Authors: Ruihang Lai, Junru Shao, Siyuan Feng, Steven S. Lyubomirsky, Bohan
Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin,
Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared
G. Roesch, Todd C. Mowry, Tianqi Chen
- Abstract summary: We present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads.
Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program.
We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models.
- Score: 19.79913796167022
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Dynamic shape computations have become critical in modern machine learning
workloads, especially in emerging large language models. The success of these
models has driven demand for deploying them to a diverse set of backend
environments. In this paper, we present Relax, a compiler abstraction for
optimizing end-to-end dynamic machine learning workloads. Relax introduces
first-class symbolic shape annotations to track dynamic shape computations
globally across the program. It also introduces a cross-level abstraction that
encapsulates computational graphs, loop-level tensor programs, and library
calls in a single representation to enable cross-level optimizations. We build
an end-to-end compilation framework using the proposed approach to optimize
dynamic shape models. Experimental results on large language models show that
Relax delivers performance competitive with state-of-the-art hand-optimized
systems across platforms and enables deployment of emerging dynamic models to a
broader set of environments, including mobile phones, embedded devices, and web
browsers.
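The core idea above, first-class symbolic shape annotations, can be pictured with a small self-contained sketch. The following Python toy is not the actual Relax API: `SymVar`, `SymTensor`, and `matmul` are invented names used only to illustrate how a symbolic dimension, such as an LLM's sequence length `n`, can flow through operators and be checked for consistency instead of collapsing to an unknown shape.

```python
# A minimal, self-contained toy illustrating first-class symbolic shapes:
# dimension sizes are symbolic variables that are propagated and checked
# across operators instead of being erased to "unknown". These names
# (SymVar, SymTensor, matmul) are invented, not the Relax API.
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class SymVar:
    """A symbolic dimension, e.g. the sequence length n of an LLM input."""
    name: str

Dim = Union[int, SymVar]

@dataclass
class SymTensor:
    shape: Tuple[Dim, ...]

def matmul(a: SymTensor, b: SymTensor) -> SymTensor:
    # The contracted dimensions must be provably equal: either the same
    # integer or the same symbolic variable. Relax performs this kind of
    # deduction globally, across the whole program.
    assert a.shape[-1] == b.shape[0], "inner dimensions must match symbolically"
    return SymTensor(shape=(a.shape[0], b.shape[1]))

n = SymVar("n")                  # dynamic sequence length
x = SymTensor((n, 4096))         # [n, 4096] activations
w = SymTensor((4096, 4096))      # static weights
y = matmul(x, w)
print(y.shape)                   # (SymVar(name='n'), 4096) -- n is preserved
```

In Relax itself, this symbolic information is tracked globally and, through the cross-level abstraction, can inform loop-level tensor programs and library calls, e.g. sizing buffers as functions of `n` rather than treating them as fully unknown.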
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation (a toy sketch of this input-conditioned projection idea appears after this list).
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- TensorIR: An Abstraction for Automatic Tensorized Program Optimization [22.812702519665617]
We present TensorIR, a compiler abstraction for optimizing programs with tensor computation primitives.
We build an end-to-end framework on top of our compilation to automatically optimize deep learning models for given tensor computation primitives.
arXiv Detail & Related papers (2022-07-09T16:28:57Z)
- Learning Intermediate Representations using Graph Neural Networks for NUMA and Prefetchers Optimization [1.3999481573773074]
This paper demonstrates how the static Intermediate Representation (IR) of the code can guide NUMA/prefetcher optimizations without the prohibitive cost of performance profiling.
We show that our static intermediate-representation-based model achieves 80% of the performance gains provided by expensive dynamic performance-profiling-based strategies.
arXiv Detail & Related papers (2022-03-01T16:51:30Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Basic cross-platform tensor frameworks and script-language engines alone do not supply the procedures and pipelines needed to deploy machine learning capabilities in real production-grade systems. In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all such requirements while using only these basic components.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis [18.084628500554462]
We introduce SINGA-Easy, a new deep learning framework that provides distributed hyperparameter tuning at the training stage, dynamic computational cost control at the inference stage, and intuitive user interactions with multimedia content facilitated by model explanation.
Our experiments on the training and deployment of multi-modality data analysis applications show that the framework is both usable and adaptable to dynamic inference loads.
arXiv Detail & Related papers (2021-08-03T08:39:54Z)
- TCL: Transformer-based Dynamic Graph Modelling via Contrastive Learning [87.38675639186405]
We propose a novel graph neural network approach, called TCL, which deals with the dynamically-evolving graph in a continuous-time fashion.
To the best of our knowledge, this is the first attempt to apply contrastive learning to representation learning on dynamic graphs.
arXiv Detail & Related papers (2021-05-17T15:33:25Z)
- A Learned Performance Model for Tensor Processing Units [5.733911161090224]
We demonstrate a method of learning performance models from a corpus of graph programs for Tensor Processing Unit (TPU) instances.
We show that our learned model outperforms a heavily-optimized analytical performance model on two tasks.
It helps an autotuner discover faster programs in a setting where access to TPUs is limited or expensive.
arXiv Detail & Related papers (2020-08-03T17:24:52Z)
- S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards models that are capable of simultaneously exploiting both modular and spatiotemporal structures.
We find our models to be robust to the number of available views and better capable of generalization to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z)
- Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference [22.267489467486467]
This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms.
Our evaluation demonstrates that Nimble outperforms state-of-the-art deep learning frameworks and runtime systems for dynamic neural networks by up to 20x on hardware platforms.
arXiv Detail & Related papers (2020-06-04T17:39:58Z)
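As flagged in the Mamba-FSCIL entry above, the following is a minimal, framework-free sketch of dynamic adaptation through input-conditioned projection: the projection weights are generated from the intermediate features rather than being fixed. All names and shapes here are invented for illustration; the actual dual selective SSM projector is considerably more involved.

```python
# A toy sketch of "dynamic adaptation": projection parameters are generated
# from the intermediate features instead of being a fixed parameter space.
# Names and shapes are invented; this is not the Mamba-FSCIL implementation.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4

# A small "generator" that maps pooled features to projection weights.
W_gen = rng.normal(scale=0.1, size=(d_in, d_in * d_out))

def dynamic_projection(h: np.ndarray) -> np.ndarray:
    """Project features h [batch, d_in] with weights conditioned on h itself."""
    context = h.mean(axis=0)                           # pool intermediate features
    W = np.tanh(context @ W_gen).reshape(d_in, d_out)  # input-dependent weights
    return h @ W                                       # [batch, d_out]

h = rng.normal(size=(16, d_in))
print(dynamic_projection(h).shape)                     # (16, 4)
```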