MetaTune: Meta-Learning Based Cost Model for Fast and Efficient
Auto-tuning Frameworks
- URL: http://arxiv.org/abs/2102.04199v2
- Date: Tue, 9 Feb 2021 06:25:41 GMT
- Title: MetaTune: Meta-Learning Based Cost Model for Fast and Efficient
Auto-tuning Frameworks
- Authors: Jaehun Ryu, Hyojin Sung
- Abstract summary: This paper proposes MetaTune, a meta-learning based cost model that more quickly and accurately predicts the performance of optimized codes with pre-trained model parameters.
The framework provides 8 to 13% better inference time on average for four CNN models with comparable or lower optimization time while outperforming transfer learning by 10% in cross-platform cases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning compiler frameworks are gaining ground as a more portable
back-end for deep learning applications on increasingly diverse hardware.
However, they face the daunting challenge of matching performance offered by
hand-tuned target-specific libraries. While auto-tuning frameworks with
statistical cost models can provide dynamic and efficient code optimization,
they suffer from large space exploration and cost model training overheads.
This paper proposes MetaTune, a meta-learning based cost model that more
quickly and accurately predicts the performance of optimized codes with
pre-trained model parameters. MetaTune encodes convolution kernel codes as
structurally similar graphs to facilitate meta-learning, meta-trains a GNN
model with a very small input data set, and then predicts optimization
parameters for unseen convolution operations with varying sizes and structures
during compilation. The resulting framework with MetaTune provides 8 to 13%
better inference time on average for four CNN models with comparable or lower
optimization time while outperforming transfer learning by 10% in
cross-platform cases.
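The abstract outlines the core loop: encode candidate kernels as structurally similar graphs, meta-train a GNN regressor on a small set of tuning tasks, and use it to predict the performance of optimization candidates for unseen convolutions at compile time. Below is a minimal, hypothetical sketch of that idea in PyTorch; it is not the authors' implementation, and the Reptile-style meta-update, the synthetic tasks, and names such as GNNCostModel and meta_train are illustrative assumptions.

```python
# Minimal sketch (assumption, not the authors' code): a tiny GNN cost model
# meta-trained with a Reptile-style update over small per-operator tasks.
import copy
import torch
import torch.nn as nn

class GNNCostModel(nn.Module):
    """Predicts a runtime score from a kernel encoded as (node features, adjacency)."""
    def __init__(self, feat_dim=8, hidden=32):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hidden)
        self.msg = nn.Linear(hidden, hidden)
        self.readout = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        h = torch.relu(self.embed(x))          # (N, hidden) node embeddings
        h = torch.relu(adj @ self.msg(h) + h)  # one message-passing step
        return self.readout(h.mean(dim=0))     # graph-level runtime prediction

def make_task(n_nodes=6, feat_dim=8, samples=16):
    """Synthetic stand-in for one convolution tuning task: graphs share structure,
    labels play the role of measured runtimes."""
    adj = (torch.rand(n_nodes, n_nodes) < 0.3).float()
    data = [(torch.randn(n_nodes, feat_dim), adj) for _ in range(samples)]
    labels = [x.sum() * 0.01 + torch.randn(1) * 0.1 for x, _ in data]
    return list(zip(data, labels))

def meta_train(model, tasks, inner_steps=5, inner_lr=1e-2, meta_lr=0.5):
    """Reptile-style meta-training: adapt a clone per task, then move the
    meta-parameters toward the adapted ones."""
    for task in tasks:
        clone = copy.deepcopy(model)
        opt = torch.optim.SGD(clone.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            for (x, adj), y in task:
                opt.zero_grad()
                loss = (clone(x, adj) - y).pow(2).mean()
                loss.backward()
                opt.step()
        with torch.no_grad():
            for p, q in zip(model.parameters(), clone.parameters()):
                p += meta_lr * (q - p)

model = GNNCostModel()
meta_train(model, [make_task() for _ in range(8)])
# At compile time, a meta-trained model of this kind would rank candidate
# optimization parameters for an unseen convolution after a short adaptation phase.
```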
Related papers
- G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale
Recommender Systems [16.343248795178685]
This paper provides a framework for large-scale training of optimization-based Meta DLRM models over a GPU cluster.
Various experimental results show that G-Meta achieves notable training speedups without loss of statistical performance.
arXiv Detail & Related papers (2024-01-09T03:35:43Z)
- Leveraging Reinforcement Learning and Large Language Models for Code Optimization [14.602997316032706]
This paper introduces a new framework to decrease the complexity of code optimization.
The proposed framework builds on large language models (LLMs) and reinforcement learning (RL).
We run several experiments on the PIE dataset using a CodeT5 language model and RRHF, a new reinforcement learning algorithm.
arXiv Detail & Related papers (2023-12-09T19:50:23Z)
- CoLLiE: Collaborative Training of Large Language Models in an Efficient Way [59.09824823710863]
CoLLiE is an efficient library that facilitates collaborative training of large language models.
With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
arXiv Detail & Related papers (2023-12-01T08:02:16Z)
- Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How [62.467716468917224]
We propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it.
Our method transfers knowledge about the performance of many pretrained models on a series of datasets.
We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset.
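The summary describes a joint search over which pretrained model to fine-tune and with which hyperparameters, guided by performance knowledge transferred from past fine-tuning runs. The following is a hedged, simplified sketch of that selection loop; the candidate lists, the stand-in predictor, and names such as quick_tune and predict_score are assumptions for illustration, not the paper's actual method.

```python
# Hypothetical sketch: joint (pretrained model, hyperparameter) selection guided
# by a performance predictor fed with records of previous fine-tuning runs.
import itertools
import random

CANDIDATE_MODELS = ["resnet50", "vit_small", "efficientnet_b0"]  # illustrative names
LEARNING_RATES = [1e-4, 1e-3, 1e-2]

def predict_score(history, model_name, lr, dataset_meta):
    """Stand-in predictor: average accuracy of matching past runs, prior of 0.5."""
    similar = [acc for (m, l, meta, acc) in history
               if m == model_name and abs(l - lr) < 1e-9]
    return sum(similar) / len(similar) if similar else 0.5

def finetune_and_eval(model_name, lr, dataset_meta):
    """Placeholder for a real fine-tuning run; returns a fake validation accuracy."""
    return random.uniform(0.6, 0.9)

def quick_tune(dataset_meta, budget=4):
    history = []  # (model, lr, dataset_meta, accuracy) records
    candidates = list(itertools.product(CANDIDATE_MODELS, LEARNING_RATES))
    best = (None, None, 0.0)
    for _ in range(budget):
        # Pick the most promising untried (model, lr) pair according to the predictor.
        tried = {(m, l) for (m, l, _, _) in history}
        model_name, lr = max((c for c in candidates if c not in tried),
                             key=lambda c: predict_score(history, c[0], c[1], dataset_meta))
        acc = finetune_and_eval(model_name, lr, dataset_meta)
        history.append((model_name, lr, dataset_meta, acc))
        if acc > best[2]:
            best = (model_name, lr, acc)
    return best

print(quick_tune(dataset_meta={"num_classes": 10, "num_images": 5000}))
```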
arXiv Detail & Related papers (2023-06-06T16:15:26Z)
- Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training [17.556432199389615]
Slapo is a schedule language that decouples the execution of a tensor-level operator from its arithmetic definition.
We show that Slapo can improve training throughput by up to 2.92x on a single machine with 8 NVIDIA V100 GPUs.
arXiv Detail & Related papers (2023-02-16T00:34:53Z)
- VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizers, the meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
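The learned-optimizer idea described here, a small network that ingests gradients and emits parameter updates, can be sketched as a toy per-parameter update rule. This is an illustrative assumption, not the VeLO architecture, and names such as TinyLearnedOptimizer and apply_learned_step are hypothetical.

```python
# Toy sketch (assumption, not VeLO itself): a tiny per-parameter learned optimizer
# that maps a gradient and a momentum-like statistic to a parameter update.
import torch
import torch.nn as nn

class TinyLearnedOptimizer(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, grad, momentum):
        # Per-parameter features: (gradient, momentum) -> scalar update.
        feats = torch.stack([grad.flatten(), momentum.flatten()], dim=-1)
        return self.net(feats).view_as(grad) * 0.01

def apply_learned_step(params, grads, momenta, learned_opt, beta=0.9):
    """Applies the learned update to each parameter tensor in place."""
    with torch.no_grad():
        for p, g, m in zip(params, grads, momenta):
            m.mul_(beta).add_(g, alpha=1 - beta)
            p -= learned_opt(g, m)

# Usage on a toy quadratic: minimize ||w||^2 with the (untrained) learned optimizer.
w = torch.randn(5, requires_grad=True)
momenta = [torch.zeros_like(w)]
learned_opt = TinyLearnedOptimizer()
for _ in range(10):
    loss = (w ** 2).sum()
    loss.backward()
    apply_learned_step([w], [w.grad.detach()], momenta, learned_opt)
    w.grad = None
# In VeLO, the weights of the learned optimizer itself are meta-trained at scale
# across many tasks; here they are just randomly initialized.
```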
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
- Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances [58.720142291102135]
'VPUNN' is a neural network-based cost model trained on low-level task profiling.
It consistently outperforms the state-of-the-art cost modeling in Intel's line of VPU processors.
arXiv Detail & Related papers (2022-05-09T22:48:39Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
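The summary's key idea, exploiting sparsity in both the weight updates and the final weights, can be illustrated with a hedged sketch of a sparse-plus-low-rank update followed by magnitude pruning of the merged weights. The decomposition, the fixed sparse support, and the function names below are assumptions for illustration, not DSEE's exact formulation.

```python
# Illustrative sketch (assumption, not DSEE's exact method): represent the
# fine-tuning update of a frozen weight as a low-rank part plus a sparse part,
# then sparsify the merged weight for resource-efficient inference.
import torch

def sparse_lowrank_update(W_pretrained, rank=4, sparse_frac=0.01):
    d_out, d_in = W_pretrained.shape
    U = torch.randn(d_out, rank) * 0.01      # trainable low-rank factors
    V = torch.randn(rank, d_in) * 0.01
    S = torch.zeros(d_out, d_in)             # trainable sparse residual
    mask = (torch.rand(d_out, d_in) < sparse_frac).float()  # fixed sparse support
    delta = U @ V + S * mask                 # parameter-efficient weight update
    return W_pretrained + delta

def magnitude_prune(W, keep_frac=0.5):
    """Keep only the largest-magnitude entries of the merged weight matrix."""
    k = int(W.numel() * keep_frac)
    threshold = W.abs().flatten().kthvalue(W.numel() - k + 1).values
    return W * (W.abs() >= threshold).float()

W = torch.randn(64, 64)                      # frozen pretrained weight
W_finetuned = sparse_lowrank_update(W)
W_deploy = magnitude_prune(W_finetuned)      # sparse final weights for cheaper inference
print((W_deploy != 0).float().mean())        # roughly the keep_frac nonzero fraction
```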
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
The resulting conservative objective models (COMs) are simple to implement and outperform a number of existing methods on a wide range of offline model-based optimization (MBO) problems.
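The conservative objective model described here, a learned objective that lower-bounds the true objective on out-of-distribution inputs, can be sketched with a toy training loss that fits the offline data while pushing down predictions on perturbed, off-dataset points. The specific penalty, the OOD sampling, and all names are hedged assumptions, not the paper's exact objective.

```python
# Toy sketch (assumption, not the paper's exact objective): fit f_theta to the data
# while penalizing high predictions away from the dataset, so the learned model
# tends to lower-bound the true objective on out-of-distribution inputs.
import torch
import torch.nn as nn

def true_objective(x):
    return -(x ** 2).sum(dim=-1, keepdim=True)  # unknown ground truth (used only to make data)

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(256, 2) * 0.5                   # offline design dataset near the origin
Y = true_objective(X)

alpha = 1.0                                     # strength of the conservatism penalty
for step in range(500):
    opt.zero_grad()
    fit_loss = (model(X) - Y).pow(2).mean()     # regression on the offline data
    # Crude stand-in for OOD inputs: large random perturbations of dataset points.
    X_ood = X + torch.randn_like(X) * 2.0
    conservatism = model(X_ood).mean()          # push predicted values down off-distribution
    loss = fit_loss + alpha * conservatism
    loss.backward()
    opt.step()
# A design optimizer that maximizes the learned model is now less likely to be
# fooled by spuriously high predictions far from the data.
```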
arXiv Detail & Related papers (2021-07-14T17:55:28Z)
- A Learned Performance Model for Tensor Processing Units [5.733911161090224]
We demonstrate a method of learning performance models from a corpus of graph programs for Tensor Processing Unit (TPU) instances.
We show that our learned model outperforms a heavily-optimized analytical performance model on two tasks.
It helps an autotuner discover faster programs in a setting where access to TPUs is limited or expensive.
arXiv Detail & Related papers (2020-08-03T17:24:52Z)