Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free
Optimization Algorithms
- URL: http://arxiv.org/abs/2109.06266v1
- Date: Mon, 13 Sep 2021 19:10:23 GMT
- Title: Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free
Optimization Algorithms
- Authors: Derssie Mebratu, Niranjan Hasabnis, Pietro Mercati, Gaurit Sharma,
Shamima Najnin
- Abstract summary: Deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch.
These frameworks have complex parameters and tuning them to obtain good training and inference performance is challenging for typical users.
In this paper, we treat the problem of tuning parameters of DL frameworks to improve training and inference performance as a black-box problem.
- Score: 0.6543507682026964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning (DL) applications are built using DL libraries and
frameworks such as TensorFlow and PyTorch. These frameworks have complex
parameters and tuning them to obtain good training and inference performance is
challenging for typical users, such as DL developers and data scientists.
Manual tuning requires deep knowledge of the user-controllable parameters of DL
frameworks as well as the underlying hardware. It is a slow and tedious
process, and it typically delivers sub-optimal solutions.
In this paper, we treat the problem of tuning parameters of DL frameworks to
improve training and inference performance as a black-box optimization problem.
We then investigate applicability and effectiveness of Bayesian optimization
(BO), genetic algorithm (GA), and Nelder-Mead simplex (NMS) to tune the
parameters of TensorFlow's CPU backend. While prior work has already
investigated the use of Nelder-Mead simplex for a similar problem, it does not
provide insights into the applicability of other more popular algorithms.
Towards that end, we provide a systematic comparative analysis of all three
algorithms in tuning TensorFlow's CPU backend on a variety of DL models. Our
findings reveal that Bayesian optimization performs the best on the majority of
models. There are, however, cases where it does not deliver the best results.
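The following is a minimal sketch, not the authors' implementation, of the black-box framing described above: each candidate configuration of TensorFlow's CPU backend is evaluated by timing a benchmark run in a fresh process with the corresponding environment variables, and a gradient-free search proposes new configurations. The benchmark script name (benchmark_inference.py) and the particular variables tuned here (TF_NUM_INTRAOP_THREADS, TF_NUM_INTEROP_THREADS, OMP_NUM_THREADS, KMP_BLOCKTIME) are illustrative assumptions rather than the paper's exact parameter set; Nelder-Mead via SciPy stands in for any of the three algorithms compared.

```python
# Minimal sketch: tune TensorFlow CPU-backend settings as a black-box problem.
# Each evaluation runs a (hypothetical) benchmark script in a fresh process so
# that thread settings take effect, and its wall-clock time is the cost.
import os
import subprocess
import sys
import time

import numpy as np
from scipy.optimize import minimize

# Hypothetical benchmark script: builds a model and runs inference once.
BENCHMARK = [sys.executable, "benchmark_inference.py"]


def run_benchmark(intra, inter, omp, blocktime):
    """Evaluate one configuration in a subprocess and return its runtime (s)."""
    env = dict(os.environ)
    # Environment variables commonly used to steer TensorFlow's CPU backend
    # (threading plus OpenMP/oneDNN); the exact set is an assumption here.
    env.update({
        "TF_NUM_INTRAOP_THREADS": str(intra),
        "TF_NUM_INTEROP_THREADS": str(inter),
        "OMP_NUM_THREADS": str(omp),
        "KMP_BLOCKTIME": str(blocktime),
    })
    start = time.perf_counter()
    subprocess.run(BENCHMARK, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def objective(x):
    """Black-box cost: round the continuous proposal to valid integer settings."""
    intra, inter, omp, blocktime = (max(1, int(round(v))) for v in x)
    return run_benchmark(intra, inter, omp, blocktime)


if __name__ == "__main__":
    x0 = np.array([4.0, 2.0, 4.0, 1.0])  # arbitrary starting configuration
    result = minimize(objective, x0, method="Nelder-Mead",
                      options={"maxfev": 40, "xatol": 1.0, "fatol": 0.05})
    best = [max(1, int(round(v))) for v in result.x]
    print("best configuration (intra, inter, omp, blocktime):", best)
    print("best measured runtime (s):", result.fun)
```

Swapping in Bayesian optimization (for example, scikit-optimize's gp_minimize over integer dimensions) or a genetic algorithm changes only the search driver around objective; the measurement harness stays the same, which is the practical appeal of the black-box formulation.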
Related papers
- Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines [17.539008562641303]
Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers.
The next frontier is LLM personalization, where a foundation model can be fine-tuned with user- or task-specific data.
Fine-tuning on resource-constrained edge devices presents significant challenges due to substantial memory and computational demands.
arXiv Detail & Related papers (2024-09-23T20:14:09Z) - Multi-fidelity Constrained Optimization for Stochastic Black Box
Simulators [1.6385815610837167]
We introduce Scout-Nd (Stochastic Constrained Optimization for N dimensions), an algorithm for constrained optimization of stochastic black-box simulators.
Scout-Nd efficiently estimates the gradient, reduces the noise of the gradient estimator, and applies multi-fidelity schemes to further reduce computational effort.
We validate our approach on standard benchmarks, demonstrating its effectiveness in optimizing parameters and showing better performance than existing methods.
arXiv Detail & Related papers (2023-11-25T23:36:38Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - Tune As You Scale: Hyperparameter Optimization For Compute Efficient
Training [0.0]
We propose a practical method for robustly tuning large models.
CarBS performs local search around the performance-cost frontier.
Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline.
arXiv Detail & Related papers (2023-06-13T18:22:24Z) - Performance Embeddings: A Similarity-based Approach to Automatic
Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - I/O Lower Bounds for Auto-tuning of Convolutions in CNNs [2.571796445061562]
We develop a general I/O lower bound theory for a composite algorithm which consists of several different sub-computations.
We design the near I/O-optimal dataflow strategies for the two main convolution algorithms by fully exploiting the data reuse.
Experiment results show that our dataflow strategies with the auto-tuning approach can achieve about 3.32x performance speedup on average over cuDNN.
arXiv Detail & Related papers (2020-12-31T15:46:01Z) - Self-Tuning Stochastic Optimization with Curvature-Aware Gradient
Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step for constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - A Primer on Zeroth-Order Optimization in Signal Processing and Machine
Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction computation, and solution update (a minimal sketch of this loop follows at the end of this related-papers list).
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
arXiv Detail & Related papers (2020-06-11T06:50:35Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
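For the zeroth-order optimization primer referenced above, here is a minimal sketch of the three-step loop (gradient estimation, descent direction, solution update) on a toy black-box function; the quadratic objective, smoothing radius, and step size are illustrative choices, not taken from that paper.

```python
# Minimal zeroth-order (ZO) optimization sketch: estimate a gradient from
# function values only, form a descent direction, and update the solution.
import numpy as np

rng = np.random.default_rng(0)
d = 10        # problem dimension
mu = 1e-3     # smoothing radius for the two-point estimator
lr = 0.01     # step size, kept small because estimator variance grows with d


def f(x):
    """Toy black-box objective (illustrative only): a shifted quadratic."""
    return np.sum((x - 1.0) ** 2)


x = np.zeros(d)
for t in range(1000):
    # Step 1: gradient estimation via a two-point random-direction estimator.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    g_hat = d * (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    # Step 2: descent direction (here, plain negative estimated gradient).
    direction = -g_hat
    # Step 3: solution update.
    x = x + lr * direction

print("final objective value:", f(x))
```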