Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free
Optimization Algorithms
- URL: http://arxiv.org/abs/2109.06266v1
- Date: Mon, 13 Sep 2021 19:10:23 GMT
- Title: Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free
Optimization Algorithms
- Authors: Derssie Mebratu, Niranjan Hasabnis, Pietro Mercati, Gaurit Sharma,
Shamima Najnin
- Abstract summary: Deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch.
These frameworks have complex parameters and tuning them to obtain good training and inference performance is challenging for typical users.
In this paper, we treat the problem of tuning parameters of DL frameworks to improve training and inference performance as a black-box problem.
- Score: 0.6543507682026964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning (DL) applications are built using DL libraries and
frameworks such as TensorFlow and PyTorch. These frameworks have complex
parameters and tuning them to obtain good training and inference performance is
challenging for typical users, such as DL developers and data scientists.
Manual tuning requires deep knowledge of the user-controllable parameters of DL
frameworks as well as the underlying hardware. It is a slow and tedious
process, and it typically delivers sub-optimal solutions.
In this paper, we treat the problem of tuning parameters of DL frameworks to
improve training and inference performance as a black-box optimization problem.
We then investigate applicability and effectiveness of Bayesian optimization
(BO), genetic algorithm (GA), and Nelder-Mead simplex (NMS) to tune the
parameters of TensorFlow's CPU backend. While prior work has already
investigated the use of Nelder-Mead simplex for a similar problem, it does not
provide insights into the applicability of other more popular algorithms.
Towards that end, we provide a systematic comparative analysis of all three
algorithms in tuning TensorFlow's CPU backend on a variety of DL models. Our
findings reveal that Bayesian optimization performs the best on the majority of
models. There are, however, cases where it does not deliver the best results.
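The following is a minimal sketch, not the authors' implementation, of the black-box framing described above: each candidate configuration of TensorFlow's CPU backend is evaluated by timing a benchmark run in a fresh process with the corresponding environment variables, and a gradient-free search proposes new configurations. The benchmark script name (benchmark_inference.py) and the particular variables tuned here (TF_NUM_INTRAOP_THREADS, TF_NUM_INTEROP_THREADS, OMP_NUM_THREADS, KMP_BLOCKTIME) are illustrative assumptions rather than the paper's exact parameter set; Nelder-Mead via SciPy stands in for any of the three algorithms compared.

```python
# Minimal sketch: tune TensorFlow CPU-backend settings as a black-box problem.
# Each evaluation runs a (hypothetical) benchmark script in a fresh process so
# that thread settings take effect, and its wall-clock time is the cost.
import os
import subprocess
import sys
import time

import numpy as np
from scipy.optimize import minimize

# Hypothetical benchmark script: builds a model and runs inference once.
BENCHMARK = [sys.executable, "benchmark_inference.py"]


def run_benchmark(intra, inter, omp, blocktime):
    """Evaluate one configuration in a subprocess and return its runtime (s)."""
    env = dict(os.environ)
    # Environment variables commonly used to steer TensorFlow's CPU backend
    # (threading plus OpenMP/oneDNN); the exact set is an assumption here.
    env.update({
        "TF_NUM_INTRAOP_THREADS": str(intra),
        "TF_NUM_INTEROP_THREADS": str(inter),
        "OMP_NUM_THREADS": str(omp),
        "KMP_BLOCKTIME": str(blocktime),
    })
    start = time.perf_counter()
    subprocess.run(BENCHMARK, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def objective(x):
    """Black-box cost: round the continuous proposal to valid integer settings."""
    intra, inter, omp, blocktime = (max(1, int(round(v))) for v in x)
    return run_benchmark(intra, inter, omp, blocktime)


if __name__ == "__main__":
    x0 = np.array([4.0, 2.0, 4.0, 1.0])  # arbitrary starting configuration
    result = minimize(objective, x0, method="Nelder-Mead",
                      options={"maxfev": 40, "xatol": 1.0, "fatol": 0.05})
    best = [max(1, int(round(v))) for v in result.x]
    print("best configuration (intra, inter, omp, blocktime):", best)
    print("best measured runtime (s):", result.fun)
```

Swapping in Bayesian optimization (for example, scikit-optimize's gp_minimize over integer dimensions) or a genetic algorithm changes only the search driver around objective; the measurement harness stays the same, which is the practical appeal of the black-box formulation.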
Related papers
- Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines [17.539008562641303]
Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers.
The next frontier is LLM personalization, where a foundation model can be fine-tuned with user- or task-specific data.
Fine-tuning on resource-constrained edge devices presents significant challenges due to substantial memory and computational demands.
arXiv Detail & Related papers (2024-09-23T20:14:09Z) - Multi-fidelity Constrained Optimization for Stochastic Black Box
Simulators [1.6385815610837167]
We introduce Scout-Nd (Stochastic Constrained Optimization for N dimensions), an algorithm for constrained optimization of stochastic black-box simulators.
Scout-Nd efficiently estimates the gradient, reduces the noise of the gradient estimator, and applies multi-fidelity schemes to further reduce computational effort.
We validate our approach on standard benchmarks, demonstrating its effectiveness in optimizing parameters and showing better performance than existing methods.
arXiv Detail & Related papers (2023-11-25T23:36:38Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - Tune As You Scale: Hyperparameter Optimization For Compute Efficient
Training [0.0]
We propose a practical method for robustly tuning large models.
CarBS performs local search around the performance-cost frontier.
Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline.
arXiv Detail & Related papers (2023-06-13T18:22:24Z) - Performance Embeddings: A Similarity-based Approach to Automatic
Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - I/O Lower Bounds for Auto-tuning of Convolutions in CNNs [2.571796445061562]
We develop a general I/O lower bound theory for a composite algorithm which consists of several different sub-computations.
We design the near I/O-optimal dataflow strategies for the two main convolution algorithms by fully exploiting the data reuse.
Experiment results show that our dataflow strategies with the auto-tuning approach can achieve about 3.32x performance speedup on average over cuDNN.
arXiv Detail & Related papers (2020-12-31T15:46:01Z) - Self-Tuning Stochastic Optimization with Curvature-Aware Gradient
Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step for constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - A Primer on Zeroth-Order Optimization in Signal Processing and Machine
Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction computation, and solution update (a minimal sketch of this loop follows at the end of this related-papers list).
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
arXiv Detail & Related papers (2020-06-11T06:50:35Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
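For the zeroth-order optimization primer referenced above, here is a minimal sketch of the three-step loop (gradient estimation, descent direction, solution update) on a toy black-box function; the quadratic objective, smoothing radius, and step size are illustrative choices, not taken from that paper.

```python
# Minimal zeroth-order (ZO) optimization sketch: estimate a gradient from
# function values only, form a descent direction, and update the solution.
import numpy as np

rng = np.random.default_rng(0)
d = 10        # problem dimension
mu = 1e-3     # smoothing radius for the two-point estimator
lr = 0.01     # step size, kept small because estimator variance grows with d


def f(x):
    """Toy black-box objective (illustrative only): a shifted quadratic."""
    return np.sum((x - 1.0) ** 2)


x = np.zeros(d)
for t in range(1000):
    # Step 1: gradient estimation via a two-point random-direction estimator.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    g_hat = d * (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    # Step 2: descent direction (here, plain negative estimated gradient).
    direction = -g_hat
    # Step 3: solution update.
    x = x + lr * direction

print("final objective value:", f(x))
```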