Analytical Characterization and Design Space Exploration for
Optimization of CNNs
- URL: http://arxiv.org/abs/2101.09808v2
- Date: Sat, 6 Mar 2021 00:40:24 GMT
- Title: Analytical Characterization and Design Space Exploration for
Optimization of CNNs
- Authors: Rui Li, Yufan Xu, Aravind Sukumaran-Rajam, Atanas Rountev, and P.
Sadayappan
- Abstract summary: Loop-level optimizations, including loop tiling and loop permutation, are fundamental transformations for reducing data movement.
This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs.
- Score: 10.15406080228806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Moving data through the memory hierarchy is a fundamental bottleneck that can
limit the performance of core algorithms of machine learning, such as
convolutional neural networks (CNNs). Loop-level optimizations, including loop
tiling and loop permutation, are fundamental transformations for reducing data
movement. However, the search space for finding the best loop-level
optimization configuration is explosively large. This paper develops an
analytical modeling approach for finding the best loop-level optimization
configuration for CNNs on multi-core CPUs. Experimental evaluation shows that
this approach achieves performance comparable to or better than state-of-the-art
libraries and auto-tuning-based optimizers for CNNs.
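As a concrete illustration of the two transformations named above, the sketch below tiles the channel loops of a direct convolution in Python. The tile sizes and loop order are arbitrary placeholders; choosing them well is exactly the configuration problem the paper's analytical model addresses.

```python
import numpy as np

def conv2d_tiled(inp, weights, tile_k=32, tile_c=32):
    """Direct convolution with the output-channel (k) and input-channel
    (c) loops tiled. Tile sizes and loop order here are illustrative
    placeholders, not values produced by the paper's model."""
    C, H, W = inp.shape                     # input channels, height, width
    K, _, R, S = weights.shape              # output channels, filter height/width
    oh, ow = H - R + 1, W - S + 1           # output spatial dimensions
    out = np.zeros((K, oh, ow))
    for k0 in range(0, K, tile_k):          # tile loop over output channels
        for c0 in range(0, C, tile_c):      # tile loop over input channels
            for k in range(k0, min(k0 + tile_k, K)):
                for c in range(c0, min(c0 + tile_c, C)):
                    for r in range(R):
                        for s in range(S):
                            out[k] += weights[k, c, r, s] * inp[c, r:r + oh, s:s + ow]
    return out
```

Permuting these loops and changing the tile sizes alters which operands stay resident in each level of the cache hierarchy, which is why the configuration space is so large and why an analytical model is attractive compared to auto-tuning.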
Related papers
- Towards Hyperparameter-Agnostic DNN Training via Dynamical System
Insights [4.513581513983453]
We present ECCO-DNN, a first-order optimization method specialized for deep neural networks (DNNs).
This method models the optimization variable trajectory as a dynamical system and develops a discretization algorithm that adaptively selects step sizes based on the trajectory's shape.
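A minimal sketch of the dynamical-systems view, assuming a plain gradient-flow model: SGD is treated as a forward-Euler discretization of dx/dt = -grad f(x), and the step size is adapted from how sharply the trajectory turns. The adaptation heuristic below is an illustrative stand-in, not the actual ECCO-DNN rule.

```python
import numpy as np

def euler_step_adaptive(x, grad_fn, dt, prev_g=None, grow=1.1, shrink=0.5):
    """One forward-Euler step of the gradient-flow ODE dx/dt = -grad f(x),
    with a toy step-size rule: shrink dt when the gradient direction turns
    sharply (trajectory is curving), grow it when nearly straight."""
    g = grad_fn(x)
    if prev_g is not None:
        cos = np.vdot(g, prev_g) / (np.linalg.norm(g) * np.linalg.norm(prev_g) + 1e-12)
        dt *= grow if cos > 0.9 else shrink
    return x - dt * g, dt, g
```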
arXiv Detail & Related papers (2023-10-21T03:45:13Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
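A toy sketch of the transfer idea, assuming an embedding function for computations is already available (all names below are illustrative): reuse the best-known tuning configurations of the nearest previously tuned computations as starting candidates for a new one.

```python
import numpy as np

def transfer_tune(new_emb, known_embs, known_configs, k=3):
    """Return the tuning configurations of the k previously tuned
    computations whose embeddings lie closest to the new computation's
    embedding; these become the first candidates to benchmark."""
    dists = np.linalg.norm(known_embs - new_emb, axis=1)  # (n,) distances
    return [known_configs[i] for i in np.argsort(dists)[:k]]
```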
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
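The control flow of such a learned optimizer can be sketched as follows. The two-feature MLP below is a made-up stand-in; VeLO's actual features, architecture, and meta-training procedure are far richer.

```python
import numpy as np

def learned_opt_step(params, grads, w1, w2):
    """One step of a toy learned optimizer: per-parameter features
    (the gradient and its magnitude) pass through a tiny MLP whose
    output is applied directly as the parameter update.
    Shapes: params, grads (n,); w1 (2, h); w2 (h,)."""
    feats = np.stack([grads, np.abs(grads)], axis=-1)  # (n, 2) features
    update = np.tanh(feats @ w1) @ w2                  # (n,) updates
    return params - update
```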
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
- An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the GuacaMol suite.
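A minimal sketch of ZO-signGD: estimate the gradient of the black-box objective f with a two-point random estimator, then step along the sign of the estimate. Query budget, smoothing radius, and step size below are illustrative.

```python
import numpy as np

def zo_sign_gd_step(x, f, mu=0.01, q=20, lr=0.1, rng=None):
    """One ZO-signGD step: average q two-point finite-difference
    samples with smoothing radius mu, then move along the sign of
    the resulting gradient estimate."""
    rng = rng or np.random.default_rng()
    g_hat = np.zeros_like(x)
    for _ in range(q):
        u = rng.standard_normal(x.shape)
        g_hat += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return x - lr * np.sign(g_hat / q)
```

Using only the sign discards the noisy magnitude of the ZO estimate, which is one intuition for why sign-based updates can tolerate gradient-estimation error.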
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
- Moment Centralization based Gradient Descent Optimizers for Convolutional Neural Networks [12.90962626557934]
Convolutional neural networks (CNNs) have shown very appealing performance for many computer vision applications.
In this paper, we propose a moment centralization-based SGD optimizer for CNNs.
The proposed moment centralization is generic in nature and can be integrated with any of the existing adaptive momentum-based optimizers.
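A minimal reading of the idea (the mechanics here are my assumption, not the paper's exact optimizer): accumulate momentum as usual, then centralize the first-order moment to zero mean before applying it.

```python
import numpy as np

def mc_momentum_step(param, grad, moment, lr=0.01, beta=0.9):
    """SGD-with-momentum step in which the first moment is centralized
    (its mean subtracted) before the parameter update, following the
    moment-centralization idea summarized above."""
    moment = beta * moment + grad
    moment = moment - moment.mean()   # impose zero mean on the first moment
    return param - lr * moment, moment
```

Because the centralization touches only the accumulated moment, the same step can wrap Adam-style optimizers, which is the sense in which the technique is generic.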
arXiv Detail & Related papers (2022-07-19T04:38:01Z)
- Feasible Low-thrust Trajectory Identification via a Deep Neural Network Classifier [1.5076964620370268]
This work proposes a deep neural network (DNN) classifier to accurately identify feasible low-thrust transfers prior to the optimization process.
The DNN-classifier achieves an overall accuracy of 97.9%, which has the best performance among the tested algorithms.
arXiv Detail & Related papers (2022-02-10T11:34:37Z) - I/O Lower Bounds for Auto-tuning of Convolutions in CNNs [2.571796445061562]
We develop a general I/O lower bound theory for a composite algorithm which consists of several different sub-computations.
We design the near I/O-optimal dataflow strategies for the two main convolution algorithms by fully exploiting the data reuse.
Experimental results show that our dataflow strategies with the auto-tuning approach achieve about 3.32x speedup on average over cuDNN.
arXiv Detail & Related papers (2020-12-31T15:46:01Z)
- Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools for maximizing the use of Noisy Intermediate-Scale Quantum devices.
We propose a strategy for the ansatze used in variational quantum algorithms, which we call Parameter-Efficient Circuit Training (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
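The key point of the summary can be sketched as a block-wise outer loop. Here `optimize_block` is a hypothetical stand-in for a full variational inner optimization (circuit execution plus classical update) over one parameter subset.

```python
def pect_style_training(init_params, optimize_block, n_sweeps=3):
    """Optimize ansatz parameters one block at a time: each call to
    optimize_block runs a variational optimization over a single
    block while all other blocks stay fixed."""
    params = dict(init_params)   # block name -> parameter values
    for _ in range(n_sweeps):
        for block in params:
            params[block] = optimize_block(block, params)
    return params
```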
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
- A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent-direction computation, and solution update.
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
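For the gradient-estimation step, one standard choice among the estimators such a primer covers is the multi-point random estimator (a representative formula, not necessarily the paper's notation):

```latex
\hat{\nabla} f(x) = \frac{1}{q} \sum_{i=1}^{q}
  \frac{f(x + \mu u_i) - f(x)}{\mu}\, u_i,
\qquad u_i \sim \mathcal{N}(0, I)
```

where \mu > 0 is a smoothing radius and q is the query budget; the descent-direction and solution-update steps then mirror first-order methods with \hat{\nabla} f(x) in place of the true gradient.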
arXiv Detail & Related papers (2020-06-11T06:50:35Z)
- Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [13.628734116014819]
Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNNs).
There is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution.
We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory.
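A caricature of why cross-level optimisation outgrows exhaustive testing (all names illustrative): brute force must benchmark the product of network-level and deployment-level option counts, which is what motivates automated exploration.

```python
from itertools import product

def brute_force_cross_level(model_opts, deploy_opts, benchmark):
    """Benchmark every (network-level, deployment-level) pair and keep
    the fastest configuration; feasible only for tiny option spaces."""
    best, best_t = None, float("inf")
    for cfg in product(model_opts, deploy_opts):
        t = benchmark(*cfg)   # measured latency of this joint configuration
        if t < best_t:
            best, best_t = cfg, t
    return best, best_t
```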
arXiv Detail & Related papers (2020-06-09T11:00:06Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested on four types of problems: compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization.
It reduced the computational time by two to five orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
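The DNN+FEM integration described above can be sketched as an alternating loop; all callables are illustrative stand-ins, not the paper's code.

```python
def solo_loop(initial, fem_eval, fit_dnn, minimize_dnn, n_iters=20):
    """Self-directed online learning sketch: evaluate designs with the
    expensive FEM solver, refit a cheap DNN surrogate on all data so
    far, and let the surrogate propose the next design to evaluate."""
    designs, scores = [initial], [fem_eval(initial)]
    for _ in range(n_iters):
        surrogate = fit_dnn(designs, scores)   # train DNN on all evaluations
        candidate = minimize_dnn(surrogate)    # cheap inner optimization
        designs.append(candidate)
        scores.append(fem_eval(candidate))     # one expensive FEM call
    i = min(range(len(scores)), key=scores.__getitem__)
    return designs[i], scores[i]
```

The speedup comes from spending FEM calls only on designs the surrogate already considers promising.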
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.