TASO: Time and Space Optimization for Memory-Constrained DNN Inference
- URL: http://arxiv.org/abs/2005.10709v1
- Date: Thu, 21 May 2020 15:08:06 GMT
- Title: TASO: Time and Space Optimization for Memory-Constrained DNN Inference
- Authors: Yuan Wen, Andrew Anderson, Valentin Radu, Michael F.P. O'Boyle, David
Gregg
- Abstract summary: Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices.
We propose an approach for ahead-of-time, domain-specific optimization of CNN models, based on an integer linear programming (ILP) formulation for selecting primitive operations to implement convolutional layers.
- Score: 5.023660118588569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) are used in many embedded applications,
from industrial robotics and automation systems to biometric identification on
mobile devices. State-of-the-art classification is typically achieved by large
networks, which are prohibitively expensive to run on mobile and embedded
devices with tightly constrained memory and energy budgets. We propose an
approach for ahead-of-time, domain-specific optimization of CNN models, based on
an integer linear programming (ILP) formulation for selecting primitive operations to
implement convolutional layers. We optimize the trade-off between execution
time and memory consumption by: 1) attempting to minimize execution time across
the whole network by selecting data layouts and primitive operations to
implement each layer; and 2) allocating an appropriate workspace that reflects
the upper bound of memory footprint per layer. These two optimization
strategies can be used to run any CNN on any platform with a C compiler. Our
evaluation with a range of popular ImageNet neural architectures (GoogleNet,
AlexNet, VGG, ResNet and SqueezeNet) on the ARM Cortex-A15 yields speedups of
8x compared to greedy-algorithm-based primitive selection, and reduces memory
requirements by 2.2x while sacrificing only 15% of inference time compared to a
solver that considers inference time only. In addition, our optimization
approach exposes a range of optimal points for different configurations across
the Pareto frontier of the memory-latency trade-off, which can be used under
arbitrary system constraints.
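The two optimization strategies above can be illustrated with a minimal sketch. This is not the paper's ILP: it brute-forces the same selection problem over made-up per-layer costs (the primitive names, times, and workspace sizes are all hypothetical), picking one primitive per layer to minimise total time subject to a shared workspace sized to the largest per-layer footprint.

```python
from itertools import product

# Hypothetical per-layer candidates: (primitive, execution_time_ms, workspace_bytes).
# In the paper these costs come from profiling each primitive on the target platform.
layers = [
    [("im2col", 10, 800), ("winograd", 6, 1500), ("direct", 14, 100)],
    [("im2col", 20, 1600), ("winograd", 12, 3000), ("direct", 28, 100)],
    [("im2col", 8, 600), ("winograd", 5, 1200), ("direct", 11, 100)],
]

def best_plan(memory_budget):
    """Pick one primitive per layer minimising total time, subject to a
    shared workspace sized to the largest per-layer footprint."""
    best = None
    for choice in product(*layers):
        workspace = max(c[2] for c in choice)   # upper bound across layers
        if workspace > memory_budget:
            continue
        total_time = sum(c[1] for c in choice)
        if best is None or total_time < best[0]:
            best = (total_time, workspace, [c[0] for c in choice])
    return best

# Sweep budgets to trace the time/memory Pareto frontier.
for budget in (100, 1500, 3000):
    print(budget, best_plan(budget))
```

Sweeping the budget exposes the Pareto frontier the abstract describes: a tight memory budget forces the low-footprint (but slow) primitives everywhere, while larger budgets unlock faster primitives one layer at a time.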
Related papers
- Enhancing MOTION2NX for Efficient, Scalable and Secure Image Inference using Convolutional Neural Networks [4.407841002228536]
We use the ABY2.0 SMPC protocol, implemented on the C++-based MOTION2NX framework, for a secure convolutional neural network (CNN) inference application with semi-honest security.
We also present a novel splitting algorithm that divides the computations at each CNN layer into multiple chunks.
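The chunk-splitting idea can be sketched with a toy 1-D convolution (illustrative only; the paper splits 2-D CNN layers inside an SMPC protocol): computing the output a chunk at a time needs only a small overlapping input window per chunk, and the concatenated chunks match the full result.

```python
def conv1d(x, k):
    """Valid 1-D convolution (correlation) of signal x with kernel k."""
    n = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n)]

def conv1d_chunked(x, k, chunk):
    """Same result, computed `chunk` outputs at a time, so only a small
    input window is live for each piece of the computation."""
    n = len(x) - len(k) + 1
    out = []
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        window = x[start:stop + len(k) - 1]   # inputs needed for this chunk
        out += conv1d(window, k)
    return out

x = [1, 2, 3, 4, 5, 6]
k = [1, 0, -1]
assert conv1d_chunked(x, k, chunk=2) == conv1d(x, k)
```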
arXiv Detail & Related papers (2024-08-29T09:50:21Z)
- Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization [4.614780125575351]
We propose an efficient memory-aware scheduling framework based on iterative graph optimization.
Our framework features an iterative graph fusion algorithm that simplifies the graph while preserving the scheduling optimality.
arXiv Detail & Related papers (2023-08-26T14:52:02Z)
- An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks [0.0]
Ensembles of Deep Neural Networks (DNNs) achieve high-quality predictions, but they are compute- and memory-intensive.
We propose a new software layer to serve ensembles of DNNs flexibly and efficiently.
arXiv Detail & Related papers (2022-08-30T08:05:43Z)
- Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances [58.720142291102135]
'VPUNN' is a neural network-based cost model trained on low-level task profiling.
It consistently outperforms the state-of-the-art cost modeling in Intel's line of VPU processors.
arXiv Detail & Related papers (2022-05-09T22:48:39Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
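The decomposition can be sketched for a toy case (an illustration of the general idea, not the paper's exact scheme): a weight drawn from a symmetric odd-integer quantization grid splits into scaled {-1, +1} branches, so a dot product becomes a weighted sum of binary (XNOR-friendly) dot products. All values below are illustrative.

```python
def decompose(w, bits):
    """Decompose a weight from the symmetric odd-integer grid
    {-(2**bits - 1), ..., -1, +1, ..., 2**bits - 1} into `bits` branches
    b_i in {-1, +1} such that w = sum(2**i * b_i)."""
    branches = []
    for i in reversed(range(bits)):
        b = 1 if w > 0 else -1            # greedily match the sign of the remainder
        branches.append((2 ** i, b))
        w -= (2 ** i) * b
    return branches

def binary_dot(weights, x, bits=2):
    """Compute dot(weights, x) as a weighted sum of {-1, +1} dot products,
    one per branch (scale)."""
    total = 0
    for i in range(bits):
        scale = 2 ** i
        # Branch i: the {-1, +1} component of every weight at this scale.
        branch = [dict(decompose(w, bits))[scale] for w in weights]
        total += scale * sum(b * xi for b, xi in zip(branch, x))
    return total

w = [3, -1, 1, -3]
x = [2, 5, -4, 1]
assert binary_dot(w, x) == sum(wi * xi for wi, xi in zip(w, x))
```

Each branch is a pure {-1, +1} vector, which is what makes bitwise acceleration possible on hardware that supports XNOR/popcount dot products.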
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
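A minimal sketch of such an exit policy (the heads and threshold here are hypothetical; in the paper both the heads and the policy are co-optimised): return the first exit whose confidence clears a threshold, otherwise fall through to the final head.

```python
def multi_exit_infer(x, exits, threshold=0.9):
    """Run a network with early exits: return the first exit head whose
    confidence (max probability) clears the threshold, else the final
    head's prediction. Each element of `exits` maps input -> prob list."""
    for stage in exits[:-1]:
        probs = stage(x)
        if max(probs) >= threshold:       # confident enough: exit early
            return probs
    return exits[-1](x)                   # hardest samples run the full depth

# Toy exit heads returning a probability distribution over two classes.
early = lambda x: [0.95, 0.05] if x == "easy" else [0.6, 0.4]
final = lambda x: [0.1, 0.9]
assert multi_exit_infer("easy", [early, final]) == [0.95, 0.05]  # exits early
assert multi_exit_infer("hard", [early, final]) == [0.1, 0.9]    # runs to the end
```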
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Optimising the Performance of Convolutional Neural Networks across Computing Systems using Transfer Learning [0.08594140167290096]
We propose to replace a lengthy profiling stage with a machine-learning-based approach to performance modeling.
After training, our performance model can estimate the performance of convolutional primitives in any layer configuration.
The time to optimise the execution of large neural networks via primitive selection is reduced from hours to just seconds.
arXiv Detail & Related papers (2020-10-20T20:58:27Z)
- Pairwise Neural Networks (PairNets) with Low Memory for Fast On-Device Applications [0.0]
A traditional artificial neural network (ANN) is normally trained slowly by a gradient descent algorithm, such as the backpropagation algorithm.
We created a novel wide and shallow 4-layer ANN called "Pairwise Neural Network" ("PairNet") with high-speed, non-gradient-descent hyperparameter optimization.
arXiv Detail & Related papers (2020-02-10T02:12:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.