Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs
- URL: http://arxiv.org/abs/2006.05181v2
- Date: Tue, 15 Dec 2020 19:30:11 GMT
- Title: Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs
- Authors: Miguel de Prado, Andrew Mundy, Rabia Saeed, Maurizio Denna, Nuria Pazos and Luca Benini
- Abstract summary: Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN).
There is a lack of research on cross-level optimisation, as the space of approaches becomes too large to test and obtain a globally optimised solution.
We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms, achieving up to 4x improvement in performance and over 2x reduction in memory.
- Score: 13.628734116014819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The spread of deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN). Works have mainly focused on: i) efficient DNN architectures, ii) network optimisation techniques such as pruning and quantisation, iii) optimised algorithms to speed up the execution of the most computationally intensive layers and iv) dedicated hardware to accelerate the data flow and computation. However, there is a lack of research on cross-level optimisation, as the space of approaches becomes too large to test and obtain a globally optimised solution; this leads to suboptimal deployment in terms of latency, accuracy, and memory. In this work, we first detail and analyse the methods to improve the deployment of DNNs across the different levels of software optimisation. Building on this knowledge, we present an automated exploration framework to ease the deployment of DNNs. The framework relies on a Reinforcement Learning search that, combined with a deep learning inference framework, automatically explores the design space and learns an optimised solution that speeds up inference and reduces memory usage on embedded CPU platforms. We thus present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms, achieving up to 4x improvement in performance and over 2x reduction in memory with negligible loss in accuracy with respect to the BLAS floating-point implementation.
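
To make the search concrete, below is a minimal sketch of the kind of Reinforcement Learning loop the abstract describes: a per-layer Q-table learns which primitive implementation (e.g. GEMM-, Winograd- or direct-convolution kernels, the sort of choices an inference framework exposes) minimises end-to-end latency. The layer list, candidate primitives and latency numbers are illustrative stand-ins for real on-device measurements, not the authors' code.

```python
import random

# Hypothetical per-layer implementation choices; the paper searches over
# primitives exposed by a deep learning inference framework.
LAYER_CHOICES = [
    ["gemm", "winograd", "direct"],  # conv1
    ["gemm", "winograd", "direct"],  # conv2
    ["gemm", "direct"],              # conv3 (e.g. 1x1, Winograd inapplicable)
]

# Stand-in for an on-device benchmark; the real reward comes from timing
# actual inference on the target Cortex-A CPU.
FAKE_LATENCY_MS = [
    {"gemm": 3.0, "winograd": 2.1, "direct": 4.0},
    {"gemm": 5.0, "winograd": 3.4, "direct": 6.5},
    {"gemm": 1.2, "direct": 0.9},
]

def measure_latency(config):
    return sum(FAKE_LATENCY_MS[i][c] for i, c in enumerate(config))

def q_learning_search(episodes=200, eps=0.3, alpha=0.5):
    # One Q-table per layer: estimated reward (negative latency) per choice.
    q = [{c: 0.0 for c in choices} for choices in LAYER_CHOICES]
    best, best_lat = None, float("inf")
    for _ in range(episodes):
        # epsilon-greedy: explore a random primitive or exploit the best so far
        config = [
            random.choice(list(qi)) if random.random() < eps else max(qi, key=qi.get)
            for qi in q
        ]
        lat = measure_latency(config)
        for qi, choice in zip(q, config):
            qi[choice] += alpha * (-lat - qi[choice])  # reward = -latency
        if lat < best_lat:
            best, best_lat = config, lat
    return best, best_lat

if __name__ == "__main__":
    print(q_learning_search())
```

The epsilon-greedy rollout balances exploring untried primitives against exploiting the best known configuration; the paper's framework plugs a real inference engine into the reward in place of the fake latency table.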
Related papers
- Combining Neural Architecture Search and Automatic Code Optimization: A Survey [0.8796261172196743]
Two notable techniques are Hardware-aware Neural Architecture Search (HW-NAS) and Automatic Code Optimization (ACO).
HW-NAS automatically designs accurate yet hardware-friendly neural networks, while ACO involves searching for the best compiler optimizations to apply to neural networks.
This survey explores recent works that combine these two techniques within a single framework; a toy joint search is sketched below.
arXiv Detail & Related papers (2024-08-07T22:40:05Z)
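
As a rough illustration of what combining HW-NAS and ACO within a single framework can mean, the sketch below samples architecture and compiler knobs jointly and keeps the most accurate candidate under a latency budget. All knobs, values and the cost proxy are hypothetical stand-ins for real training and compilation:

```python
import random

# Hypothetical joint search space: architecture knobs (the HW-NAS side) and
# compiler knobs (the ACO side).
ARCH_SPACE = {"depth": [12, 16, 20], "width_mult": [0.5, 0.75, 1.0]}
COMPILER_SPACE = {"tile": [8, 16, 32], "unroll": [1, 2, 4], "vectorize": [False, True]}

def evaluate(arch, opts):
    """Stand-in for training/benchmarking one candidate: (accuracy, latency)."""
    acc = 0.6 + 0.01 * arch["depth"] * arch["width_mult"]
    lat = arch["depth"] * arch["width_mult"] / (opts["tile"] * opts["unroll"])
    lat *= 0.8 if opts["vectorize"] else 1.0
    return acc, lat

def joint_random_search(budget=100, latency_limit=0.5):
    """Sample both spaces together and keep the best feasible candidate."""
    best = None
    for _ in range(budget):
        arch = {k: random.choice(v) for k, v in ARCH_SPACE.items()}
        opts = {k: random.choice(v) for k, v in COMPILER_SPACE.items()}
        acc, lat = evaluate(arch, opts)
        if lat <= latency_limit and (best is None or acc > best[0]):
            best = (acc, lat, arch, opts)
    return best

print(joint_random_search())
```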
- Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks [0.08965418284317034]
Spiking Neural Networks (SNNs) promise enhanced energy efficiency through a reduced, low-power hardware footprint.
This paper introduces Spyx, a new and lightweight SNN simulation and optimization library designed in JAX (a flavour of the approach is sketched below).
arXiv Detail & Related papers (2024-02-29T09:46:44Z)
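
To convey the flavour of JAX-based SNN simulation (this is not Spyx's actual API, just a generic sketch), a JIT-compiled leaky integrate-and-fire step can be unrolled over time with lax.scan:

```python
import jax
import jax.numpy as jnp

@jax.jit
def lif_step(v, x, decay=0.9, threshold=1.0):
    """One timestep of a leaky integrate-and-fire neuron: leak, integrate,
    spike, hard-reset. Constants are illustrative."""
    v = decay * v + x
    spikes = (v >= threshold).astype(jnp.float32)
    v = v * (1.0 - spikes)  # reset membrane potential where a spike fired
    return v, spikes

def run(inputs):
    # inputs: (timesteps, neurons); lax.scan keeps the whole loop compiled
    v0 = jnp.zeros(inputs.shape[1])
    _, spikes = jax.lax.scan(lif_step, v0, inputs)
    return spikes

key = jax.random.PRNGKey(0)
print(run(jax.random.uniform(key, (100, 8))).sum(axis=0))  # per-neuron spike counts
```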
- Flexible Channel Dimensions for Differentiable Architecture Search [50.33956216274694]
We propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm (illustrated below).
We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency.
arXiv Detail & Related papers (2023-06-13T15:21:38Z)
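
The dynamic channel allocation idea can be pictured with the standard differentiable-NAS relaxation: candidate channel widths share one weight tensor, and a softmax over architecture logits mixes masked sub-outputs. A hedged NumPy sketch (the paper's exact formulation may differ):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

CANDIDATE_WIDTHS = [8, 16, 32]           # channel counts under consideration
alpha = np.zeros(len(CANDIDATE_WIDTHS))  # learnable architecture logits

def mixed_channel_output(x, weight):
    """x: (in_ch,), weight: (max_ch, in_ch). Convex combination of sub-layers
    that each use only the first k channels of the shared weight tensor."""
    probs = softmax(alpha)
    out = np.zeros(weight.shape[0])
    for p, k in zip(probs, CANDIDATE_WIDTHS):
        y = weight @ x
        mask = np.arange(weight.shape[0]) < k  # keep only the first k channels
        out += p * y * mask
    return out

x = np.random.randn(4)
w = np.random.randn(32, 4)
print(mixed_channel_output(x, w).shape)  # (32,), with soft-masked channels
```

Because the mixture is differentiable in alpha, gradient descent can shift probability mass toward the channel count with the best accuracy/latency trade-off.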
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete settings (SAC-d), which generates the exit point and compressing bits by soft policy iterations (the decision structure is sketched below).
Based on a latency- and accuracy-aware reward design, such a computation scheme can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
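
A sketch of SAC-d's decision structure only; the actor-critic update itself is omitted, and the exit points, bit-widths and reward constants below are invented:

```python
import numpy as np

EXIT_POINTS = [2, 4, 6, 8]     # candidate early-exit layers on the device
COMPRESS_BITS = [2, 4, 8, 16]  # bits used to compress the offloaded tensor

def sample_action(logits_exit, logits_bits, rng):
    """Factorised discrete policy: one categorical head per decision."""
    p_exit = np.exp(logits_exit) / np.exp(logits_exit).sum()
    p_bits = np.exp(logits_bits) / np.exp(logits_bits).sum()
    return rng.choice(EXIT_POINTS, p=p_exit), rng.choice(COMPRESS_BITS, p=p_bits)

def reward(accuracy, latency_ms, lam=0.05):
    """Latency- and accuracy-aware reward, as the summary describes."""
    return accuracy - lam * latency_ms

rng = np.random.default_rng(0)
print(sample_action(np.zeros(4), np.zeros(4), rng), reward(0.92, 12.0))
```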
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples (an early-exit loop is sketched below).
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
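
The early-exit mechanism can be sketched as follows: run backbone stages in order and return from the first head whose confidence clears a threshold. Stages and heads are toy callables here; a MESS network attaches real segmentation heads and tunes the exit policy per device:

```python
import numpy as np

def make_stage(i):
    return lambda x: x + i  # placeholder backbone block

def make_head(i):
    # placeholder head: fake per-class scores whose confidence grows with depth
    return lambda x: np.array([0.5 + 0.1 * i, 0.5 - 0.1 * i])

STAGES = [make_stage(i) for i in range(4)]
HEADS = [make_head(i) for i in range(4)]

def early_exit_predict(x, threshold=0.75):
    for depth, (stage, head) in enumerate(zip(STAGES, HEADS)):
        x = stage(x)
        scores = head(x)
        conf = scores.max() / scores.sum()  # simple confidence measure
        if conf >= threshold:               # easy sample: exit early
            return scores.argmax(), depth
    return scores.argmax(), depth           # hardest samples use full depth

print(early_exit_predict(np.zeros(1)))
```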
- HAO: Hardware-aware neural Architecture Optimization for Efficient Inference [25.265181492143107]
We develop an integer programming algorithm to prune the design space of a neural network search algorithm (a toy budgeted selection is sketched below).
Our algorithm achieves 72.5% top-1 accuracy on ImageNet at a framerate of 50 FPS, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
arXiv Detail & Related papers (2021-04-26T17:59:29Z)
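
The flavour of the integer program: choose one configuration per layer to maximise an accuracy proxy under a latency budget. The space below is small enough to enumerate exhaustively; HAO prunes far larger spaces with an actual integer-programming solver, and all numbers are made up:

```python
import itertools

# Per layer: (bit configuration, accuracy delta in %, latency in ms)
LAYERS = {
    "conv1": [("w8a8", 0.0, 1.0), ("w4a8", -0.2, 0.6), ("w4a4", -0.5, 0.4)],
    "conv2": [("w8a8", 0.0, 2.0), ("w4a8", -0.1, 1.2), ("w4a4", -0.6, 0.8)],
    "conv3": [("w8a8", 0.0, 1.5), ("w4a8", -0.3, 0.9), ("w4a4", -0.4, 0.6)],
}

def best_under_budget(latency_budget_ms=3.0):
    """Exhaustive stand-in for the ILP: maximise accuracy subject to latency."""
    best = None
    for combo in itertools.product(*LAYERS.values()):
        acc = sum(c[1] for c in combo)
        lat = sum(c[2] for c in combo)
        if lat <= latency_budget_ms and (best is None or acc > best[0]):
            best = (acc, lat, [c[0] for c in combo])
    return best

print(best_under_budget())
```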
- HSCoNAS: Hardware-Software Co-Design of Efficient DNNs via Neural Architecture Search [6.522258468923919]
We present a novel hardware-aware neural architecture search (NAS) framework, namely HSCoNAS, to automate the design of deep neural networks (DNNs).
To accomplish this goal, we first propose an effective hardware performance modeling method to approximate the runtime latency of DNNs on target hardware (a lookup-table sketch follows below).
We also propose two novel techniques: dynamic channel scaling, to maximize accuracy under the specified latency, and progressive space shrinking, to refine the search space towards the target hardware.
arXiv Detail & Related papers (2021-03-11T12:21:21Z)
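
A common way to realise such hardware performance modeling is a per-operator latency lookup table, benchmarked once on the target device and summed per candidate network. A minimal sketch with invented entries (the paper's model is more refined than a plain sum):

```python
# Per-operator latencies, measured once on the target device; keys are
# (operator, channels, spatial resolution). Entries are invented.
LATENCY_LUT_MS = {
    ("conv3x3", 32, 112): 1.8,
    ("conv3x3", 64, 56): 1.1,
    ("conv1x1", 128, 28): 0.4,
    ("pool", 128, 28): 0.1,
}

def predict_latency(net):
    """net: list of (op, channels, resolution) tuples present in the LUT."""
    return sum(LATENCY_LUT_MS[layer] for layer in net)

candidate = [("conv3x3", 32, 112), ("conv3x3", 64, 56), ("conv1x1", 128, 28)]
print(predict_latency(candidate), "ms (predicted)")
```

Summing lookups ignores runtime effects such as cache interference between layers, which is one reason more careful latency models are a research topic in their own right.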
- Analytical Characterization and Design Space Exploration for Optimization of CNNs [10.15406080228806]
Loop-level optimizations, including loop tiling and loop permutation, are fundamental transformations for reducing data movement.
This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs (a toy cost model is sketched below).
arXiv Detail & Related papers (2021-01-24T21:36:52Z)
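
A toy version of such an analytical model for a tiled matrix multiplication: estimate slow-memory traffic for each tiling, keep only tilings whose working set fits in cache, and pick the cheapest. The capacity and traffic formulas are deliberately simplified:

```python
# C[i,j] += A[i,k] * B[k,j], tiled by (Ti, Tj, Tk).
M = N = K = 512
CACHE_WORDS = 32 * 1024  # e.g. a 256 KiB cache of 8-byte words

def traffic(Ti, Tj, Tk):
    # Simplified model: each A tile is re-read N/Tj times, each B tile
    # M/Ti times, and C is read and written once. Tk only affects fits().
    return M * K * (N // Tj) + K * N * (M // Ti) + 2 * M * N

def fits(Ti, Tj, Tk):
    # working set of one tile of A, B and C must fit in cache
    return Ti * Tk + Tk * Tj + Ti * Tj <= CACHE_WORDS

tiles = [16, 32, 64, 128]
best = min(
    ((Ti, Tj, Tk) for Ti in tiles for Tj in tiles for Tk in tiles if fits(Ti, Tj, Tk)),
    key=lambda t: traffic(*t),
)
print("best tiling (Ti, Tj, Tk):", best, "traffic (words):", traffic(*best))
```

Because the model is closed-form, the whole configuration space can be ranked without running a single kernel, which is the appeal of the analytical approach over autotuning.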
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates a Deep Neural Network (DNN) with Finite Element Method (FEM) calculations (the surrogate-in-the-loop idea is sketched below).
Our algorithm was tested on four types of problems, including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2-5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
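
The surrogate-in-the-loop idea, reduced to one dimension: query the expensive solver sparingly, refit a cheap model after each query, and let the model's optimum direct the next query. A toy function stands in for the FEM call, and the cubic fit stands in for the DNN:

```python
import numpy as np

def expensive_solver(x):
    return (x - 0.3) ** 2 + 0.1 * np.sin(8 * x)  # stand-in for an FEM call

xs = list(np.linspace(0.0, 1.0, 4))  # initial designs
ys = [expensive_solver(x) for x in xs]

grid = np.linspace(0.0, 1.0, 200)
for step in range(10):
    coeffs = np.polyfit(xs, ys, deg=3)   # cheap surrogate, refit online
    x_next = grid[np.argmin(np.polyval(coeffs, grid))]
    xs.append(x_next)                    # the solver is queried only here
    ys.append(expensive_solver(x_next))

best = min(zip(ys, xs))
print("best objective %.4f at x=%.3f" % best)
```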
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency (a toy patterning pass is sketched below).
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
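
Pattern-based pruning can be sketched as constraining every 3x3 kernel to one mask from a small fixed library, which is what lets the compiler generate one specialised code path per pattern. The four-entry pattern library below is invented, not PatDNN's actual set:

```python
import numpy as np

PATTERNS = [  # boolean 3x3 masks, each keeping exactly 4 weights
    np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]], bool),
    np.array([[0, 0, 0], [1, 1, 1], [0, 1, 0]], bool),
    np.array([[0, 1, 0], [1, 1, 0], [0, 1, 0]], bool),
    np.array([[0, 1, 0], [0, 1, 1], [0, 1, 0]], bool),
]

def prune_kernel(k):
    """Pick the pattern preserving the most weight magnitude, zero the rest."""
    scores = [np.abs(k[m]).sum() for m in PATTERNS]
    best = int(np.argmax(scores))
    return k * PATTERNS[best], best

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))
pruned, pattern_id = prune_kernel(kernel)
print("pattern", pattern_id, "\n", pruned.round(2))
```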