Searching for Winograd-aware Quantized Networks
- URL: http://arxiv.org/abs/2002.10711v1
- Date: Tue, 25 Feb 2020 07:53:53 GMT
- Title: Searching for Winograd-aware Quantized Networks
- Authors: Javier Fernandez-Marques, Paul N. Whatmough, Andrew Mundy, Matthew
Mattina
- Abstract summary: We propose a Winograd-aware formulation of convolution layers which exposes the numerical inaccuracies introduced by the Winograd transformations.
We also address the source of the numerical error and propose a relaxation on the form of the transformation matrices, resulting in up to 10% higher classification accuracy on CIFAR-10.
- Score: 12.351250944079949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lightweight architectural designs of Convolutional Neural Networks (CNNs)
together with quantization have paved the way for the deployment of demanding
computer vision applications on mobile devices. Parallel to this, alternative
formulations to the convolution operation such as FFT, Strassen and Winograd,
have been adapted for use in CNNs offering further speedups. Winograd
convolutions are the fastest known algorithm for spatially small convolutions,
but exploiting their full potential comes with the burden of numerical error,
rendering them unusable in quantized contexts. In this work we propose a
Winograd-aware formulation of convolution layers which exposes the numerical
inaccuracies introduced by the Winograd transformations to the learning of the
model parameters, enabling the design of competitive quantized models without
impacting model size. We also address the source of the numerical error and
propose a relaxation on the form of the transformation matrices, resulting in
up to 10% higher classification accuracy on CIFAR-10. Finally, we propose
wiNAS, a neural architecture search (NAS) framework that jointly optimizes a
given macro-architecture for accuracy and latency leveraging Winograd-aware
layers. A Winograd-aware ResNet-18 optimized with wiNAS for CIFAR-10 results in
2.66x speedup compared to im2row, one of the most widely used optimized
convolution implementations, with no loss in accuracy.
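For context on the transforms the abstract refers to, below is a minimal NumPy sketch of a single F(2x2, 3x3) Winograd tile using the standard Cook-Toom matrices, followed by an illustrative fake-quantization step applied in the Winograd domain. The quantization scheme (symmetric, per-tensor, 8-bit) and the placement of the quantizers are assumptions for illustration only, not the paper's exact pipeline or its relaxed transformation matrices.

```python
import numpy as np

# Standard Cook-Toom matrices for F(2x2, 3x3): a 4x4 input tile d and a
# 3x3 kernel g give a 2x2 output tile via  Y = A^T [(G g G^T) * (B^T d B)] A,
# where * is the elementwise (Hadamard) product.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float32)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]], dtype=np.float32)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float32)

def fake_quant(x, bits=8):
    # Symmetric per-tensor fake quantization (an illustrative assumption):
    # round to a signed integer grid, then rescale back to floats.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def winograd_tile(d, g, quantize=False):
    U = G @ g @ G.T                            # 4x4 transformed kernel
    V = BT @ d @ BT.T                          # 4x4 transformed input tile
    if quantize:                               # quantizing in the transform domain
        U, V = fake_quant(U), fake_quant(V)    # is the error the model trains against
    M = U * V                                  # 16 multiplies vs. 36 for direct conv
    return AT @ M @ AT.T                       # 2x2 output tile in the spatial domain

def direct_tile(d, g):
    # Reference: direct 'valid' cross-correlation over the same tile.
    return np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                     for i in range(2)], dtype=np.float32)

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4)).astype(np.float32)
g = rng.standard_normal((3, 3)).astype(np.float32)
print(np.allclose(winograd_tile(d, g), direct_tile(d, g), atol=1e-5))        # True
print(np.abs(winograd_tile(d, g, quantize=True) - direct_tile(d, g)).max())  # nonzero
```

In full precision the two paths agree; once the transformed tensors are quantized, rounding error appears in the output. For larger tiles such as F(4x4, 3x3), the transform matrices contain larger, non-power-of-two coefficients that amplify this error, which is the error source the abstract's relaxation of the transformation matrices is aimed at.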
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures [99.20299078655376]
This paper introduces Vision-RWKV, a model adapted from the RWKV model used in the NLP field.
Our model is designed to efficiently handle sparse inputs and demonstrate robust global processing capabilities.
Our evaluations demonstrate that VRWKV surpasses ViT's performance in image classification and has significantly faster speeds and lower memory usage.
arXiv Detail & Related papers (2024-03-04T18:46:20Z)
- Tetra-AML: Automatic Machine Learning via Tensor Networks [0.0]
We introduce the Tetra-AML toolbox, which automates neural architecture search and hyperparameter optimization.
The toolbox also provides model compression through quantization and pruning, augmented by compression using tensor networks.
Here, we analyze a unified benchmark for optimizing neural networks in computer vision tasks and show the superior performance of our approach.
arXiv Detail & Related papers (2023-03-28T12:56:54Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to their magnitude.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices [16.13681155725083]
The intensive computation of Automatic Speech Recognition (ASR) models obstructs them from being deployed on mobile devices.
We present a novel quantized Winograd optimization pipeline, which combines the quantization and fast convolution to achieve efficient inference acceleration on mobile devices for ASR models.
arXiv Detail & Related papers (2020-10-28T09:25:49Z)
- CNN Acceleration by Low-rank Approximation with Quantized Factors [9.654865591431593]
Although modern convolutional neural networks achieve great results on complex computer vision tasks, they still cannot be used effectively on mobile and embedded devices.
To address this problem, a novel approach is proposed that combines two known methods: low-rank tensor approximation in Tucker format and quantization of weights and feature maps (activations); a minimal sketch of the Tucker factorization idea appears after this list.
The efficiency of our method is demonstrated for ResNet18 and ResNet34 on CIFAR-10, CIFAR-100 and Imagenet classification tasks.
arXiv Detail & Related papers (2020-06-16T02:28:05Z)
- LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural Networks Based on Graphics Processing Units [6.110973485878557]
We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
arXiv Detail & Related papers (2020-03-19T09:46:50Z)
- Lightweight Residual Densely Connected Convolutional Neural Network [18.310331378001397]
The lightweight residual densely connected blocks are proposed to guarantee the deep supervision, efficient gradient flow, and feature-reuse abilities of the convolutional neural network.
The proposed method decreases the cost of training and inference processes without using any special hardware-software equipment.
arXiv Detail & Related papers (2020-01-02T17:15:32Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
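The low-rank entry above (CNN Acceleration by Low-rank Approximation with Quantized Factors) combines Tucker-format factorization with quantization of the factors. As a rough illustration of the Tucker part only, here is a truncated-HOSVD sketch for a convolution kernel; the ranks, the HOSVD construction, and the function names are assumptions for illustration, not that paper's exact algorithm.

```python
import numpy as np

def mode_unfold(T, mode):
    # Matricize tensor T along the given mode (mode-n unfolding).
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker2_hosvd(W, r_out, r_in):
    """Truncated HOSVD (Tucker-2) of a conv kernel W with shape
    (c_out, c_in, kh, kw); the spatial modes are left intact.
    Illustrative sketch only, with assumed ranks r_out and r_in."""
    U_out = np.linalg.svd(mode_unfold(W, 0), full_matrices=False)[0][:, :r_out]
    U_in  = np.linalg.svd(mode_unfold(W, 1), full_matrices=False)[0][:, :r_in]
    # Core tensor: W contracted with U_out^T along mode 0 and U_in^T along mode 1.
    core = np.einsum('oihw,or,is->rshw', W, U_out, U_in)
    return core, U_out, U_in

def reconstruct(core, U_out, U_in):
    return np.einsum('rshw,or,is->oihw', core, U_out, U_in)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32, 3, 3))
core, U_out, U_in = tucker2_hosvd(W, r_out=16, r_in=8)
W_hat = reconstruct(core, U_out, U_in)
# A random kernel has no low-rank structure, so the error here is large;
# trained conv kernels typically compress far better.
print(core.shape, np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

In a network, such a factorized kernel is usually executed as a 1x1 convolution (U_in), a smaller kxk convolution (the core), and a final 1x1 convolution (U_out); the quantization described in that entry would then be applied to these factors.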