Shifting Capsule Networks from the Cloud to the Deep Edge
- URL: http://arxiv.org/abs/2110.02911v1
- Date: Wed, 6 Oct 2021 16:52:01 GMT
- Title: Shifting Capsule Networks from the Cloud to the Deep Edge
- Authors: Miguel Costa, Diogo Costa, Tiago Gomes, Sandro Pinto
- Abstract summary: We present an API for the execution of quantized CapsNets in Cortex-M and RISC-V MCUs.
Results show a reduction in memory footprint of almost 75%, with a maximum accuracy loss of 1%.
In terms of throughput, our software kernels for the Arm Cortex-M are at least 5.70x faster than a pre-quantized CapsNet running on an NVIDIA GTX 980 Ti graphics card.
- Score: 0.9712140341805068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capsule networks (CapsNets) are an emerging trend in image processing. In
contrast to a convolutional neural network, CapsNets are not vulnerable to
object deformation, as the relative spatial information of the objects is
preserved across the network. However, their complexity stems mainly from
the capsule structure and the dynamic routing mechanism, which makes it
impractical to deploy a CapsNet, in its original form, on a
resource-constrained device powered by a small microcontroller (MCU). In an era
where intelligence is rapidly shifting from the cloud to the edge, this high
complexity imposes serious challenges to the adoption of CapsNets at the very
edge. To tackle this issue, we present an API for the execution of quantized
CapsNets in Cortex-M and RISC-V MCUs. Our software kernels extend the Arm
CMSIS-NN and RISC-V PULP-NN, to support capsule operations with 8-bit integers
as operands. Alongside the kernels, we propose a framework to perform post-training
quantization of a CapsNet. Results show a reduction in memory footprint of
almost 75%, with a maximum accuracy loss of 1%. In terms of throughput, our
software kernels for the Arm Cortex-M are at least 5.70x faster than a
pre-quantized CapsNet running on an NVIDIA GTX 980 Ti graphics card. For
RISC-V, the throughput gain increases to 26.28x and 56.91x for a single- and
octa-core configuration, respectively.
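To make the kernel idea concrete, below is a minimal, illustrative sketch in plain C of how a capsule squash nonlinearity can be computed on 8-bit integer operands, in the spirit of the CMSIS-NN/PULP-NN-style kernels described above. This is an assumption-laden example, not the paper's actual kernel: the function name, the per-tensor scale handling, and the float scalar math are hypothetical simplifications.

/*
 * Hypothetical int8 capsule squash:
 *   squash(s) = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
 * The squared norm is accumulated in int32, the scalar factor is computed
 * in float, and the result is re-quantized back to int8 (same scale as input).
 */
#include <stdint.h>
#include <math.h>

static inline int8_t clamp_q7(int32_t v)
{
    if (v > 127)  return 127;
    if (v < -128) return -128;
    return (int8_t)v;
}

void capsule_squash_q7(const int8_t *s, int8_t *out, int dim, float scale)
{
    /* Squared norm of the capsule vector, accumulated in 32 bits. */
    int32_t acc = 0;
    for (int i = 0; i < dim; i++) {
        acc += (int32_t)s[i] * (int32_t)s[i];
    }

    /* Dequantized squared norm and norm. */
    float norm_sq = (float)acc * scale * scale;
    float norm    = sqrtf(norm_sq) + 1e-7f;

    /* Scalar squash factor applied to every element. */
    float factor = (norm_sq / (1.0f + norm_sq)) / norm;

    /* Apply the factor and re-quantize to int8 with the input scale. */
    for (int i = 0; i < dim; i++) {
        float y = (float)s[i] * scale * factor;
        out[i] = clamp_q7((int32_t)lrintf(y / scale));
    }
}

A production MCU kernel would fold the scalar math into fixed-point arithmetic and fuse the re-quantization into the dynamic-routing loop; the sketch only illustrates the 8-bit data path.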
Related papers
- UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition [61.01408259741114]
We propose four architectural guidelines for designing large-kernel-based convolutional neural networks (ConvNets)
Our proposed large-kernel-based ConvNet shows leading performance in image recognition.
We discover large kernels are the key to unlocking the exceptional performance of ConvNets in domains where they were originally not proficient.
arXiv Detail & Related papers (2023-11-27T07:48:50Z) - PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in
Capsule Networks [0.27195102129095]
Convolutional Neural Networks (CNNs) have produced state-of-the-art results for image classification tasks.
CapsNets, however, often fall short on complex datasets and require more computational resources than CNNs.
We introduce the Parallel Dynamic Routing CapsNet (PDR-CapsNet), a deeper and more energy-efficient alternative to CapsNet.
We achieve 83.55% accuracy while requiring 87.26% fewer parameters, 32.27% fewer MACs, and 47.40% fewer FLOPs.
arXiv Detail & Related papers (2023-10-04T23:38:09Z) - InceptionNeXt: When Inception Meets ConvNeXt [167.61042926444105]
We build a series of networks, namely InceptionNeXt, which not only enjoy high throughputs but also maintain competitive performance.
InceptionNeXt achieves 1.6x higher training throughput than ConvNeXt-T and a 0.2% top-1 accuracy improvement on ImageNet-1K.
arXiv Detail & Related papers (2023-03-29T17:59:58Z) - MogaNet: Multi-order Gated Aggregation Network [64.16774341908365]
We propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning.
MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module.
MogaNet exhibits great scalability, impressive efficiency of parameters, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet.
arXiv Detail & Related papers (2022-11-07T04:31:17Z) - More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using
Sparsity [103.62784587778037]
Recently, a couple of advanced convolutional models have struck back with large kernels, motivated by the local but large attention mechanism.
We propose Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51x51 kernels that can perform on par with or better than state-of-the-art hierarchical Transformers.
arXiv Detail & Related papers (2022-07-07T23:55:52Z) - Momentum Capsule Networks [0.8594140167290097]
We propose a new network architecture, called Momentum Capsule Network (MoCapsNet)
MoCapsNet is inspired by Momentum ResNets, a type of network that applies residual building blocks.
We show that MoCapsNet beats the accuracy of baseline capsule networks on MNIST, SVHN and CIFAR-10 while using considerably less memory.
arXiv Detail & Related papers (2022-01-26T17:53:18Z) - MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z) - Parallel Capsule Networks for Classification of White Blood Cells [1.5749416770494706]
Capsule Networks (CapsNets) are a machine learning architecture proposed to overcome some of the shortcomings of convolutional neural networks (CNNs)
We present a new architecture, parallel CapsNets, which exploits the concept of branching the network to isolate certain capsules.
arXiv Detail & Related papers (2021-08-05T14:30:44Z) - Leveraging Automated Mixed-Low-Precision Quantization for tiny edge
microcontrollers [76.30674794049293]
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a Reinforcement Learning agent searches for the best uniform quantization level, among 2, 4, and 8 bits, for each individual weight and activation tensor (a minimal sketch of such uniform quantization follows this list).
Given an MCU-class memory bound of 2MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as the state-of-the-art solutions.
arXiv Detail & Related papers (2020-08-12T06:09:58Z) - Q-CapsNets: A Specialized Framework for Quantizing Capsule Networks [12.022910298030219]
Capsule Networks (CapsNets) have superior learning capabilities in machine learning tasks, like image classification, compared to traditional CNNs.
CapsNets, however, require extremely intensive computation and are difficult to deploy in their original form on resource-constrained edge devices.
This paper makes the first attempt to quantize CapsNet models, to enable their efficient edge implementations, by developing a specialized quantization framework for CapsNets.
arXiv Detail & Related papers (2020-04-15T14:32:45Z)
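To complement the mixed-low-precision entry above, the following is a minimal sketch of symmetric uniform quantization of a weight tensor to a chosen bit-width of 2, 4, or 8 bits, i.e. the per-tensor choice such a search explores. It is a hypothetical helper written for illustration, not code from any of the listed papers.

/*
 * Symmetric uniform quantization of `n` float weights to signed integers
 * of `bits` width (2, 4, or 8). Returns the scale so that w ~= q * scale.
 */
#include <stdint.h>
#include <stddef.h>
#include <math.h>

float quantize_symmetric(const float *w, int8_t *q, size_t n, int bits)
{
    /* Largest magnitude sizes the quantization grid. */
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = fabsf(w[i]);
        if (a > max_abs) max_abs = a;
    }

    int32_t qmax = (1 << (bits - 1)) - 1;            /* 127, 7, or 1 */
    float scale  = (max_abs > 0.0f) ? max_abs / (float)qmax : 1.0f;

    for (size_t i = 0; i < n; i++) {
        int32_t v = (int32_t)lrintf(w[i] / scale);
        if (v >  qmax) v =  qmax;
        if (v < -qmax) v = -qmax;
        q[i] = (int8_t)v;                            /* 2/4-bit values still fit in int8 storage */
    }
    return scale;
}

A search procedure would apply such a routine per tensor with different bit-widths, evaluate the resulting accuracy and memory footprint, and keep the assignment that fits the MCU budget.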