XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers For
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2111.10854v3
- Date: Wed, 20 Sep 2023 01:12:51 GMT
- Title: XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers For
Convolutional Neural Networks
- Authors: Jian Sun, Ali Pourramezan Fard, and Mohammad H. Mahoor
- Abstract summary: Capsule Network is powerful at defining the positional relationship between features in deep neural networks for visual recognition tasks.
The bottleneck is in the computational complexity of the Dynamic Routing mechanism used between the capsules.
XnODR and XnIDR help networks achieve high accuracy with lower FLOPs and fewer parameters.
- Score: 43.85390451313721
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Capsule Network is powerful at defining the positional relationship between
features in deep neural networks for visual recognition tasks, but it is
computationally expensive and not suitable for running on mobile devices. The
bottleneck is in the computational complexity of the Dynamic Routing mechanism
used between the capsules. On the other hand, XNOR-Net is fast and
computationally efficient, though it suffers from low accuracy due to
information loss in the binarization process. To address the computational
burdens of the Dynamic Routing mechanism, this paper proposes new Fully
Connected (FC) layers by xnorizing the linear projection outside or inside the
Dynamic Routing within the CapsFC layer. Specifically, our proposed FC layers
have two versions, XnODR (Xnorize the Linear Projection Outside Dynamic
Routing) and XnIDR (Xnorize the Linear Projection Inside Dynamic Routing). To
test the generalization of both XnODR and XnIDR, we insert them into two
different networks, MobileNetV2 and ResNet-50. Our experiments on three
datasets, MNIST, CIFAR-10, and MultiMNIST validate their effectiveness. The
results demonstrate that both XnODR and XnIDR help networks achieve high
accuracy with lower FLOPs and fewer parameters (e.g., 96.14% accuracy with
2.99M parameters and 311.74M FLOPs on CIFAR-10).
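To make the "xnorizing" idea concrete, the following is a minimal numpy sketch of an XNOR-Net-style binary linear projection, the building block the paper applies outside (XnODR) or inside (XnIDR) Dynamic Routing. This is an illustration of the general technique, not the authors' exact layer; the function name and toy sizes are assumptions.

```python
import numpy as np

def xnor_linear(x, W):
    """XNOR-Net-style binary linear projection: both the input and the
    weight matrix are binarized to {-1, +1} and rescaled by their mean
    absolute values, so on hardware the matmul reduces to XNOR/popcount."""
    alpha = np.abs(W).mean()          # weight scaling factor
    beta = np.abs(x).mean()           # activation scaling factor
    Wb = np.sign(W)                   # binarized weights in {-1, +1}
    xb = np.sign(x)                   # binarized activations in {-1, +1}
    return alpha * beta * (xb @ Wb)   # rescaled binary projection

rng = np.random.default_rng(0)
x = rng.standard_normal(16)           # one capsule vector (toy size)
W = rng.standard_normal((16, 10))     # projection to 10 output units
y = xnor_linear(x, W)
print(y.shape)  # (10,)
```

Because `xb @ Wb` only multiplies values in {-1, +1}, the costly floating-point projection inside Dynamic Routing can be replaced by bitwise operations, which is where the FLOP savings come from.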
Related papers
- NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks.
We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT.
We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
- TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition [71.6546914957701]
We propose a lightweight Dual Dynamic Token Mixer (D-Mixer) that aggregates global information and local details in an input-dependent way.
We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network.
In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy while requiring less than half of the computational cost.
arXiv Detail & Related papers (2023-10-30T09:35:56Z)
- Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition [7.221206118679026]
We propose a framework for mapping CNNs onto FPGAs based on a novel tensor decomposition method called Mixed-TD.
The proposed method applies layer-specific Singular Value Decomposition (SVD) and Canonical Polyadic Decomposition (CPD) in a mixed manner, achieving 1.73x to 10.29x higher throughput per DSP compared to state-of-the-art CNNs.
arXiv Detail & Related papers (2023-06-08T08:16:38Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs of diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic slice-able network (DS-Net++), which adjust the filter numbers of CNNs, and multiple dimensions in both CNNs and transformers, in an input-dependent way.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z)
- Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T) [17.13246260883765]
Deep neural networks (DNNs) have shown remarkable success in a variety of machine learning applications.
In recent years, there is an increasing interest in deploying DNNs to resource-constrained devices with limited energy, memory, and computational budget.
We propose Entropy-Constrained Trained Ternarization (EC2T), a general framework to create sparse and ternary neural networks.
arXiv Detail & Related papers (2020-04-02T15:38:00Z)
- DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
arXiv Detail & Related papers (2020-03-30T17:59:18Z)
- Convolutional Networks with Dense Connectivity [59.30634544498946]
We introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.
For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.
We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks.
arXiv Detail & Related papers (2020-01-08T06:54:53Z)
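The dense connectivity described in the DenseNet entry above can be sketched in a few lines of numpy. This toy version replaces the real BN-ReLU-Conv layers with random linear maps purely to show the feature-concatenation pattern; the function name, `growth_rate`, and toy shapes are assumptions for illustration.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Toy dense connectivity: each 'layer' receives the concatenation of
    the block input and all preceding layers' outputs, as in DenseNet.
    Each layer here is just a random linear map followed by tanh."""
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=-1)           # all preceding feature maps
        W = rng.standard_normal((inp.shape[-1], growth_rate))
        features.append(np.tanh(inp @ W))                 # this layer's new features
    return np.concatenate(features, axis=-1)              # outputs feed all later layers

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # batch of 4, 8 input channels
out = dense_block(x, num_layers=3, growth_rate=4, rng=rng)
print(out.shape)  # (4, 20): 8 input channels + 3 layers x 4 new channels
```

Each layer adds only `growth_rate` new channels, so the block's output width grows linearly while every layer still has direct access to all earlier feature maps.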
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.