No Routing Needed Between Capsules
- URL: http://arxiv.org/abs/2001.09136v6
- Date: Thu, 17 Jun 2021 20:14:13 GMT
- Title: No Routing Needed Between Capsules
- Authors: Adam Byerly, Tatiana Kalganova, Ian Dear
- Abstract summary: Homogeneous Vector Capsules (HVCs) use element-wise multiplication rather than matrix multiplication.
We show that a simple convolutional neural network using HVCs performs as well as the prior best performing capsule network on MNIST.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most capsule network designs rely on traditional matrix multiplication
between capsule layers and computationally expensive routing mechanisms to deal
with the capsule dimensional entanglement that the matrix multiplication
introduces. By using Homogeneous Vector Capsules (HVCs), which use element-wise
multiplication rather than matrix multiplication, the dimensions of the
capsules remain unentangled. In this work, we study HVCs as applied to the
highly structured MNIST dataset in order to produce a direct comparison to the
capsule research direction of Geoffrey Hinton, et al. In our study, we show
that a simple convolutional neural network using HVCs performs as well as the
prior best performing capsule network on MNIST using 5.5x fewer parameters, 4x
fewer training epochs, no reconstruction sub-network, and requiring no routing
mechanism. The addition of multiple classification branches to the network
establishes a new state of the art for the MNIST dataset with an accuracy of
99.87% for an ensemble of these models, as well as establishing a new state of
the art for a single model (99.83% accurate).
Related papers
- Deep multi-prototype capsule networks [0.3823356975862005]
Capsule networks are a type of neural network that identifies parts of an image and hierarchically forms the instantiation parameters of a whole.
This paper presents a multi-prototype architecture for guiding capsule networks to represent the variations in the image parts.
The experimental results on MNIST, SVHN, C-Cube, CEDAR, MCYT, and UTSig datasets reveal that the proposed model outperforms others regarding image classification accuracy.
arXiv Detail & Related papers (2024-04-23T18:37:37Z)
- OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning [21.5857226735951]
Redundancy is a persistent challenge in Capsule Networks (CapsNet).
We propose an Orthogonal Capsule Network (OrthCaps) to reduce redundancy, improve routing performance and decrease parameter counts.
arXiv Detail & Related papers (2024-03-20T07:25:24Z)
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
It leads to a new paradigm for model compression to diminish the model size; a rough sketch of the mask-learning idea follows this entry.
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
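The Parameter-Efficient Masking Networks entry above describes learning masks over fixed random weights. Purely as a generic illustration of that idea (a minimal sketch under my own assumptions, not that paper's method, and the "limited unique values" aspect is omitted), a layer can freeze a random weight tensor and train only a per-weight score that is thresholded into a binary mask on each forward pass:

```python
import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    """Linear layer with frozen random weights; only a per-weight score is trained.

    The scores are thresholded into a binary mask each forward pass, so the
    effective learned parameters are the mask bits rather than the weight values.
    """

    def __init__(self, in_features, out_features, keep_ratio=0.5):
        super().__init__()
        weight = torch.randn(out_features, in_features) / in_features ** 0.5
        self.register_buffer("weight", weight)          # fixed, never updated
        self.scores = nn.Parameter(torch.randn_like(weight) * 0.01)
        self.keep_ratio = keep_ratio

    def forward(self, x):
        k = max(1, int(self.keep_ratio * self.scores.numel()))
        # keep the top-k scores: the k-th largest score becomes the threshold
        threshold = torch.topk(self.scores.flatten(), k).values[-1]
        hard_mask = (self.scores >= threshold).float()
        # straight-through estimator: forward uses the hard mask,
        # gradients flow to the real-valued scores
        mask = hard_mask + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask)


if __name__ == "__main__":
    layer = MaskedLinear(64, 10)
    out = layer(torch.randn(4, 64))
    out.sum().backward()
    print(out.shape, layer.scores.grad is not None)   # torch.Size([4, 10]) True
```

The trainable state is then the score tensor, which after thresholding amounts to roughly one bit per weight, rather than the weight values themselves.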
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline; a generic pruning-and-quantization sketch follows this entry.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
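The compact-representation entry above combines weight pruning with quantization. As a generic illustration only (a minimal NumPy sketch of those two ingredients, not the source-coding storage format proposed in that paper, and the sparsity, bit width, and size accounting are assumptions), magnitude pruning followed by uniform quantization of the surviving weights looks like this:

```python
import numpy as np


def prune_and_quantize(weights, sparsity=0.9, bits=8):
    """Magnitude-prune a weight tensor, then uniformly quantize the survivors.

    Returns the dequantized weights (for inspecting the approximation error)
    plus the pieces one would actually store: the mask, the integer codes,
    and the value range.
    """
    # 1. Pruning: zero out the smallest-magnitude fraction of the weights.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    survivors = weights[mask]

    # 2. Quantization: map surviving values onto 2**bits uniform levels.
    lo, hi = survivors.min(), survivors.max()
    levels = 2 ** bits - 1
    codes = np.round((survivors - lo) / (hi - lo) * levels).astype(np.uint8)

    # Dequantize to see what the compressed network would actually use.
    restored = np.zeros_like(weights)
    restored[mask] = codes / levels * (hi - lo) + lo
    return restored, mask, codes, (lo, hi)


if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float32)
    restored, mask, codes, _ = prune_and_quantize(w)
    dense_bytes = w.nbytes                                # 32-bit floats
    stored_bytes = mask.size // 8 + codes.nbytes + 8      # bitmap + 8-bit codes + range
    print(f"kept {mask.mean():.0%} of weights, "
          f"storage ~{stored_bytes / dense_bytes:.1%} of dense size")
```

A real compact format would additionally entropy-code the mask and the integer codes, which is presumably where the source coding mentioned in the summary comes in.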
- A Deeper Look into Convolutions via Pruning [9.89901717499058]
Modern architectures contain a very small number of fully-connected layers, often at the end, after multiple layers of convolutions.
Although this strategy already reduces the number of parameters, most of the convolutions can be eliminated as well, without suffering any loss in recognition performance.
In this work, we use eigenvalue-based matrix characteristics, in addition to the classical weight-based importance assignment approach for pruning, to shed light on the internal mechanisms of a widely used family of CNNs.
arXiv Detail & Related papers (2021-02-04T18:55:03Z)
- The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
- Subspace Capsule Network [85.69796543499021]
SubSpace Capsule Network (SCN) exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity.
SCN can be applied to both discriminative and generative models without incurring computational overhead compared to CNN during test time.
arXiv Detail & Related papers (2020-02-07T17:51:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.