An Improvement for Capsule Networks using Depthwise Separable
Convolution
- URL: http://arxiv.org/abs/2007.15167v2
- Date: Tue, 19 Sep 2023 07:24:41 GMT
- Title: An Improvement for Capsule Networks using Depthwise Separable
Convolution
- Authors: Nguyen Huu Phong, Bernardete Ribeiro
- Abstract summary: Capsule Networks face a critical problem in computer vision in that the image background can challenge their performance.
We propose to improve the Capsule Network architecture by replacing the Standard Convolution with a Depthwise Separable Convolution.
The new design significantly reduces the model's total parameters while increasing stability and offers competitive accuracy.
- Score: 1.876462046907555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Capsule Networks face a critical problem in computer vision in that
the image background can challenge their performance, even though they learn very
well on training data. In this work, we propose to improve the Capsule Network
architecture by replacing the Standard Convolution with a Depthwise Separable
Convolution. This new design significantly reduces the model's total parameters
while increasing stability and offers competitive accuracy. In addition, the
proposed model on $64\times64$ pixel images outperforms standard models on
$32\times32$ and $64\times64$ pixel images. Moreover, we empirically evaluate
these models with Deep Learning architectures using state-of-the-art Transfer
Learning networks such as Inception V3 and MobileNet V1. The results show that
Capsule Networks can perform comparably against Deep Learning models. To the
best of our knowledge, we believe that this is the first work on the
integration of Depthwise Separable Convolution into Capsule Networks.
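The parameter reduction claimed above can be made concrete with a small counting sketch. A standard convolution learns one k×k filter per (input channel, output channel) pair, while a depthwise separable convolution factors this into a per-channel k×k depthwise step followed by a 1×1 pointwise step. The layer sizes below are hypothetical illustrations, not taken from the paper:

```python
# Sketch: parameter-count comparison between a standard convolution and a
# depthwise separable convolution (biases omitted for simplicity).
# Channel widths and kernel size here are illustrative assumptions.

def standard_conv_params(c_in, c_out, k):
    # One k x k filter per (input channel, output channel) pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise step: one k x k filter per input channel.
    # Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    return c_in * k * k + c_in * c_out

if __name__ == "__main__":
    c_in, c_out, k = 256, 256, 9  # 9x9 kernels are typical in Capsule Networks
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std:,}  separable: {sep:,}  reduction: {std / sep:.1f}x")
```

With these assumed widths the separable variant needs roughly 60× fewer weights, which is the kind of saving that motivates the swap; the exact factor in the paper depends on its actual layer configuration.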
Related papers
- SINET: Sparsity-driven Interpretable Neural Network for Underwater Image Enhancement [9.671347245207121]
This work introduces a sparsity-driven interpretable neural network (SINET) for the underwater image enhancement (UIE) task.
Unlike pure deep learning methods, our network architecture is based on a novel channel-specific convolutional sparse coding (CCSC) model.
Our experiments show that SINET surpasses state-of-the-art PSNR value by $1.05$ dB with $3873$ times lower computational complexity.
arXiv Detail & Related papers (2024-09-02T08:03:02Z) - HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms the existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z) - Masked Capsule Autoencoders [5.363623643280699]
We propose Masked Capsule Autoencoders (MCAE), the first Capsule Network that utilises pretraining in a self-supervised manner.
Our proposed MCAE model alleviates this issue by reformulating the Capsule Network to use masked image modelling as a pretraining stage.
We demonstrate that similarly to CNNs and ViTs, Capsule Networks can also benefit from self-supervised pretraining.
arXiv Detail & Related papers (2024-03-07T18:22:03Z) - Revisiting Adversarial Training for ImageNet: Architectures, Training
and Generalization across Threat Models [52.86163536826919]
We revisit adversarial training on ImageNet comparing ViTs and ConvNeXts.
Our modified ConvNeXt, ConvNeXt + ConvStem, yields the most robust generalizations across different ranges of model parameters.
Our ViT + ConvStem yields the best generalization to unseen threat models.
arXiv Detail & Related papers (2023-03-03T11:53:01Z) - A Light-weight Deep Learning Model for Remote Sensing Image
Classification [70.66164876551674]
We present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC).
By conducting extensive experiments on the NWPU-RESISC45 benchmark, our proposed teacher-student models outperform the state-of-the-art systems.
arXiv Detail & Related papers (2023-02-25T09:02:01Z) - ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders [104.05133094625137]
We propose a fully convolutional masked autoencoder framework and a new Global Response Normalization layer.
This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets.
arXiv Detail & Related papers (2023-01-02T18:59:31Z) - Capsule Network based Contrastive Learning of Unsupervised Visual
Representations [13.592112044121683]
The Contrastive Capsule (CoCa) model is a Siamese-style Capsule Network that uses a contrastive loss together with our novel architecture and training and testing algorithm.
We evaluate the model on unsupervised image classification CIFAR-10 dataset and achieve a top-1 test accuracy of 70.50% and top-5 test accuracy of 98.10%.
Due to our efficient architecture our model has 31 times less parameters and 71 times less FLOPs than the current SOTA in both supervised and unsupervised learning.
arXiv Detail & Related papers (2022-09-22T19:05:27Z) - P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with
Point-to-Pixel Prompting [94.11915008006483]
We propose a novel Point-to-Pixel prompting for point cloud analysis.
Our method attains 89.3% accuracy on the hardest setting of ScanObjectNN.
Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part segmentation.
arXiv Detail & Related papers (2022-08-04T17:59:03Z) - 3DConvCaps: 3DUnet with Convolutional Capsule Encoder for Medical Image
Segmentation [1.863532786702135]
We propose a 3D encoder-decoder network with Convolutional Capsule (called 3DConvCaps) to learn lower-level features (short-range attention) with convolutional layers.
Our experiments on multiple datasets including iSeg-2017, Hippocampus, and Cardiac demonstrate that our 3DConvCaps network considerably outperforms previous capsule networks and 3D-UNets.
arXiv Detail & Related papers (2022-05-19T03:00:04Z) - Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.