Masked Capsule Autoencoders
- URL: http://arxiv.org/abs/2403.04724v1
- Date: Thu, 7 Mar 2024 18:22:03 GMT
- Title: Masked Capsule Autoencoders
- Authors: Miles Everett, Mingjun Zhong, and Georgios Leontidis
- Abstract summary: We propose Masked Capsule Autoencoders (MCAE), the first Capsule Network that utilises pretraining in a self-supervised manner.
Capsule Networks have struggled to learn effectively from more complex data; our proposed MCAE model alleviates this by reformulating the Capsule Network to use masked image modelling as a pretraining stage.
We demonstrate that similarly to CNNs and ViTs, Capsule Networks can also benefit from self-supervised pretraining.
- Score: 5.363623643280699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Masked Capsule Autoencoders (MCAE), the first Capsule Network that
utilises pretraining in a self-supervised manner. Capsule Networks have emerged
as a powerful alternative to Convolutional Neural Networks (CNNs), and have
shown favourable properties when compared to Vision Transformers (ViT), but
have struggled to effectively learn when presented with more complex data,
leading to Capsule Network models that do not scale to modern tasks. Our
proposed MCAE model alleviates this issue by reformulating the Capsule Network
to use masked image modelling as a pretraining stage before finetuning in a
supervised manner. Across several experiments and ablation studies we
demonstrate that, similarly to CNNs and ViTs, Capsule Networks can also benefit
from self-supervised pretraining, paving the way for further advancements in
this neural network domain. For instance, by pretraining on the Imagenette
dataset, a 10-class dataset of ImageNet-sized images, we achieve not only
state-of-the-art results for Capsule Networks but also a 9% improvement
compared to purely supervised training. Thus we propose that Capsule Networks
benefit from and should be trained within a masked image modelling framework,
with a novel capsule decoder, to improve a Capsule Network's performance on
realistic-sized images.
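To make the pretraining recipe described above concrete, the sketch below shows a generic masked-image-modelling loop in PyTorch: patchify the image, replace a large fraction of patch tokens with a learned mask token, encode, reconstruct pixels, and compute the reconstruction loss only on the masked patches. This is a minimal illustration under assumed placeholder components (a small transformer encoder and a linear decoder named ToyMaskedAutoencoder); it is not the paper's capsule encoder or novel capsule decoder.

```python
# Minimal sketch of masked-image-modelling pretraining. The encoder/decoder
# here are placeholders, not the MCAE capsule architecture from the paper.
import torch
import torch.nn as nn


def patchify(imgs, patch=16):
    # (B, C, H, W) -> (B, N, patch*patch*C)
    b, c, h, w = imgs.shape
    gh, gw = h // patch, w // patch
    x = imgs.reshape(b, c, gh, patch, gw, patch)
    return x.permute(0, 2, 4, 3, 5, 1).reshape(b, gh * gw, patch * patch * c)


class ToyMaskedAutoencoder(nn.Module):
    """Stand-in encoder/decoder; MCAE would use capsule layers here."""

    def __init__(self, patch=16, dim=256, channels=3):
        super().__init__()
        self.patch = patch
        in_dim = patch * patch * channels
        self.embed = nn.Linear(in_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.decoder = nn.Linear(dim, in_dim)  # placeholder for a capsule decoder

    def forward(self, imgs, mask_ratio=0.75):
        patches = patchify(imgs, self.patch)      # (B, N, pixels per patch)
        b, n, _ = patches.shape
        tokens = self.embed(patches)              # (B, N, dim)

        # Randomly mask a fraction of patches per image (True = masked).
        mask = torch.rand(b, n, device=imgs.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(b, n, -1), tokens)

        recon = self.decoder(self.encoder(tokens))  # predict pixels per patch

        # Score reconstruction only on the masked patches.
        loss = ((recon - patches) ** 2).mean(dim=-1)
        return (loss * mask).sum() / mask.sum().clamp(min=1)


if __name__ == "__main__":
    model = ToyMaskedAutoencoder()
    imgs = torch.randn(2, 3, 224, 224)  # e.g. Imagenette-sized inputs
    loss = model(imgs)
    loss.backward()
    print(float(loss))
```

After pretraining in this manner, the decoder would be discarded and the encoder finetuned with a supervised classification head, mirroring the pretrain-then-finetune pipeline the abstract describes.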
Related papers
- Stitched ViTs are Flexible Vision Backbones [51.441023711924835]
We are inspired by stitchable neural networks (SN-Net) to produce a single model that covers rich subnetworks by stitching pretrained model families.
We introduce SN-Netv2, a systematically improved model stitching framework to facilitate downstream task adaptation.
SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone.
arXiv Detail & Related papers (2023-06-30T22:05:34Z) - Capsule Network based Contrastive Learning of Unsupervised Visual Representations [13.592112044121683]
The Contrastive Capsule (CoCa) Model is a Siamese-style Capsule Network using contrastive loss with our novel architecture, training, and testing algorithm.
We evaluate the model on unsupervised image classification CIFAR-10 dataset and achieve a top-1 test accuracy of 70.50% and top-5 test accuracy of 98.10%.
Due to our efficient architecture, our model has 31 times fewer parameters and 71 times fewer FLOPs than the current SOTA in both supervised and unsupervised learning.
arXiv Detail & Related papers (2022-09-22T19:05:27Z) - Towards Efficient Capsule Networks [7.1577508803778045]
Capsule Networks were introduced to enhance explainability of a model, where each capsule is an explicit representation of an object or its parts.
We show how pruning Capsule Networks achieves high generalization with reduced memory requirements, computational effort, and inference and training time.
arXiv Detail & Related papers (2022-08-19T08:03:25Z) - Masked Autoencoders are Robust Data Augmentors [90.34825840657774]
Regularization techniques like image augmentation are necessary for deep neural networks to generalize well.
We propose a novel perspective of augmentation to regularize the training process.
We show that utilizing such model-based nonlinear transformation as data augmentation can improve high-level recognition tasks.
arXiv Detail & Related papers (2022-06-10T02:41:48Z) - SS-3DCapsNet: Self-supervised 3D Capsule Networks for Medical Segmentation on Less Labeled Data [10.371128893952537]
This work extends capsule networks for volumetric medical image segmentation with self-supervised learning.
Our 3D capsule network with self-supervised pre-training considerably outperforms previous capsule networks and 3D-UNets.
arXiv Detail & Related papers (2022-01-15T18:42:38Z) - Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This allows not only robust training with noisy video data, but also scaling the capsule network to larger sizes than traditional routing methods permit.
arXiv Detail & Related papers (2021-12-01T19:01:26Z) - The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation.
Recent studies suggest that pre-training benefits from gigantic model capacity.
In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH).
arXiv Detail & Related papers (2020-12-12T21:53:55Z) - An Improvement for Capsule Networks using Depthwise Separable Convolution [1.876462046907555]
Capsule Networks face a critical problem in computer vision: the image background can challenge their performance.
We propose to improve Capsule Networks' architecture by replacing the Standard Convolution with a Depthwise Separable Convolution.
The new design significantly reduces the model's total parameters while increasing stability and offering competitive accuracy (a brief parameter-count sketch appears after this list).
arXiv Detail & Related papers (2020-07-30T00:58:34Z) - Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z) - Subspace Capsule Network [85.69796543499021]
SubSpace Capsule Network (SCN) exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity.
SCN can be applied to both discriminative and generative models without incurring computational overhead compared to CNN during test time.
arXiv Detail & Related papers (2020-02-07T17:51:56Z)
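Following up on the depthwise separable substitution described in "An Improvement for Capsule Networks using Depthwise Separable Convolution" above, here is a minimal sketch of the parameter saving that motivates swapping a standard convolution for a depthwise separable one in a capsule network's convolutional front-end. The channel and kernel sizes are arbitrary illustrative choices, not that paper's configuration.

```python
# Compare parameter counts: standard convolution vs. depthwise separable.
import torch.nn as nn


def count_params(m):
    return sum(p.numel() for p in m.parameters())


in_ch, out_ch, k = 256, 256, 9  # 9x9 kernels are common in capsule front-ends

standard = nn.Conv2d(in_ch, out_ch, k)

depthwise_separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, k, groups=in_ch),  # depthwise: one filter per channel
    nn.Conv2d(in_ch, out_ch, 1),               # pointwise: 1x1 conv mixes channels
)

print(count_params(standard))             # ~5.3M parameters
print(count_params(depthwise_separable))  # ~0.09M parameters
```

The depthwise step applies one spatial filter per input channel and the pointwise step recombines channels, which is where most of the parameter reduction comes from.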