ResMLP: Feedforward networks for image classification with
data-efficient training
- URL: http://arxiv.org/abs/2105.03404v1
- Date: Fri, 7 May 2021 17:31:44 GMT
- Title: ResMLP: Feedforward networks for image classification with
data-efficient training
- Authors: Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord,
Alaaeldin El-Nouby, Edouard Grave, Armand Joulin, Gabriel Synnaeve, Jakob
Verbeek, Hervé Jégou
- Abstract summary: We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
We will share our code based on the Timm library and pre-trained models.
- Score: 73.26364887378597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ResMLP, an architecture built entirely upon multi-layer
perceptrons for image classification. It is a simple residual network that
alternates (i) a linear layer in which image patches interact, independently
and identically across channels, and (ii) a two-layer feed-forward network in
which channels interact independently per patch. When trained with a modern
training strategy using heavy data-augmentation and optionally distillation, it
attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will
share our code based on the Timm library and pre-trained models.
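The abstract fully specifies the shape of one residual block: a linear layer that mixes patches identically across channels, then a two-layer feed-forward network that mixes channels within each patch. Below is a minimal PyTorch sketch of such a block written from the abstract alone; the Affine pre-transform, layer names, and hyper-parameters are assumptions, not the authors' released Timm-based code.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Per-channel scale and shift (assumed stand-in for normalization)."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):                      # x: (batch, patches, channels)
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    def __init__(self, num_patches, dim, expansion=4):
        super().__init__()
        self.pre1 = Affine(dim)
        # (i) linear layer mixing patches, applied identically to every channel
        self.patch_mix = nn.Linear(num_patches, num_patches)
        self.pre2 = Affine(dim)
        # (ii) two-layer feed-forward network mixing channels per patch
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, expansion * dim),
            nn.GELU(),
            nn.Linear(expansion * dim, dim),
        )

    def forward(self, x):                      # x: (batch, patches, channels)
        # patch interaction: transpose so the Linear acts over the patch axis
        x = x + self.patch_mix(self.pre1(x).transpose(1, 2)).transpose(1, 2)
        # channel interaction, independently for each patch
        x = x + self.channel_mlp(self.pre2(x))
        return x
```

For instance, ResMLPBlock(num_patches=196, dim=384) maps a (batch, 196, 384) tensor to the same shape, so blocks can be stacked like the residual stages of a standard network.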
Related papers
- Co-training $2^L$ Submodels for Visual Recognition [67.02999567435626]
Submodel co-training is a regularization method related to co-training, self-distillation and stochastic depth.
We show that submodel co-training is effective to train backbones for recognition tasks such as image classification and semantic segmentation.
arXiv Detail & Related papers (2022-12-09T14:38:09Z)
- Multi-scale Transformer Network with Edge-aware Pre-training for
Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
arXiv Detail & Related papers (2022-12-02T11:40:40Z)
- A Perturbation Resistant Transformation and Classification System for
Deep Neural Networks [0.685316573653194]
Deep convolutional neural networks accurately classify a diverse range of natural images, but may be easily deceived by carefully designed perturbations.
In this paper, we design a multi-pronged training, unbounded input transformation, and image ensemble system that is attack agnostic and not easily estimated.
arXiv Detail & Related papers (2022-08-25T02:58:47Z)
- Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training
of Image Segmentation Models [54.49581189337848]
We propose a method to enable end-to-end pre-training of image segmentation models based on classification datasets.
The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse.
Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
arXiv Detail & Related papers (2022-07-04T13:02:32Z)
- Weakly-supervised fire segmentation by visualizing intermediate CNN
layers [82.75113406937194]
Fire localization in images and videos is an important step for an autonomous system to combat fire incidents.
We consider weakly supervised segmentation of fire in images, in which only image labels are used to train the network.
We show that in the case of fire segmentation, which is a binary segmentation problem, the mean value of features in a mid-layer of a classification CNN can perform better than the conventional Class Activation Mapping (CAM) method.
arXiv Detail & Related papers (2021-11-16T11:56:28Z)
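The fire-segmentation entry above reduces to one operation: average the channels of a mid-layer feature map of the classification CNN and use the upsampled, thresholded result as the mask instead of a CAM. A hedged PyTorch sketch of that operation follows; the function name, min-max normalization, and threshold are illustrative assumptions rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def mid_layer_mask(features: torch.Tensor, image_size, threshold=0.5):
    """features: (B, C, h, w) activations from a mid-layer of a classification CNN."""
    saliency = features.mean(dim=1, keepdim=True)         # average over channels
    saliency = F.interpolate(saliency, size=image_size,
                             mode="bilinear", align_corners=False)
    # min-max normalize per image, then threshold to a binary fire mask
    b = saliency.size(0)
    flat = saliency.view(b, -1)
    mn = flat.min(dim=1, keepdim=True).values
    mx = flat.max(dim=1, keepdim=True).values
    norm = ((flat - mn) / (mx - mn + 1e-8)).view_as(saliency)
    return (norm > threshold).float()                      # (B, 1, H, W)
```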
- RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [123.59890802196797]
We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition.
We construct convolutional layers inside a RepMLP during training and merge them into the FC for inference.
By inserting RepMLP in traditional CNN, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% for face recognition, and 2.3% mIoU on Cityscapes with lower FLOPs.
arXiv Detail & Related papers (2021-05-05T06:17:40Z)
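The RepMLP summary relies on the fact that a convolution (without bias) is a linear map, so its effect on a fixed-size input can be rewritten as a fully-connected weight matrix and merged into an FC layer for inference. The sketch below demonstrates that equivalence by pushing an identity basis through a conv; it illustrates the underlying identity, not the paper's actual merging code, and the helper name and bias-free assumption are mine.

```python
import torch
import torch.nn as nn

def conv_as_fc_weight(conv: nn.Conv2d, in_shape):
    """Return the dense matrix W such that conv(x).flatten() == W @ x.flatten().

    Assumes conv has bias=False (a bias would have to be folded separately).
    Each basis image yields one column of the equivalent fully-connected weight.
    """
    c, h, w = in_shape
    basis = torch.eye(c * h * w).reshape(c * h * w, c, h, w)
    with torch.no_grad():
        out = conv(basis)                       # (c*h*w, C_out, H_out, W_out)
    return out.reshape(c * h * w, -1).t()       # (C_out*H_out*W_out, c*h*w)

# Quick numerical check of the equivalence on a random input (illustrative only).
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
w_fc = conv_as_fc_weight(conv, (3, 16, 16))
x = torch.randn(3, 16, 16)
assert torch.allclose(conv(x.unsqueeze(0)).flatten(), w_fc @ x.flatten(), atol=1e-5)
```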
- RandomForestMLP: An Ensemble-Based Multi-Layer Perceptron Against Curse of Dimensionality [0.0]
We present a novel and practical deep learning pipeline termed RandomForestMLP.
This trainable classification engine consists of a convolutional neural network backbone followed by an ensemble-based multi-layer perceptron core for the classification task.
arXiv Detail & Related papers (2020-11-02T18:25:36Z)
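Read literally, the RandomForestMLP entry describes a CNN feature extractor whose output is classified by an ensemble of MLP heads. The sketch below is one plausible reading; the backbone, head count, and logit averaging are all assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class CNNBackboneWithMLPEnsemble(nn.Module):
    """Assumed reading of the summary: CNN backbone, then several independent
    MLP heads whose class scores are averaged."""
    def __init__(self, feature_dim=128, num_classes=10, num_heads=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU(),
                          nn.Linear(64, num_classes))
            for _ in range(num_heads)
        )

    def forward(self, x):
        feats = self.backbone(x)
        # average the logits of the ensemble members
        return torch.stack([head(feats) for head in self.heads]).mean(dim=0)
```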
- Retrain or not retrain? -- efficient pruning methods of deep CNN networks [0.30458514384586394]
Convolutional neural networks (CNNs) play a major role in image processing tasks such as image classification, object detection, and semantic segmentation.
Very often CNNs have from several to hundreds of stacked layers with several megabytes of weights.
One of the possible methods to reduce complexity and memory footprint is pruning.
arXiv Detail & Related papers (2020-02-12T23:24:28Z)
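The pruning entry names no specific method, so the sketch below shows generic L1 magnitude pruning with PyTorch's built-in utilities as one concrete instance of reducing complexity and memory footprint by pruning; it is not the procedure evaluated in the paper.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def magnitude_prune(model: nn.Module, amount: float = 0.3) -> nn.Module:
    """Zero out the `amount` fraction of smallest-magnitude weights in every
    conv and linear layer (generic L1 unstructured pruning, illustrative only)."""
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")   # bake the pruning mask into the weights
    return model
```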
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.