Reversible Column Networks
- URL: http://arxiv.org/abs/2212.11696v1
- Date: Thu, 22 Dec 2022 13:37:59 GMT
- Title: Reversible Column Networks
- Authors: Yuxuan Cai, Yizhuang Zhou, Qi Han, Jianjian Sun, Xiangwen Kong, Jun
Li, Xiangyu Zhang
- Abstract summary: Reversible Column Network (RevCol) is a new neural network design paradigm.
CNN-style RevCol models can achieve very competitive performance on computer vision tasks.
RevCol can also be introduced into transformers or other neural networks.
- Score: 13.385421619753227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new neural network design paradigm, the Reversible Column Network
(RevCol). The main body of RevCol is composed of multiple copies of a
subnetwork, named columns, between which multi-level reversible
connections are employed. This architectural scheme gives RevCol very
different behavior from conventional networks: during forward propagation,
features in RevCol are learned to be gradually disentangled as they pass
through each column, while total information is maintained rather than
compressed or discarded as in other networks. Our experiments suggest that
CNN-style RevCol models can achieve very competitive performance on multiple
computer vision tasks such as image classification, object detection, and
semantic segmentation, especially with large parameter budgets and large
datasets. For example, after ImageNet-22K pre-training, RevCol-XL obtains 88.2%
ImageNet-1K accuracy. Given more pre-training data, our largest model RevCol-H
reaches 90.0% on ImageNet-1K, 63.8% APbox on the COCO detection minival set, and 61.0%
mIoU on ADE20K segmentation. To our knowledge, these are the best COCO detection
and ADE20K segmentation results among pure (static) CNN models. Moreover, as a
general macro-architecture design, RevCol can also be introduced into
transformers or other neural networks, which is demonstrated to improve
performance in both computer vision and NLP tasks. We release code and models
at https://github.com/megvii-research/RevCol
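The abstract's claim that total information is maintained hinges on reversible connections: the inputs of each block can be exactly reconstructed from its outputs. A minimal sketch of RevNet-style additive coupling illustrates the idea (this is an illustration of reversibility in general, not RevCol's exact multi-level scheme; the transforms `F`, `G` and the scalar "features" are assumptions for the example):

```python
def forward(x1, x2, F, G):
    """Reversible additive coupling: the outputs fully determine the inputs."""
    y1 = x1 + F(x2)   # update the first stream from the second
    y2 = x2 + G(y1)   # update the second stream from the new first
    return y1, y2

def inverse(y1, y2, F, G):
    """Recover the inputs from the outputs: no information is discarded."""
    x2 = y2 - G(y1)   # undo the second update
    x1 = y1 - F(x2)   # undo the first update
    return x1, x2

if __name__ == "__main__":
    # Stand-in transforms; any functions work, since reversibility comes
    # from the coupling structure, not from F or G being invertible.
    F = lambda v: 0.5 * v + 1.0
    G = lambda v: 2.0 * v - 3.0
    y1, y2 = forward(1.0, 2.0, F, G)
    x1, x2 = inverse(y1, y2, F, G)
    print((x1, x2))  # reconstructs the original inputs (1.0, 2.0)
```

Because inversion is exact, intermediate activations need not be stored during training; they can be recomputed backwards, which is what makes stacking many columns affordable in memory.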
Related papers
- Using DUCK-Net for Polyp Image Segmentation [0.0]
"DUCK-Net" is capable of effectively learning and generalizing from small amounts of medical images to perform accurate segmentation tasks.
We demonstrate its capabilities specifically for polyp segmentation in colonoscopy images.
arXiv Detail & Related papers (2023-11-03T20:58:44Z)
- RevColV2: Exploring Disentangled Representations in Masked Image Modeling [12.876864261893909]
Masked image modeling (MIM) has become a prevalent pre-training setup for vision foundation models and attains promising performance.
Existing MIM methods discard the decoder network during downstream applications, resulting in inconsistent representations between pre-training and fine-tuning.
We propose a new architecture, RevColV2, which tackles this issue by keeping the entire autoencoder architecture during both pre-training and fine-tuning.
arXiv Detail & Related papers (2023-09-02T18:41:27Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders [104.05133094625137]
We propose a fully convolutional masked autoencoder framework and a new Global Response Normalization layer.
This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets.
arXiv Detail & Related papers (2023-01-02T18:59:31Z)
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which can benefit from increasing parameters and training data in the way ViTs do.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data, as ViTs can.
arXiv Detail & Related papers (2022-11-10T18:59:04Z)
- Recurrence along Depth: Deep Convolutional Neural Networks with Recurrent Layer Aggregation [5.71305698739856]
This paper introduces the concept of layer aggregation to describe how information from previous layers can be reused to better extract features at the current layer.
We propose a very lightweight module, called recurrent layer aggregation (RLA), that makes use of the sequential structure of layers in a deep CNN.
Our RLA module is compatible with many mainstream deep CNNs, including ResNets, Xception, and MobileNetV2.
arXiv Detail & Related papers (2021-10-22T15:36:33Z)
- Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data [16.810239678639288]
We propose an efficient framework based on deep convolutional neural networks (CNNs) for joint classification of multi-source remote sensing data.
The proposed method can in principle adapt any modern CNN model to any multi-source remote sensing dataset.
Experimental results demonstrate the effectiveness of the proposed single-stream CNNs.
arXiv Detail & Related papers (2021-09-13T16:10:41Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) models the network as a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using a fixed network path, DG-Net aggregates features dynamically at each node, which gives the network greater representational ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
- The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z)
- Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.