Improved Residual Networks for Image and Video Recognition
- URL: http://arxiv.org/abs/2004.04989v1
- Date: Fri, 10 Apr 2020 11:09:50 GMT
- Title: Improved Residual Networks for Image and Video Recognition
- Authors: Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao
- Abstract summary: Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
- Score: 98.10703825716142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Residual networks (ResNets) represent a powerful type of convolutional neural
network (CNN) architecture, widely adopted and used in various tasks. In this
work we propose an improved version of ResNets. Our proposed improvements
address all three main components of a ResNet: the flow of information through
the network layers, the residual building block, and the projection shortcut.
We are able to show consistent improvements in accuracy and learning
convergence over the baseline. For instance, on the ImageNet dataset, using
ResNet with 50 layers, we report a 1.19% top-1 accuracy improvement over the
baseline in one setting and around a 2% boost in another. Importantly,
these improvements are obtained without increasing the model complexity. Our
proposed approach allows us to train extremely deep networks, while the
baseline shows severe optimization issues. We report results on three tasks
over six datasets: image classification (ImageNet, CIFAR-10 and CIFAR-100),
object detection (COCO) and video action recognition (Kinetics-400 and
Something-Something-v2). In the deep learning era, we establish a new milestone
for the depth of a CNN. We successfully train a 404-layer deep CNN on the
ImageNet dataset and a 3002-layer network on CIFAR-10 and CIFAR-100, while the
baseline is not able to converge at such extreme depths. Code is available at:
https://github.com/iduta/iresnet
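Of the three components the abstract names, the projection shortcut is the most self-contained change: spatial downsampling is moved out of the stride-2 1x1 convolution (which reads only one in four spatial positions) into a max pooling step, so the 1x1 convolution only adjusts the channel count. Below is a minimal PyTorch sketch of that idea; the pooling window and BatchNorm placement are assumptions in the spirit of the linked repository, which holds the authors' actual implementation.

```python
import torch
import torch.nn as nn

class ImprovedProjectionShortcut(nn.Module):
    """Sketch of the improved projection shortcut: max pooling handles the
    spatial reduction (keeping the strongest activation in each window,
    rather than whichever value lands on the stride grid), and the 1x1
    convolution then only changes the number of channels."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 2):
        super().__init__()
        # Baseline ResNet would use: nn.Conv2d(in_channels, out_channels, 1, stride=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=stride, padding=1)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.bn(self.conv(self.pool(x)))

y = ImprovedProjectionShortcut(64, 256)(torch.randn(1, 64, 56, 56))  # (1, 256, 28, 28)
```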
Related papers
- Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement [68.44100784364987]
We propose a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users.
We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+.
Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks.
arXiv Detail & Related papers (2023-03-15T23:10:17Z)
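The key property in the Dataset Reinforcement entry above is that the expensive teacher signal is computed once and stored with the data, so every later training run pays only normal cost. A hedged sketch of what consuming such a reinforced dataset could look like; the stored soft targets and the KL-based loss are illustrative assumptions, not the paper's exact recipe.

```python
import torch.nn.functional as F

def reinforced_train_step(model, images, stored_soft_targets, optimizer):
    """One step of training on a 'reinforced' dataset: hard labels are replaced
    by soft targets precomputed once by a strong teacher, so any student
    architecture benefits with no live teacher in the loop (assumed setup)."""
    optimizer.zero_grad()
    log_probs = F.log_softmax(model(images), dim=1)
    # Distillation-style loss against the stored targets.
    loss = F.kl_div(log_probs, stored_soft_targets, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```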
- Multipod Convolutional Network [2.1485350418225244]
We experimentally observed that three parallel pod networks (TripodNet) produce the best results on commonly used object recognition datasets.
TripodNet achieved state-of-the-art performance on CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2022-10-03T02:37:57Z)
- Connection Reduction Is All You Need [0.10878040851637998]
Empirical research shows that simply stacking convolutional layers does not make the network train better.
We propose two new algorithms to connect layers.
ShortNet1 has a 5% lower test error rate and 25% faster inference time than the baseline.
arXiv Detail & Related papers (2022-08-02T13:00:35Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
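The SDTA description above centers on splitting the input tensor into channel groups. The sketch below is one plausible PyTorch rendering of that step, using a Res2Net-style hierarchy in which each group also receives the previous group's output; the group count and kernel size are assumptions, and the transposed (cross-channel) attention stage that follows in EdgeNeXt is omitted for brevity.

```python
import torch
import torch.nn as nn

class ChannelSplitDepthwise(nn.Module):
    """Assumed sketch of the channel-splitting step in an SDTA-style encoder:
    chunk the input along channels, give each group its own depth-wise
    convolution, and fold each group's output into the next group."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        width = channels // groups
        self.groups = groups
        self.dwconvs = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=3, padding=1, groups=width)
            for _ in range(groups - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = list(torch.chunk(x, self.groups, dim=1))
        out, prev = [chunks[0]], chunks[0]   # first group passes through untouched
        for conv, c in zip(self.dwconvs, chunks[1:]):
            prev = conv(c + prev)            # hierarchy: reuse the previous group's output
            out.append(prev)
        return torch.cat(out, dim=1)

y = ChannelSplitDepthwise(64)(torch.randn(1, 64, 32, 32))  # same shape out
```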
- Evolutionary Neural Cascade Search across Supernetworks [68.8204255655161]
We introduce ENCAS - Evolutionary Neural Cascade Search.
ENCAS can be used to search over multiple pretrained supernetworks.
We test ENCAS on common computer vision benchmarks.
arXiv Detail & Related papers (2022-03-08T11:06:01Z)
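A cascade, in the sense ENCAS searches over, chains models of increasing size so that cheap models answer easy inputs. A minimal sketch follows, assuming a per-sample confidence threshold as the stopping rule; the paper's exact rule and how thresholds are evolved may differ.

```python
import torch

@torch.no_grad()
def cascade_predict(models, thresholds, x):
    """Illustrative cascade inference (batch size 1 assumed): each model in
    turn answers if its top softmax probability clears its threshold, and
    the final, largest model always answers."""
    for model, tau in zip(models[:-1], thresholds):  # one threshold per early model
        probs = torch.softmax(model(x), dim=1)
        conf, pred = probs.max(dim=1)
        if conf.item() >= tau:                       # confident enough: stop early
            return pred
    return models[-1](x).argmax(dim=1)
```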
- ThreshNet: An Efficient DenseNet using Threshold Mechanism to Reduce Connections [1.2542322096299672]
We propose a new network architecture using threshold mechanism to further optimize the method of connections.
Compared to DenseNet, ThreshNet achieves up to a 60% reduction in inference time, up to 35% faster training, and a 20% reduction in error rate.
arXiv Detail & Related papers (2022-01-09T13:52:16Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same path of the network, DG-Net aggregates features dynamically in each node, which allows the network to have more representation ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
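The DG-Net summary above implies that every node needs input-dependent weights for its incoming edges. One plausible realization is sketched below; the gating head (global average pooling, a linear layer, a sigmoid) is an assumption, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DynamicAggregate(nn.Module):
    """Assumed sketch of instance-aware aggregation at one DAG node: weights
    for the incoming feature maps are predicted from the inputs themselves,
    so the effective connectivity differs per input."""

    def __init__(self, num_inputs: int, channels: int):
        super().__init__()
        self.gate = nn.Linear(num_inputs * channels, num_inputs)

    def forward(self, inputs: list[torch.Tensor]) -> torch.Tensor:
        # Summarize each incoming feature map, then predict one weight per edge.
        pooled = torch.cat([x.mean(dim=(2, 3)) for x in inputs], dim=1)
        weights = torch.sigmoid(self.gate(pooled))            # (B, num_inputs)
        stacked = torch.stack(inputs, dim=1)                  # (B, num_inputs, C, H, W)
        return (weights[:, :, None, None, None] * stacked).sum(dim=1)
```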
- Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation [4.023728681102073]
Binary CNNs can significantly reduce the number of arithmetic operations and the size of memory storage.
However, the accuracy degradation of single and multiple binary CNNs is unacceptable for modern architectures.
We propose a Piecewise Approximation scheme for multiple binary CNNs which lessens accuracy loss by approximating full precision weights and activations.
arXiv Detail & Related papers (2020-08-08T13:32:33Z)
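The multiple-binary idea above can be written as W ≈ Σ_i α_i B_i with each B_i in {-1, +1}. The sketch below illustrates it with a standard greedy residual scheme (each step binarizes what the previous bases left over); the paper's piecewise approximation partitions the weight range differently, so treat this as background, not the proposed method.

```python
import torch

def multi_binary_approx(w: torch.Tensor, num_bases: int = 4):
    """Approximate a full-precision tensor as a sum of scaled binary tensors,
    w ~= sum_i alpha_i * B_i, by greedily binarizing the running residual."""
    residual = w.clone()
    bases, alphas = [], []
    for _ in range(num_bases):
        b = torch.sign(residual)
        b[b == 0] = 1.0                   # keep B strictly in {-1, +1}
        alpha = residual.abs().mean()     # least-squares optimal scale for a sign basis
        bases.append(b)
        alphas.append(alpha)
        residual = residual - alpha * b
    approx = sum(a * b for a, b in zip(alphas, bases))
    return approx, bases, alphas
```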
- Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition [98.10703825716142]
This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales.
We present different architectures based on PyConv for four main tasks on visual recognition: image classification, video action classification/recognition, object detection and semantic image segmentation/parsing.
arXiv Detail & Related papers (2020-06-20T10:19:29Z)
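Pyramidal convolution is concrete enough to sketch: parallel convolutions with growing kernel sizes use growing group counts so the larger kernels stay cheap, and the per-level outputs are concatenated. The kernel/group pyramid below is one illustrative configuration, not necessarily the paper's exact settings.

```python
import torch
import torch.nn as nn

class PyConvSketch(nn.Module):
    """Sketch of pyramidal convolution: process the input at several filter
    scales in parallel; larger kernels use more groups to bound their cost."""

    def __init__(self, in_ch: int, out_ch: int,
                 kernels=(3, 5, 7, 9), groups=(1, 4, 8, 16)):
        super().__init__()
        split = out_ch // len(kernels)   # each pyramid level emits a channel slice
        self.levels = nn.ModuleList(
            nn.Conv2d(in_ch, split, k, padding=k // 2, groups=g, bias=False)
            for k, g in zip(kernels, groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([level(x) for level in self.levels], dim=1)

y = PyConvSketch(64, 256)(torch.randn(1, 64, 56, 56))  # (1, 256, 56, 56)
```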
- Adjoined Networks: A Training Paradigm with Applications to Network Compression [3.995047443480282]
We introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together.
Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet data-set.
We propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network.
arXiv Detail & Related papers (2020-06-10T02:48:16Z)
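The Adjoined Networks entry describes training the base network and its compressed sibling together. A hedged sketch of one such joint objective follows; the distillation term and its weighting are illustrative assumptions, and the paper additionally couples the two networks (e.g., through shared weights) in ways not shown here.

```python
import torch.nn.functional as F

def adjoined_loss(base_logits, small_logits, labels, distill_weight=1.0):
    """Assumed joint objective: both networks get a supervised loss, and the
    small network is also pulled toward the base network's predictions."""
    loss_base = F.cross_entropy(base_logits, labels)
    loss_small = F.cross_entropy(small_logits, labels)
    distill = F.kl_div(
        F.log_softmax(small_logits, dim=1),
        F.softmax(base_logits.detach(), dim=1),  # base acts as an online teacher
        reduction="batchmean",
    )
    return loss_base + loss_small + distill_weight * distill
```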
This list is automatically generated from the titles and abstracts of the papers on this site.