IC Networks: Remodeling the Basic Unit for Convolutional Neural Networks
- URL: http://arxiv.org/abs/2102.03495v1
- Date: Sat, 6 Feb 2021 03:15:43 GMT
- Title: IC Networks: Remodeling the Basic Unit for Convolutional Neural Networks
- Authors: Junyi An and Fengshan Liu and Jian Zhao and Furao Shen
- Abstract summary: The "Inter-layer Collision" (IC) structure can be integrated into existing CNNs to improve their performance.
A new training method, weak logit distillation (WLD), is proposed to speed up the training of IC networks.
In the ImageNet experiment, we integrate the IC structure into ResNet-50 and reduce the top-1 error from 22.38% to 21.75%.
- Score: 8.218732270970381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) are a class of artificial neural networks
widely used in computer vision tasks. Most CNNs achieve excellent performance
by stacking certain types of basic units. In addition to increasing the depth
and width of the network, designing more effective basic units has become an
important research topic. Inspired by the elastic collision model in physics,
we present a general structure which can be integrated into the existing CNNs
to improve their performance. We term it the "Inter-layer Collision" (IC)
structure. Compared to the traditional convolution structure, the IC structure
introduces nonlinearity and feature recalibration in the linear convolution
operation, which can capture more fine-grained features. In addition, a new
training method, namely weak logit distillation (WLD), is proposed to speed up
the training of IC networks by extracting knowledge from pre-trained basic
models. In the ImageNet experiment, we integrate the IC structure into
ResNet-50 and reduce the top-1 error from 22.38% to 21.75%, which matches
the top-1 error of ResNet-100 (21.75%) with nearly half the FLOPs.
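The abstract describes the IC structure only at a high level: it injects nonlinearity and feature recalibration into the otherwise linear convolution operation. As a rough illustration, a minimal PyTorch sketch of such a recalibrated convolution unit might look like the following; the class name `ICConv2d` and the gating path are assumptions for illustration, not the authors' actual formulation.

```python
import torch
import torch.nn as nn


class ICConv2d(nn.Module):
    """Illustrative sketch of an IC-style unit (assumed form, not the paper's exact design):
    a standard convolution whose output is recalibrated by a learned nonlinear gate
    before the activation, so the linear operation gains channel-wise recalibration
    and extra nonlinearity."""

    def __init__(self, in_channels: int, out_channels: int,
                 kernel_size: int = 3, padding: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding)
        # Assumed recalibration path: global average pooling -> 1x1 conv -> sigmoid gate.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv(x)          # linear convolution
        y = y * self.gate(y)      # feature recalibration via a learned gate
        return self.act(y)        # additional nonlinearity
```

In an IC-ResNet-style network, a unit along these lines would replace the plain convolutions inside each residual block.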
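Weak logit distillation (WLD) is likewise only summarized here as extracting knowledge from a pre-trained basic model to speed up IC-network training. One plausible (assumed) reading is a lightly weighted logit-distillation term added to the usual cross-entropy loss; the sketch below uses the standard temperature-scaled KL formulation, with placeholder hyperparameters, and may differ from the paper's exact objective.

```python
import torch.nn.functional as F


def wld_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.1):
    """Assumed weak-logit-distillation objective: cross-entropy on the labels plus a
    weakly weighted KL term pulling the IC network's logits toward those of a
    pre-trained basic model."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return ce + alpha * kd
```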
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z) - Optimizing Convolutional Neural Network Architecture [0.0]
Convolutional Neural Networks (CNNs) are widely used to tackle challenging tasks such as speech recognition, natural language processing, and computer vision.
We propose Optimizing Convolutional Neural Network Architecture (OCNNA), a novel CNN optimization and construction method based on pruning and knowledge distillation.
Our method has been compared with more than 20 convolutional neural network simplification algorithms, obtaining outstanding results.
arXiv Detail & Related papers (2023-12-17T12:23:11Z) - A Generalization of Continuous Relaxation in Structured Pruning [0.3277163122167434]
Trends indicate that deeper and larger neural networks with an increasing number of parameters achieve higher accuracy than smaller neural networks.
We generalize structured pruning with algorithms for network augmentation, pruning, sub-network collapse and removal.
The resulting CNN executes efficiently on GPU hardware without computationally expensive sparse matrix operations.
arXiv Detail & Related papers (2023-08-28T14:19:13Z) - Receptive Field Refinement for Convolutional Neural Networks Reliably
Improves Predictive Performance [1.52292571922932]
We present a new approach to receptive field analysis that can yield these types of theoretical and empirical performance gains.
Our approach is able to improve ImageNet1K performance across a wide range of well-known, state-of-the-art (SOTA) model classes.
arXiv Detail & Related papers (2022-11-26T05:27:44Z) - EAPruning: Evolutionary Pruning for Vision Transformers and CNNs [11.994217333212736]
We present a simple and effective evolutionary pruning approach that can be easily applied to both vision transformers and convolutional neural networks.
We achieve a 50% FLOPs reduction for ResNet50 and MobileNetV1, leading to 1.37x and 1.34x speedups, respectively.
arXiv Detail & Related papers (2022-10-01T03:38:56Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model with better performance than training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Structured Convolutions for Efficient Neural Network Design [65.36569572213027]
We tackle model efficiency by exploiting redundancy in the implicit structure of the building blocks of convolutional neural networks.
We show how this decomposition can be applied to 2D and 3D kernels as well as the fully-connected layers.
arXiv Detail & Related papers (2020-08-06T04:38:38Z) - Channel Equilibrium Networks for Learning Deep Representation [63.76618960820138]
This work shows that the combination of normalization and rectified linear function leads to inhibited channels.
Unlike prior art that simply removes the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block.
The Channel Equilibrium (CE) block enables channels at the same layer to contribute equally to the learned representation.
arXiv Detail & Related papers (2020-02-29T09:02:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.