Interflow: Aggregating Multi-layer Feature Mappings with Attention Mechanism
- URL: http://arxiv.org/abs/2106.14073v1
- Date: Sat, 26 Jun 2021 18:22:01 GMT
- Title: Interflow: Aggregating Multi-layer Feature Mappings with Attention Mechanism
- Authors: Zhicheng Cai
- Abstract summary: This paper proposes the Interflow algorithm specifically for traditional CNN models.
Interflow divides a CNN into several stages according to depth and makes predictions from the feature mappings at each stage.
It can alleviate the gradient vanishing problem, reduce the difficulty of network depth selection, and mitigate possible over-fitting.
- Score: 0.7614628596146599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditionally, CNN models possess hierarchical structures and utilize the
feature mapping of the last layer to obtain the prediction output. However, it
can be difficult to determine the optimal network depth and to make the middle
layers learn discriminative features. This paper proposes the Interflow
algorithm specifically for traditional CNN models. Interflow divides a CNN into
several stages according to depth and makes a prediction from the feature
mappings of each stage. Subsequently, these prediction branches are fed into a
well-designed attention module, which learns the weights of the branches,
aggregates them, and obtains the final output. Interflow weights and fuses the
features learned in both shallower and deeper layers, so that the feature
information at each stage is processed reasonably and effectively, the middle
layers learn more discriminative features, and the model's representation
ability is enhanced. In addition, by introducing the attention mechanism,
Interflow can alleviate the gradient vanishing problem, reduce the difficulty
of network depth selection, and mitigate possible over-fitting. As a byproduct,
it can also avoid network degradation. Compared with the original model, a CNN
equipped with Interflow achieves higher test accuracy on multiple benchmark
datasets.
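The mechanism is easiest to see in code. Below is a minimal PyTorch sketch, not
the paper's exact architecture: the stage boundaries, branch heads, and the
attention module's design are illustrative assumptions; only the stage-wise
prediction branches and their attention-weighted aggregation follow the
abstract.

```python
# Minimal, illustrative sketch of an Interflow-style model: a backbone split
# into stages, one prediction branch per stage, and a small attention module
# that learns weights to aggregate the branch predictions.
import torch
import torch.nn as nn

class InterflowNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Three backbone stages of increasing depth (assumed layout).
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
            nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
            nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
        ])
        # One prediction branch per stage: pool that stage's feature mapping
        # and map it to class logits.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, num_classes))
            for c in (32, 64, 128)
        ])
        # Attention over the branches: learns one weight per branch from the
        # concatenated branch logits (one possible design, assumed here).
        self.attention = nn.Sequential(
            nn.Linear(3 * num_classes, 3),
            nn.Softmax(dim=-1),
        )

    def forward(self, x):
        branch_logits = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            branch_logits.append(head(x))            # prediction from this stage
        stacked = torch.stack(branch_logits, dim=1)   # (B, 3, num_classes)
        weights = self.attention(stacked.flatten(1))  # (B, 3) branch weights
        # Weighted aggregation of the branch predictions gives the final output.
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)

model = InterflowNet()
out = model(torch.randn(2, 3, 32, 32))  # -> shape (2, 10)
```

Here the branch weights are computed jointly from the concatenated branch
logits; the abstract only specifies that the attention module learns per-branch
weights, so other scoring networks could take this place.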
Related papers
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Towards Disentangling Information Paths with Coded ResNeXt [11.884259630414515]
We take a novel approach to enhancing the transparency of the whole network's function.
We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
arXiv Detail & Related papers (2022-02-10T21:45:49Z) - Parallel Multi-Scale Networks with Deep Supervision for Hand Keypoint
Detection [3.1781111932870716]
We propose a novel CNN model named Multi-Scale Deep Supervision Network (P-MSDSNet).
P-MSDSNet learns feature maps at different scales with deep supervision to produce attention maps for adaptive feature propagation from layer to layer.
We show that P-MSDSNet outperforms state-of-the-art approaches on benchmark datasets while requiring fewer parameters.
arXiv Detail & Related papers (2021-12-19T22:38:16Z) - Inference Graphs for CNN Interpretation [12.765543440576144]
Convolutional neural networks (CNNs) have achieved superior accuracy in many vision-related tasks.
We propose to model the network hidden layers activity using probabilistic models.
We show that such graphs are useful for understanding the general inference process of a class, as well as explaining decisions the network makes regarding specific images.
arXiv Detail & Related papers (2021-10-20T13:56:09Z) - The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z) - Spatio-Temporal Inception Graph Convolutional Networks for
Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z) - AutoPruning for Deep Neural Network with Dynamic Channel Masking [28.018077874687343]
We propose a learning-based auto-pruning algorithm for deep neural networks.
A two-objective problem that seeks the best weights and the best channels for each layer is first formulated.
An alternative optimization approach is then proposed to derive the optimal channel numbers and weights simultaneously.
arXiv Detail & Related papers (2020-10-22T20:12:46Z) - Eigen-CAM: Class Activation Map using Principal Components [1.2691047660244335]
This paper builds on previous ideas to cope with the increasing demand for interpretable, robust, and transparent models.
The proposed Eigen-CAM computes and visualizes the principal components of the learned features/representations from the convolutional layers (a minimal sketch follows this list).
arXiv Detail & Related papers (2020-08-01T17:14:13Z) - DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
arXiv Detail & Related papers (2020-03-30T17:59:18Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)