Do We Need Fully Connected Output Layers in Convolutional Networks?
- URL: http://arxiv.org/abs/2004.13587v2
- Date: Wed, 29 Apr 2020 03:20:47 GMT
- Title: Do We Need Fully Connected Output Layers in Convolutional Networks?
- Authors: Zhongchao Qian, Tyler L. Hayes, Kushal Kafle, Christopher Kanan
- Abstract summary: We show that the typical approach of having a fully connected final output layer is inefficient in terms of parameter count.
We are able to achieve comparable performance to a traditionally learned fully connected classification output layer on the ImageNet-1K, CIFAR-100, Stanford Cars-196, and Oxford Flowers-102 datasets.
- Score: 40.84294968326573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditionally, deep convolutional neural networks consist of a series of
convolutional and pooling layers followed by one or more fully connected (FC)
layers to perform the final classification. While this design has been
successful, for datasets with a large number of categories, the fully connected
layers often account for a large percentage of the network's parameters. For
applications with memory constraints, such as mobile devices and embedded
platforms, this is not ideal. Recently, a family of architectures that involve
replacing the learned fully connected output layer with a fixed layer has been
proposed as a way to achieve better efficiency. In this paper we examine this
idea further and demonstrate that fixed classifiers offer no additional benefit
compared to simply removing the output layer along with its parameters. We
further demonstrate that the typical approach of having a fully connected final
output layer is inefficient in terms of parameter count. We are able to achieve
comparable performance to a traditionally learned fully connected
classification output layer on the ImageNet-1K, CIFAR-100, Stanford Cars-196,
and Oxford Flowers-102 datasets, while not having a fully connected output
layer at all.
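One common way to drop the output layer entirely, in the spirit of Network-in-Network style heads, is to give the last convolutional block exactly as many channels as there are classes and let global average pooling produce the class scores. Whether this matches the authors' exact construction is an assumption, so the PyTorch sketch below is illustrative only.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 100  # e.g. CIFAR-100

# A tiny backbone whose final convolution already has NUM_CLASSES channels,
# so global average pooling yields class scores and no FC output layer exists.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.Conv2d(128, NUM_CLASSES, 3, stride=2, padding=1),  # width == number of classes
)

def forward(x: torch.Tensor) -> torch.Tensor:
    feats = backbone(x)               # (batch, NUM_CLASSES, H, W)
    return feats.mean(dim=(2, 3))     # pooled values are used directly as logits

logits = forward(torch.randn(4, 3, 32, 32))  # shape (4, 100), no FC parameters involved
```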
Related papers
- Make Deep Networks Shallow Again [6.647569337929869]
A breakthrough has been achieved by the concept of residual connections.
A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion.
In other words, a sequential deep architecture is substituted by a parallel shallow one.
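The Taylor-like expansion can be made explicit: writing each residual block as the map x to x + f_i(x) and multiplying out the composition (exact when the f_i are linear, heuristic otherwise) turns the sequential stack into a sum of parallel terms. The sketch below follows this standard algebra rather than the paper's own notation.

```latex
\begin{align*}
  y &= (\mathrm{id} + f_L) \circ \cdots \circ (\mathrm{id} + f_1)(x) \\
    &= x + \sum_{i} f_i(x) + \sum_{i<j} (f_j \circ f_i)(x) + \cdots
\end{align*}
% Exact when the blocks f_i are linear maps; for nonlinear blocks this is the
% heuristic "Taylor-like" expansion: the sequential deep stack is rewritten
% as a sum of parallel, increasingly high-order shallow terms.
```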
arXiv Detail & Related papers (2023-09-15T14:18:21Z)
- Structure-Aware DropEdge Towards Deep Graph Convolutional Networks [83.38709956935095]
Graph Convolutional Networks (GCNs) encounter a remarkable drop in performance when multiple layers are piled up.
Over-smoothing isolates the network output from the input with the increase of network depth, weakening expressivity and trainability.
We investigate refined measures upon DropEdge -- an existing simple yet effective technique to relieve over-smoothing.
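Plain DropEdge, the technique being refined here, simply removes a random fraction of edges at each training step before graph convolution. A minimal sketch follows; the (2, num_edges) edge_index layout is borrowed from the common PyTorch Geometric convention, and the paper's structure-aware refinements are not shown.

```python
import torch

def drop_edge(edge_index: torch.Tensor, drop_prob: float, training: bool = True) -> torch.Tensor:
    """Randomly drop a fraction of edges before a GCN layer (plain DropEdge).

    edge_index: (2, num_edges) tensor of [source; target] node indices.
    """
    if not training or drop_prob <= 0.0:
        return edge_index
    num_edges = edge_index.size(1)
    keep_mask = torch.rand(num_edges) >= drop_prob   # keep each edge with prob 1 - drop_prob
    return edge_index[:, keep_mask]

# Example: a 4-edge toy graph, dropping roughly 30% of edges each call.
edges = torch.tensor([[0, 1, 2, 3],
                      [1, 2, 3, 0]])
sparser_edges = drop_edge(edges, drop_prob=0.3)
```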
arXiv Detail & Related papers (2023-06-21T08:11:40Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
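The abstract does not specify SSC's filter structure, but the layers it is said to generalize are standard. The sketch below only shows how depthwise, groupwise, and pointwise convolutions are written in PyTorch and compares their parameter counts, as background rather than an SSC implementation.

```python
import torch.nn as nn

cin, cout = 64, 128

standard  = nn.Conv2d(cin, cout, kernel_size=3, padding=1)              # dense 3x3 filters
depthwise = nn.Conv2d(cin, cin,  kernel_size=3, padding=1, groups=cin)  # one filter per channel
groupwise = nn.Conv2d(cin, cout, kernel_size=3, padding=1, groups=4)    # filters split into 4 groups
pointwise = nn.Conv2d(cin, cout, kernel_size=1)                         # 1x1 channel mixing only

for name, layer in [("standard", standard), ("depthwise", depthwise),
                    ("groupwise", groupwise), ("pointwise", pointwise)]:
    params = sum(p.numel() for p in layer.parameters())
    print(f"{name:9s}: {params} parameters")
```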
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We combine this learned aggregation layer with a simple patch-based convolutional network parametrized by two parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
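The attention-based global map can be pictured as a single learned query attending over all patch features and aggregating them into one vector for classification. The single-head sketch below is a simplification under that assumption, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Aggregate a set of patch features with one learned query (single head)."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # one learned "class" query
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, dim), e.g. a flattened conv feature map
        k, v = self.key(patches), self.value(patches)
        attn = torch.softmax(self.query @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        pooled = (attn @ v).squeeze(1)                      # (batch, dim)
        return self.classifier(pooled)

pool = AttentionPooling(dim=256, num_classes=1000)
logits = pool(torch.randn(2, 196, 256))                     # (2, 1000)
```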
arXiv Detail & Related papers (2021-12-27T14:05:41Z)
- Basis Scaling and Double Pruning for Efficient Inference in Network-Based Transfer Learning [1.3467579878240454]
We decompose a convolutional layer into two layers: a convolutional layer with the orthonormal basis vectors as the filters, and a "BasisScalingConv" layer which is responsible for rescaling the features.
We achieve pruning ratios of up to 74.6% of model parameters on CIFAR-10 and 98.9% on MNIST.
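The decomposition can be sketched with an SVD of the flattened filter tensor: the right singular vectors become orthonormal basis filters, and a 1x1 layer on top recombines and rescales their responses, which is roughly the role the paper assigns to "BasisScalingConv". The paper's scaling parameterization and double-pruning procedure are not reproduced; the sketch only checks that the factored pair computes the same function as the original convolution.

```python
import torch
import torch.nn as nn

def decompose_conv(conv: nn.Conv2d):
    """Split a trained conv into (orthonormal-basis conv, 1x1 recombination)."""
    cout, cin, kh, kw = conv.weight.shape
    w = conv.weight.detach().reshape(cout, -1)           # (cout, cin*kh*kw)
    u, s, vh = torch.linalg.svd(w, full_matrices=False)  # w = u @ diag(s) @ vh
    rank = s.numel()

    basis = nn.Conv2d(cin, rank, (kh, kw), stride=conv.stride,
                      padding=conv.padding, bias=False)
    basis.weight.data = vh.reshape(rank, cin, kh, kw)    # orthonormal basis filters

    recombine = nn.Conv2d(rank, cout, kernel_size=1)     # rescales/mixes basis responses
    recombine.weight.data = (u * s).reshape(cout, rank, 1, 1)
    recombine.bias.data = (conv.bias.detach() if conv.bias is not None
                           else torch.zeros(cout))
    return nn.Sequential(basis, recombine)

conv = nn.Conv2d(16, 32, 3, padding=1)
factored = decompose_conv(conv)
x = torch.randn(1, 16, 8, 8)
print((conv(x) - factored(x)).abs().max())  # ~0: the pair reproduces the original conv
```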
arXiv Detail & Related papers (2021-08-06T00:04:02Z)
- Connecting Sphere Manifolds Hierarchically for Regularization [16.082095595061617]
We consider classification problems with hierarchically organized classes.
Our technique replaces the last layer of a neural network by combining a spherical fully-connected layer with a hierarchical layer.
This regularization is shown to improve the performance of widely used deep neural network architectures.
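A spherical fully-connected layer can be read as constraining both the features and the class weight vectors to the unit sphere, so the logits become (scaled) cosine similarities. The sketch below follows that reading; the scale factor and the omitted hierarchical layer are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphericalFC(nn.Module):
    """Linear classifier whose inputs and weights are projected onto the unit sphere."""

    def __init__(self, in_features: int, num_classes: int, scale: float = 10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        self.scale = scale  # temperature applied to the cosine logits (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.normalize(x, dim=-1)             # project features onto the sphere
        w = F.normalize(self.weight, dim=-1)   # project class weights onto the sphere
        return self.scale * (x @ w.t())        # scaled cosine similarities as logits

layer = SphericalFC(in_features=512, num_classes=100)
logits = layer(torch.randn(8, 512))            # (8, 100)
```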
arXiv Detail & Related papers (2021-06-25T10:51:36Z)
- ProgressiveSpinalNet architecture for FC layers [0.0]
In deep learning models, the FC layer plays the most important role in classifying the input based on the features learned by the previous layers.
This paper aims to significantly reduce this large number of parameters while improving performance.
The motivation is inspired by SpinalNet and other biologically inspired architectures.
arXiv Detail & Related papers (2021-03-21T11:54:50Z)
- Convolutional Networks with Dense Connectivity [59.30634544498946]
We introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.
For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.
We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks.
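Dense connectivity means each layer's input is the channel-wise concatenation of every earlier feature map in the block. A compact sketch of one dense block follows, with the layer internals simplified to BN-ReLU-Conv and the usual growth-rate parameter.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer consumes the concatenation of all preceding feature maps."""

    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # reuse every earlier feature map
            features.append(out)
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=64, growth_rate=32, num_layers=4)
y = block(torch.randn(2, 64, 16, 16))                 # (2, 64 + 4*32, 16, 16)
```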
arXiv Detail & Related papers (2020-01-08T06:54:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.