Do We Need Fully Connected Output Layers in Convolutional Networks?
- URL: http://arxiv.org/abs/2004.13587v2
- Date: Wed, 29 Apr 2020 03:20:47 GMT
- Title: Do We Need Fully Connected Output Layers in Convolutional Networks?
- Authors: Zhongchao Qian, Tyler L. Hayes, Kushal Kafle, Christopher Kanan
- Abstract summary: We show that the typical approach of having a fully connected final output layer is inefficient in terms of parameter count.
We are able to achieve comparable performance to a traditionally learned fully connected classification output layer on the ImageNet-1K, CIFAR-100, Stanford Cars-196, and Oxford Flowers-102 datasets.
- Score: 40.84294968326573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditionally, deep convolutional neural networks consist of a series of
convolutional and pooling layers followed by one or more fully connected (FC)
layers to perform the final classification. While this design has been
successful, for datasets with a large number of categories, the fully connected
layers often account for a large percentage of the network's parameters. For
applications with memory constraints, such as mobile devices and embedded
platforms, this is not ideal. Recently, a family of architectures that involve
replacing the learned fully connected output layer with a fixed layer has been
proposed as a way to achieve better efficiency. In this paper we examine this
idea further and demonstrate that fixed classifiers offer no additional benefit
compared to simply removing the output layer along with its parameters. We
further demonstrate that the typical approach of having a fully connected final
output layer is inefficient in terms of parameter count. We are able to achieve
comparable performance to a traditionally learned fully connected
classification output layer on the ImageNet-1K, CIFAR-100, Stanford Cars-196,
and Oxford Flowers-102 datasets, while not having a fully connected output
layer at all.
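One common way to drop the output layer entirely, in the spirit of Network-in-Network style heads, is to give the last convolutional block exactly as many channels as there are classes and let global average pooling produce the class scores. Whether this matches the authors' exact construction is an assumption, so the PyTorch sketch below is illustrative only.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 100  # e.g. CIFAR-100

# A tiny backbone whose final convolution already has NUM_CLASSES channels,
# so global average pooling yields class scores and no FC output layer exists.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.Conv2d(128, NUM_CLASSES, 3, stride=2, padding=1),  # width == number of classes
)

def forward(x: torch.Tensor) -> torch.Tensor:
    feats = backbone(x)               # (batch, NUM_CLASSES, H, W)
    return feats.mean(dim=(2, 3))     # pooled values are used directly as logits

logits = forward(torch.randn(4, 3, 32, 32))  # shape (4, 100), no FC parameters involved
```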
Related papers
- Make Deep Networks Shallow Again [6.647569337929869]
A breakthrough has been achieved by the concept of residual connections.
A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion.
In other words, a sequential deep architecture is substituted by a parallel shallow one.
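The Taylor-like expansion can be made explicit: writing each residual block as the map x to x + f_i(x) and multiplying out the composition (exact when the f_i are linear, heuristic otherwise) turns the sequential stack into a sum of parallel terms. The sketch below follows this standard algebra rather than the paper's own notation.

```latex
\begin{align*}
  y &= (\mathrm{id} + f_L) \circ \cdots \circ (\mathrm{id} + f_1)(x) \\
    &= x + \sum_{i} f_i(x) + \sum_{i<j} (f_j \circ f_i)(x) + \cdots
\end{align*}
% Exact when the blocks f_i are linear maps; for nonlinear blocks this is the
% heuristic "Taylor-like" expansion: the sequential deep stack is rewritten
% as a sum of parallel, increasingly high-order shallow terms.
```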
arXiv Detail & Related papers (2023-09-15T14:18:21Z)
- Structure-Aware DropEdge Towards Deep Graph Convolutional Networks [83.38709956935095]
Graph Convolutional Networks (GCNs) encounter a remarkable drop in performance when multiple layers are piled up.
Over-smoothing isolates the network output from the input with the increase of network depth, weakening expressivity and trainability.
We investigate refined measures upon DropEdge -- an existing simple yet effective technique to relieve over-smoothing.
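Plain DropEdge, the technique being refined here, simply removes a random fraction of edges at each training step before graph convolution. A minimal sketch follows; the (2, num_edges) edge_index layout is borrowed from the common PyTorch Geometric convention, and the paper's structure-aware refinements are not shown.

```python
import torch

def drop_edge(edge_index: torch.Tensor, drop_prob: float, training: bool = True) -> torch.Tensor:
    """Randomly drop a fraction of edges before a GCN layer (plain DropEdge).

    edge_index: (2, num_edges) tensor of [source; target] node indices.
    """
    if not training or drop_prob <= 0.0:
        return edge_index
    num_edges = edge_index.size(1)
    keep_mask = torch.rand(num_edges) >= drop_prob   # keep each edge with prob 1 - drop_prob
    return edge_index[:, keep_mask]

# Example: a 4-edge toy graph, dropping roughly 30% of edges each call.
edges = torch.tensor([[0, 1, 2, 3],
                      [1, 2, 3, 0]])
sparser_edges = drop_edge(edges, drop_prob=0.3)
```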
arXiv Detail & Related papers (2023-06-21T08:11:40Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
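The abstract does not specify SSC's filter structure, but the layers it is said to generalize are standard. The sketch below only shows how depthwise, groupwise, and pointwise convolutions are written in PyTorch and compares their parameter counts, as background rather than an SSC implementation.

```python
import torch.nn as nn

cin, cout = 64, 128

standard  = nn.Conv2d(cin, cout, kernel_size=3, padding=1)              # dense 3x3 filters
depthwise = nn.Conv2d(cin, cin,  kernel_size=3, padding=1, groups=cin)  # one filter per channel
groupwise = nn.Conv2d(cin, cout, kernel_size=3, padding=1, groups=4)    # filters split into 4 groups
pointwise = nn.Conv2d(cin, cout, kernel_size=1)                         # 1x1 channel mixing only

for name, layer in [("standard", standard), ("depthwise", depthwise),
                    ("groupwise", groupwise), ("pointwise", pointwise)]:
    params = sum(p.numel() for p in layer.parameters())
    print(f"{name:9s}: {params} parameters")
```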
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We combine this learned aggregation layer with a simple patch-based convolutional network parametrized by two parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
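The attention-based global map can be pictured as a single learned query attending over all patch features and aggregating them into one vector for classification. The single-head sketch below is a simplification under that assumption, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Aggregate a set of patch features with one learned query (single head)."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # one learned "class" query
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, dim), e.g. a flattened conv feature map
        k, v = self.key(patches), self.value(patches)
        attn = torch.softmax(self.query @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        pooled = (attn @ v).squeeze(1)                      # (batch, dim)
        return self.classifier(pooled)

pool = AttentionPooling(dim=256, num_classes=1000)
logits = pool(torch.randn(2, 196, 256))                     # (2, 1000)
```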
arXiv Detail & Related papers (2021-12-27T14:05:41Z)
- Basis Scaling and Double Pruning for Efficient Inference in Network-Based Transfer Learning [1.3467579878240454]
We decompose a convolutional layer into two layers: a convolutional layer with the orthonormal basis vectors as the filters, and a "BasisScalingConv" layer which is responsible for rescaling the features.
We achieve pruning ratios of up to 74.6% of model parameters on CIFAR-10 and 98.9% on MNIST.
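The decomposition can be sketched with an SVD of the flattened filter tensor: the right singular vectors become orthonormal basis filters, and a 1x1 layer on top recombines and rescales their responses, which is roughly the role the paper assigns to "BasisScalingConv". The paper's scaling parameterization and double-pruning procedure are not reproduced; the sketch only checks that the factored pair computes the same function as the original convolution.

```python
import torch
import torch.nn as nn

def decompose_conv(conv: nn.Conv2d):
    """Split a trained conv into (orthonormal-basis conv, 1x1 recombination)."""
    cout, cin, kh, kw = conv.weight.shape
    w = conv.weight.detach().reshape(cout, -1)           # (cout, cin*kh*kw)
    u, s, vh = torch.linalg.svd(w, full_matrices=False)  # w = u @ diag(s) @ vh
    rank = s.numel()

    basis = nn.Conv2d(cin, rank, (kh, kw), stride=conv.stride,
                      padding=conv.padding, bias=False)
    basis.weight.data = vh.reshape(rank, cin, kh, kw)    # orthonormal basis filters

    recombine = nn.Conv2d(rank, cout, kernel_size=1)     # rescales/mixes basis responses
    recombine.weight.data = (u * s).reshape(cout, rank, 1, 1)
    recombine.bias.data = (conv.bias.detach() if conv.bias is not None
                           else torch.zeros(cout))
    return nn.Sequential(basis, recombine)

conv = nn.Conv2d(16, 32, 3, padding=1)
factored = decompose_conv(conv)
x = torch.randn(1, 16, 8, 8)
print((conv(x) - factored(x)).abs().max())  # ~0: the pair reproduces the original conv
```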
arXiv Detail & Related papers (2021-08-06T00:04:02Z)
- Connecting Sphere Manifolds Hierarchically for Regularization [16.082095595061617]
We consider classification problems with hierarchically organized classes.
Our technique replaces the last layer of a neural network by combining a spherical fully-connected layer with a hierarchical layer.
This regularization is shown to improve the performance of widely used deep neural network architectures.
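A spherical fully-connected layer can be read as constraining both the features and the class weight vectors to the unit sphere, so the logits become (scaled) cosine similarities. The sketch below follows that reading; the scale factor and the omitted hierarchical layer are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphericalFC(nn.Module):
    """Linear classifier whose inputs and weights are projected onto the unit sphere."""

    def __init__(self, in_features: int, num_classes: int, scale: float = 10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        self.scale = scale  # temperature applied to the cosine logits (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.normalize(x, dim=-1)             # project features onto the sphere
        w = F.normalize(self.weight, dim=-1)   # project class weights onto the sphere
        return self.scale * (x @ w.t())        # scaled cosine similarities as logits

layer = SphericalFC(in_features=512, num_classes=100)
logits = layer(torch.randn(8, 512))            # (8, 100)
```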
arXiv Detail & Related papers (2021-06-25T10:51:36Z)
- ProgressiveSpinalNet architecture for FC layers [0.0]
In deep learning models, the FC layer plays the most important role in classifying the input based on the features learned by the previous layers.
This paper aims to significantly reduce this large number of parameters while improving performance.
The motivation is inspired by SpinalNet and other biologically inspired architectures.
arXiv Detail & Related papers (2021-03-21T11:54:50Z)
- Convolutional Networks with Dense Connectivity [59.30634544498946]
We introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.
For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.
We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks.
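Dense connectivity means each layer's input is the channel-wise concatenation of every earlier feature map in the block. A compact sketch of one dense block follows, with the layer internals simplified to BN-ReLU-Conv and the usual growth-rate parameter.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer consumes the concatenation of all preceding feature maps."""

    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # reuse every earlier feature map
            features.append(out)
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=64, growth_rate=32, num_layers=4)
y = block(torch.randn(2, 64, 16, 16))                 # (2, 64 + 4*32, 16, 16)
```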
arXiv Detail & Related papers (2020-01-08T06:54:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.