Utilizing Excess Resources in Training Neural Networks
- URL: http://arxiv.org/abs/2207.05532v1
- Date: Tue, 12 Jul 2022 13:48:40 GMT
- Title: Utilizing Excess Resources in Training Neural Networks
- Authors: Amit Henig and Raja Giryes
- Abstract summary: We implement a linear cascade of filtering layers in a kernel filtering fashion, which prevents the trained architecture from becoming unnecessarily deeper.
This also allows using our approach with almost any network architecture and lets the filtering layers be combined into a single layer at test time.
We demonstrate the advantage of KFLO on various network models and datasets in supervised learning.
- Score: 41.07083436560303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we suggest Kernel Filtering Linear Overparameterization (KFLO),
where a linear cascade of filtering layers is used during training to improve
network performance at test time. We implement this cascade in a kernel
filtering fashion, which prevents the trained architecture from becoming
unnecessarily deeper. This also allows using our approach with almost any
network architecture and lets the filtering layers be combined into a single
layer at test time. Thus, our approach does not add computational complexity during
inference. We demonstrate the advantage of KFLO on various network models and
datasets in supervised learning.
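The collapse property follows from linearity: the cascade only applies linear filtering to the kernel, so the composed operator is still a single convolution. Below is a minimal, hypothetical PyTorch sketch of that idea; the class name, the single shared filtering kernel, and the exact filtering scheme are assumptions for illustration, not the authors' reference implementation.

```python
# Hypothetical sketch of linear kernel over-parameterization in the spirit of
# KFLO: during training the effective convolution kernel is produced by
# filtering a base kernel with an extra learnable kernel; at test time the two
# collapse into one ordinary kernel. Details are assumed, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KernelFilteredConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, filter_size=3):
        super().__init__()
        self.base = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.1)
        # Extra linear "filtering" kernel applied to the base kernel itself.
        self.kernel_filter = nn.Parameter(
            torch.randn(1, 1, filter_size, filter_size) * 0.1)
        self.padding = kernel_size // 2

    def effective_weight(self):
        # Filter every (out_ch * in_ch) kernel slice with the same small filter.
        o, i, k, _ = self.base.shape
        flat = self.base.reshape(o * i, 1, k, k)
        filtered = F.conv2d(flat, self.kernel_filter,
                            padding=self.kernel_filter.shape[-1] // 2)
        return filtered.reshape(o, i, k, k)

    def forward(self, x):
        # Training: recompute the collapsed kernel each step. Because the
        # filtering is linear, inference can cache effective_weight() once.
        return F.conv2d(x, self.effective_weight(), padding=self.padding)


x = torch.randn(2, 16, 32, 32)
layer = KernelFilteredConv2d(16, 32)
print(layer(x).shape)  # torch.Size([2, 32, 32, 32])
```

At test time, effective_weight() can be computed once and stored as a plain Conv2d weight, so inference cost matches the unmodified layer.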
Related papers
- Local Kernel Renormalization as a mechanism for feature learning in
overparametrized Convolutional Neural Networks [0.0]
Empirical evidence shows that fully-connected neural networks in the infinite-width limit eventually outperform their finite-width counterparts.
State-of-the-art architectures with convolutional layers achieve optimal performances in the finite-width regime.
We show that the generalization performance of a finite-width FC network can be obtained by an infinite-width network, with a suitable choice of the Gaussian priors.
arXiv Detail & Related papers (2023-07-21T17:22:04Z) - Learning Sparse Neural Networks with Identity Layers [33.11654855515443]
We investigate the intrinsic link between network sparsity and interlayer feature similarity.
We propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR.
We find that CKA-SR consistently improves the performance of several State-Of-The-Art sparse training methods.
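As context, linear CKA between two layers' activations can be computed directly from centered feature matrices; the sketch below shows that computation and how such a term might be added to the training loss as an inter-layer similarity penalty. The regularizer wiring is an assumption based on the summary, not the CKA-SR code.

```python
# Linear CKA (Kornblith et al.) between two layers' feature matrices; the use
# as a sparsity-training regularizer is an assumed simplification of CKA-SR.
import torch


def linear_cka(x, y):
    # x, y: (n_samples, features_i) activations from two layers.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    num = ((y.t() @ x) ** 2).sum()                     # ||Y^T X||_F^2
    den = torch.norm(x.t() @ x) * torch.norm(y.t() @ y)
    return num / (den + 1e-12)


feat_a = torch.randn(128, 256)
feat_b = torch.randn(128, 512)
reg = linear_cka(feat_a, feat_b)   # add lambda * reg to the training loss
print(float(reg))
```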
arXiv Detail & Related papers (2023-07-14T14:58:44Z) - Efficient Implementation of a Multi-Layer Gradient-Free Online-Trainable
Spiking Neural Network on FPGA [0.31498833540989407]
ODESA is the first network to have end-to-end multi-layer online local supervised training without using gradients.
This research shows that the network architecture and the online training of weights and thresholds can be implemented efficiently on a large scale in hardware.
arXiv Detail & Related papers (2023-05-31T00:34:15Z) - Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
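For scale, the sketch below only contrasts a dense 3x3 convolution with the depthwise-plus-pointwise special case that SSC is said to generalize, using standard PyTorch layers; it does not implement SSC's own structured sparsity pattern.

```python
# Parameter-count comparison of a dense conv vs. the separable special case
# (depthwise + pointwise) that SSC generalizes; not an SSC implementation.
import torch.nn as nn

dense = nn.Conv2d(64, 128, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                        # pointwise
)


def n_params(m):
    return sum(p.numel() for p in m.parameters())


print(n_params(dense), n_params(separable))  # 73856 vs. 8960
```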
arXiv Detail & Related papers (2022-10-23T18:37:22Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
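A heavily simplified reading of the decoupling idea is sketched below using auxiliary heads as a stand-in: each layer group trains against its own small head, so backward passes do not cross group boundaries. The two-group split, the linear head, and the optimizers are illustrative assumptions; the paper's actual local critics, which approximate the downstream loss, differ from this.

```python
# Assumed sketch of decoupled per-group updates with auxiliary heads; this is
# a stand-in for (not a faithful copy of) local critic training.
import torch
import torch.nn as nn

group1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
group2 = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
head1 = nn.Linear(256, 10)          # local head used only to update group1

opt1 = torch.optim.SGD(list(group1.parameters()) + list(head1.parameters()), lr=0.1)
opt2 = torch.optim.SGD(group2.parameters(), lr=0.1)
ce = nn.CrossEntropyLoss()

x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))

h = group1(x)
loss1 = ce(head1(h), y)             # group1 updates from its local head
opt1.zero_grad()
loss1.backward()
opt1.step()

loss2 = ce(group2(h.detach()), y)   # group2 never backpropagates into group1
opt2.zero_grad()
loss2.backward()
opt2.step()
```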
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Kernelized Classification in Deep Networks [49.47339560731506]
We propose a kernelized classification layer for deep networks.
We advocate a nonlinear classification layer by using the kernel trick on the softmax cross-entropy loss function during training.
We show the usefulness of the proposed nonlinear classification layer on several datasets and tasks.
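One way to realize such a nonlinear classification layer is to score features by an RBF kernel against learnable class prototypes and feed those scores to the usual cross-entropy; the sketch below assumes that formulation, which may differ from the paper's exact kernel-trick construction.

```python
# Assumed sketch of a kernelized classification head: logits are RBF-kernel
# similarities between features and learnable class prototypes.
import torch
import torch.nn as nn


class RBFClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes, gamma=0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.gamma = gamma

    def forward(self, feats):
        # Squared Euclidean distance to each class prototype.
        d2 = torch.cdist(feats, self.prototypes).pow(2)
        return torch.exp(-self.gamma * d2)   # kernel values used as logits


head = RBFClassifier(feat_dim=128, num_classes=10)
feats = torch.randn(32, 128)
loss = nn.CrossEntropyLoss()(head(feats), torch.randint(0, 10, (32,)))
```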
arXiv Detail & Related papers (2020-12-08T21:43:19Z) - Improving Sample Efficiency with Normalized RBF Kernels [0.0]
This paper explores how neural networks with normalized Radial Basis Function (RBF) kernels can be trained to achieve better sample efficiency.
We show how this kind of output layer can find embedding spaces where the classes are compact and well-separated.
Experiments on CIFAR-10 and CIFAR-100 show that networks with normalized kernels as the output layer achieve higher sample efficiency, with compact and well-separated class embeddings.
arXiv Detail & Related papers (2020-07-30T11:40:29Z) - Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
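For reference, the prior practice mentioned above (ranking filters by their batch-norm scaling factors and zeroing out the smallest) can be sketched in a few lines; the paper's dependency-aware mechanism and dynamic regularization are not shown here.

```python
# Baseline filter pruning by batch-norm scale magnitude; a reference point,
# not the paper's dependency-aware method.
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(32)
nn.init.uniform_(bn.weight, 0.0, 1.0)   # random scales so the demo is non-trivial

keep_ratio = 0.5
scores = bn.weight.detach().abs()                                  # per-filter importance
threshold = scores.sort().values[int(len(scores) * (1 - keep_ratio))]
mask = (scores >= threshold).float()

with torch.no_grad():                                              # zero pruned filters
    conv.weight.mul_(mask.view(-1, 1, 1, 1))
    bn.weight.mul_(mask)
    bn.bias.mul_(mask)

print(int(mask.sum()), "of", len(mask), "filters kept")
```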
arXiv Detail & Related papers (2020-05-06T07:41:22Z) - Fitting the Search Space of Weight-sharing NAS with Graph Convolutional
Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
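A toy version of the predictor idea is sketched below: a single graph-convolution step maps an architecture's adjacency matrix and node (operation) features to a scalar score used to rank candidates. The encoding, dimensions, and model size are placeholders, not the paper's setup.

```python
# Toy graph-convolution regressor for predicting sub-network performance;
# the architecture encoding and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class TinyGCNPredictor(nn.Module):
    def __init__(self, feat_dim, hidden=32):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, adj, feats):
        # One propagation step: average neighbor features, then transform.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
        h = torch.relu(self.w1(adj @ feats / deg))
        return self.out(h.mean(dim=-2)).squeeze(-1)   # graph-level score


adj = torch.randint(0, 2, (8, 8)).float()   # hypothetical op-graph adjacency
feats = torch.randn(8, 16)                  # hypothetical op embeddings
pred = TinyGCNPredictor(16)(adj, feats)     # predicted performance for ranking
print(pred.item())
```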
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.