Utilizing Excess Resources in Training Neural Networks
- URL: http://arxiv.org/abs/2207.05532v1
- Date: Tue, 12 Jul 2022 13:48:40 GMT
- Title: Utilizing Excess Resources in Training Neural Networks
- Authors: Amit Henig and Raja Giryes
- Abstract summary: We implement a linear cascade of filtering layers in a kernel filtering fashion, which prevents the trained architecture from becoming unnecessarily deeper.
This also allows using our approach with almost any network architecture and lets the filtering layers be combined into a single layer at test time.
We demonstrate the advantage of KFLO on various network models and datasets in supervised learning.
- Score: 41.07083436560303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we suggest Kernel Filtering Linear Overparameterization (KFLO),
where a linear cascade of filtering layers is used during training to improve
network performance at test time. We implement this cascade in a kernel
filtering fashion, which prevents the trained architecture from becoming
unnecessarily deeper. This also allows using our approach with almost any
network architecture and lets the filtering layers be combined into a single
layer at test time. Thus, our approach does not add computational complexity during
inference. We demonstrate the advantage of KFLO on various network models and
datasets in supervised learning.
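The collapse property follows from linearity: the cascade only applies linear filtering to the kernel, so the composed operator is still a single convolution. Below is a minimal, hypothetical PyTorch sketch of that idea; the class name, the single shared filtering kernel, and the exact filtering scheme are assumptions for illustration, not the authors' reference implementation.

```python
# Hypothetical sketch of linear kernel over-parameterization in the spirit of
# KFLO: during training the effective convolution kernel is produced by
# filtering a base kernel with an extra learnable kernel; at test time the two
# collapse into one ordinary kernel. Details are assumed, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KernelFilteredConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, filter_size=3):
        super().__init__()
        self.base = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.1)
        # Extra linear "filtering" kernel applied to the base kernel itself.
        self.kernel_filter = nn.Parameter(
            torch.randn(1, 1, filter_size, filter_size) * 0.1)
        self.padding = kernel_size // 2

    def effective_weight(self):
        # Filter every (out_ch * in_ch) kernel slice with the same small filter.
        o, i, k, _ = self.base.shape
        flat = self.base.reshape(o * i, 1, k, k)
        filtered = F.conv2d(flat, self.kernel_filter,
                            padding=self.kernel_filter.shape[-1] // 2)
        return filtered.reshape(o, i, k, k)

    def forward(self, x):
        # Training: recompute the collapsed kernel each step. Because the
        # filtering is linear, inference can cache effective_weight() once.
        return F.conv2d(x, self.effective_weight(), padding=self.padding)


x = torch.randn(2, 16, 32, 32)
layer = KernelFilteredConv2d(16, 32)
print(layer(x).shape)  # torch.Size([2, 32, 32, 32])
```

At test time, effective_weight() can be computed once and stored as a plain Conv2d weight, so inference cost matches the unmodified layer.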
Related papers
- Local Kernel Renormalization as a mechanism for feature learning in
overparametrized Convolutional Neural Networks [0.0]
Empirical evidence shows that fully-connected neural networks in the infinite-width limit eventually outperform their finite-width counterparts.
State-of-the-art architectures with convolutional layers achieve optimal performances in the finite-width regime.
We show that the generalization performance of a finite-width FC network can be obtained by an infinite-width network, with a suitable choice of the Gaussian priors.
arXiv Detail & Related papers (2023-07-21T17:22:04Z) - Learning Sparse Neural Networks with Identity Layers [33.11654855515443]
We investigate the intrinsic link between network sparsity and interlayer feature similarity.
We propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR.
We find that CKA-SR consistently improves the performance of several State-Of-The-Art sparse training methods.
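As context, linear CKA between two layers' activations can be computed directly from centered feature matrices; the sketch below shows that computation and how such a term might be added to the training loss as an inter-layer similarity penalty. The regularizer wiring is an assumption based on the summary, not the CKA-SR code.

```python
# Linear CKA (Kornblith et al.) between two layers' feature matrices; the use
# as a sparsity-training regularizer is an assumed simplification of CKA-SR.
import torch


def linear_cka(x, y):
    # x, y: (n_samples, features_i) activations from two layers.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    num = ((y.t() @ x) ** 2).sum()                     # ||Y^T X||_F^2
    den = torch.norm(x.t() @ x) * torch.norm(y.t() @ y)
    return num / (den + 1e-12)


feat_a = torch.randn(128, 256)
feat_b = torch.randn(128, 512)
reg = linear_cka(feat_a, feat_b)   # add lambda * reg to the training loss
print(float(reg))
```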
arXiv Detail & Related papers (2023-07-14T14:58:44Z) - Efficient Implementation of a Multi-Layer Gradient-Free Online-Trainable
Spiking Neural Network on FPGA [0.31498833540989407]
ODESA is the first network to have end-to-end multi-layer online local supervised training without using gradients.
This research shows that the network architecture and the online training of weights and thresholds can be implemented efficiently on a large scale in hardware.
arXiv Detail & Related papers (2023-05-31T00:34:15Z) - Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
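For scale, the sketch below only contrasts a dense 3x3 convolution with the depthwise-plus-pointwise special case that SSC is said to generalize, using standard PyTorch layers; it does not implement SSC's own structured sparsity pattern.

```python
# Parameter-count comparison of a dense conv vs. the separable special case
# (depthwise + pointwise) that SSC generalizes; not an SSC implementation.
import torch.nn as nn

dense = nn.Conv2d(64, 128, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                        # pointwise
)


def n_params(m):
    return sum(p.numel() for p in m.parameters())


print(n_params(dense), n_params(separable))  # 73856 vs. 8960
```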
arXiv Detail & Related papers (2022-10-23T18:37:22Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
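A heavily simplified reading of the decoupling idea is sketched below using auxiliary heads as a stand-in: each layer group trains against its own small head, so backward passes do not cross group boundaries. The two-group split, the linear head, and the optimizers are illustrative assumptions; the paper's actual local critics, which approximate the downstream loss, differ from this.

```python
# Assumed sketch of decoupled per-group updates with auxiliary heads; this is
# a stand-in for (not a faithful copy of) local critic training.
import torch
import torch.nn as nn

group1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
group2 = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
head1 = nn.Linear(256, 10)          # local head used only to update group1

opt1 = torch.optim.SGD(list(group1.parameters()) + list(head1.parameters()), lr=0.1)
opt2 = torch.optim.SGD(group2.parameters(), lr=0.1)
ce = nn.CrossEntropyLoss()

x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))

h = group1(x)
loss1 = ce(head1(h), y)             # group1 updates from its local head
opt1.zero_grad()
loss1.backward()
opt1.step()

loss2 = ce(group2(h.detach()), y)   # group2 never backpropagates into group1
opt2.zero_grad()
loss2.backward()
opt2.step()
```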
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Kernelized Classification in Deep Networks [49.47339560731506]
We propose a kernelized classification layer for deep networks.
We advocate a nonlinear classification layer by using the kernel trick on the softmax cross-entropy loss function during training.
We show the usefulness of the proposed nonlinear classification layer on several datasets and tasks.
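One way to realize such a nonlinear classification layer is to score features by an RBF kernel against learnable class prototypes and feed those scores to the usual cross-entropy; the sketch below assumes that formulation, which may differ from the paper's exact kernel-trick construction.

```python
# Assumed sketch of a kernelized classification head: logits are RBF-kernel
# similarities between features and learnable class prototypes.
import torch
import torch.nn as nn


class RBFClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes, gamma=0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.gamma = gamma

    def forward(self, feats):
        # Squared Euclidean distance to each class prototype.
        d2 = torch.cdist(feats, self.prototypes).pow(2)
        return torch.exp(-self.gamma * d2)   # kernel values used as logits


head = RBFClassifier(feat_dim=128, num_classes=10)
feats = torch.randn(32, 128)
loss = nn.CrossEntropyLoss()(head(feats), torch.randint(0, 10, (32,)))
```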
arXiv Detail & Related papers (2020-12-08T21:43:19Z) - Improving Sample Efficiency with Normalized RBF Kernels [0.0]
This paper explores how neural networks with normalized Radial Basis Function (RBF) kernels can be trained to achieve better sample efficiency.
We show how this kind of output layer can find embedding spaces where the classes are compact and well-separated.
Experiments on CIFAR-10 and CIFAR-100 show that networks with normalized kernels as the output layer achieve higher sample efficiency, with compact and well-separated class embeddings.
arXiv Detail & Related papers (2020-07-30T11:40:29Z) - Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
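For reference, the prior practice mentioned above (ranking filters by their batch-norm scaling factors and zeroing out the smallest) can be sketched in a few lines; the paper's dependency-aware mechanism and dynamic regularization are not shown here.

```python
# Baseline filter pruning by batch-norm scale magnitude; a reference point,
# not the paper's dependency-aware method.
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(32)
nn.init.uniform_(bn.weight, 0.0, 1.0)   # random scales so the demo is non-trivial

keep_ratio = 0.5
scores = bn.weight.detach().abs()                                  # per-filter importance
threshold = scores.sort().values[int(len(scores) * (1 - keep_ratio))]
mask = (scores >= threshold).float()

with torch.no_grad():                                              # zero pruned filters
    conv.weight.mul_(mask.view(-1, 1, 1, 1))
    bn.weight.mul_(mask)
    bn.bias.mul_(mask)

print(int(mask.sum()), "of", len(mask), "filters kept")
```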
arXiv Detail & Related papers (2020-05-06T07:41:22Z) - Fitting the Search Space of Weight-sharing NAS with Graph Convolutional
Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
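A toy version of the predictor idea is sketched below: a single graph-convolution step maps an architecture's adjacency matrix and node (operation) features to a scalar score used to rank candidates. The encoding, dimensions, and model size are placeholders, not the paper's setup.

```python
# Toy graph-convolution regressor for predicting sub-network performance;
# the architecture encoding and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class TinyGCNPredictor(nn.Module):
    def __init__(self, feat_dim, hidden=32):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, adj, feats):
        # One propagation step: average neighbor features, then transform.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
        h = torch.relu(self.w1(adj @ feats / deg))
        return self.out(h.mean(dim=-2)).squeeze(-1)   # graph-level score


adj = torch.randint(0, 2, (8, 8)).float()   # hypothetical op-graph adjacency
feats = torch.randn(8, 16)                  # hypothetical op embeddings
pred = TinyGCNPredictor(16)(adj, feats)     # predicted performance for ranking
print(pred.item())
```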
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.