ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution
- URL: http://arxiv.org/abs/2009.02386v1
- Date: Fri, 4 Sep 2020 20:41:47 GMT
- Title: ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution
- Authors: Ze Wang, Xiuyuan Cheng, Guillermo Sapiro, Qiang Qiu
- Abstract summary: We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
- Score: 57.635467829558664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) are known to be significantly
over-parametrized, and difficult to interpret, train and adapt. In this paper,
we introduce a structural regularization across convolutional kernels in a CNN.
In our approach, each convolution kernel is first decomposed as 2D dictionary
atoms linearly combined by coefficients. The widely observed correlation and
redundancy in a CNN hint at a common low-rank structure among the decomposed
coefficients, which is here further supported by our empirical observations. We
then explicitly regularize CNN kernels by enforcing decomposed coefficients to
be shared across sub-structures, while leaving each sub-structure only its own
dictionary atoms, typically only a few hundred parameters, which leads to
dramatic model reductions. We explore models with sharing across different
sub-structures to cover a wide range of trade-offs between parameter reduction
and expressiveness. Our proposed regularized network structures open the door
to better interpreting, training and adapting deep models. We validate the
flexibility and compatibility of our method by image classification experiments
on multiple datasets and underlying network structures, and show that CNNs now
maintain performance with a dramatic reduction in parameters and computations,
e.g., only 5% of the parameters of a ResNet-18 are used to achieve comparable
performance. Further experiments on few-shot classification show that faster
and more robust task adaptation is obtained in comparison with models with
standard convolutions.
Related papers
- Isomorphic Pruning for Vision Models [56.286064975443026]
Structured pruning reduces the computational overhead of deep neural networks by removing redundant sub-structures.
We present Isomorphic Pruning, a simple approach that demonstrates effectiveness across a range of network architectures.
arXiv Detail & Related papers (2024-07-05T16:14:53Z)
- On the rates of convergence for learning with convolutional neural networks
We study the approximation and learning capacities of convolutional neural networks (CNNs) with one-sided zero-padding and multiple channels.
We derive convergence rates for estimators based on CNNs in many learning problems.
It is also shown that the obtained rates for classification are minimax optimal in some common settings.
arXiv Detail & Related papers (2024-03-25T06:42:02Z)
- Adaptive aggregation of Monte Carlo augmented decomposed filters for efficient group-equivariant convolutional neural network [0.36122488107441414]
Group-equivariant convolutional neural networks (G-CNNs) heavily rely on parameter sharing to increase a CNN's data efficiency and performance.
We propose a non-parameter-sharing approach for group-equivariant neural networks.
The proposed method adaptively aggregates a diverse range of filters by a weighted sum of Monte Carlo augmented decomposed filters.
arXiv Detail & Related papers (2023-05-17T10:18:02Z)
- Learning Partial Correlation based Deep Visual Representation for Image Classification [61.0532370259644]
We formulate sparse inverse covariance estimation (SICE) as a novel structured layer of a CNN.
Our work obtains a partial correlation based deep visual representation and mitigates the small sample problem.
Experiments show the efficacy and superior classification performance of our model.
arXiv Detail & Related papers (2023-04-23T10:09:01Z)
- The Power of Linear Combinations: Learning with Random Convolutions [2.0305676256390934]
Modern CNNs can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters.
These combinations of random filters can implicitly regularize the resulting operations.
Although we only observe relatively small gains from learning $3\times 3$ convolutions, the learning gains increase proportionally with kernel size.
arXiv Detail & Related papers (2023-01-26T19:17:10Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Quantized convolutional neural networks through the lens of partial differential equations [6.88204255655161]
Quantization of Convolutional Neural Networks (CNNs) is a common approach to ease the computational burden involved in the deployment of CNNs.
In this work, we explore ways to improve quantized CNNs using a PDE-based perspective and analysis.
arXiv Detail & Related papers (2021-08-31T22:18:52Z)
- Structured Convolutions for Efficient Neural Network Design [65.36569572213027]
We tackle model efficiency by exploiting redundancy in the implicit structure of the building blocks of convolutional neural networks.
We show how this decomposition can be applied to 2D and 3D kernels as well as the fully-connected layers.
arXiv Detail & Related papers (2020-08-06T04:38:38Z)
- Learning Sparse Filters in Deep Convolutional Neural Networks with a l1/l2 Pseudo-Norm [5.3791844634527495]
Deep neural networks (DNNs) have proven to be efficient for numerous tasks, but come at a high memory and computation cost.
Recent research has shown that their structure can be more compact without compromising their performance.
We present a sparsity-inducing regularization term based on the l1/l2 pseudo-norm (the ratio of the l1 and l2 norms) defined on the filter coefficients; a minimal sketch of such a penalty appears just after this list.
arXiv Detail & Related papers (2020-07-20T11:56:12Z)
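As a companion to the last entry, here is a minimal sketch of one way an l1/l2 ratio penalty on convolution filters could be computed and added to the training loss. The function name and the per-output-filter grouping are assumptions for illustration; the exact definition in the paper may differ.
```python
import torch
import torch.nn as nn


def l1_over_l2_penalty(model: nn.Module, eps: float = 1e-8) -> torch.Tensor:
    """Sum of ||w||_1 / ||w||_2 over every conv output filter in the model;
    smaller values correspond to sparser filters (illustrative definition)."""
    penalty = torch.zeros(())
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight.flatten(start_dim=1)  # one row per output filter
            penalty = penalty + (w.abs().sum(dim=1) / (w.norm(dim=1) + eps)).sum()
    return penalty


# Usage: add the penalty, scaled by a hyper-parameter, to the task loss.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
loss = model(torch.randn(2, 3, 32, 32)).mean() + 1e-4 * l1_over_l2_penalty(model)
loss.backward()
```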