Dense Pruning of Pointwise Convolutions in the Frequency Domain
- URL: http://arxiv.org/abs/2109.07707v1
- Date: Thu, 16 Sep 2021 04:02:45 GMT
- Title: Dense Pruning of Pointwise Convolutions in the Frequency Domain
- Authors: Mark Buckler, Neil Adit, Yuwei Hu, Zhiru Zhang, and Adrian Sampson
- Abstract summary: We propose a technique which wraps each pointwise layer in a discrete cosine transform (DCT) which is truncated to selectively prune coefficients above a given threshold.
Unlike weight pruning techniques which rely on sparse operators, our contiguous frequency band pruning results in fully dense computation.
We apply our technique to MobileNetV2 and in the process reduce computation time by 22% while incurring less than 1% accuracy degradation.
- Score: 10.58456555092086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Depthwise separable convolutions and frequency-domain convolutions are two
recent ideas for building efficient convolutional neural networks. They are
seemingly incompatible: the vast majority of operations in depthwise separable
CNNs are in pointwise convolutional layers, but pointwise layers use 1x1
kernels, which do not benefit from frequency transformation. This paper unifies
these two ideas by transforming the activations, not the kernels. Our key
insights are that 1) pointwise convolutions commute with frequency
transformation and thus can be computed in the frequency domain without
modification, 2) each channel within a given layer has a different level of
sensitivity to frequency domain pruning, and 3) each channel's sensitivity to
frequency pruning is approximately monotonic with respect to frequency. We
leverage this knowledge by proposing a new technique which wraps each pointwise
layer in a discrete cosine transform (DCT) which is truncated to selectively
prune coefficients above a given threshold as per the needs of each channel. To
learn which frequencies should be pruned from which channels, we introduce a
novel learned parameter which specifies each channel's pruning threshold. We
add a new regularization term which incentivizes the model to decrease the
number of retained frequencies while still maintaining task accuracy. Unlike
weight pruning techniques which rely on sparse operators, our contiguous
frequency band pruning results in fully dense computation. We apply our
technique to MobileNetV2 and in the process reduce computation time by 22% and
incur <1% accuracy degradation.
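Below is a minimal PyTorch sketch of the wrapped pointwise layer, assuming a square feature map and a single fixed band size `keep` in place of the paper's learned per-channel thresholds; the class and helper names are illustrative, not the authors' code:

```python
import math
import torch
import torch.nn as nn

def dct_matrix(n: int) -> torch.Tensor:
    # Orthonormal DCT-II basis: entry (f, x) = cos(pi/n * (x + 0.5) * f), scaled.
    x = torch.arange(n, dtype=torch.float32)
    f = torch.arange(n, dtype=torch.float32)
    basis = torch.cos(math.pi / n * (x[None, :] + 0.5) * f[:, None]) * math.sqrt(2.0 / n)
    basis[0] /= math.sqrt(2.0)
    return basis

class DCTPrunedPointwise(nn.Module):
    # Wraps a 1x1 convolution in a spatial DCT, keeps only the low-frequency
    # band [0, keep) in each spatial dimension, and transforms back. Because a
    # pointwise conv mixes channels independently at each position, it commutes
    # with the DCT and can run directly on the truncated tensor.
    def __init__(self, in_ch: int, out_ch: int, size: int, keep: int):
        super().__init__()
        assert keep <= size
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        C = dct_matrix(size)
        self.register_buffer("C", C)             # full forward DCT basis
        self.register_buffer("Ck", C[:keep, :])  # truncated basis for the inverse
        self.keep = keep

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.C @ x @ self.C.t()              # 2D DCT of the activations
        f = f[..., :self.keep, :self.keep]       # contiguous band pruning: a dense slice
        y = self.pw(f)                           # pointwise conv in the frequency domain
        return self.Ck.t() @ y @ self.Ck         # inverse DCT; pruned bands read as zeros

# Usage: a 1x1 layer on 14x14 activations, keeping a 7x7 low-frequency band.
layer = DCTPrunedPointwise(in_ch=32, out_ch=64, size=14, keep=7)
out = layer(torch.randn(2, 32, 14, 14))  # -> (2, 64, 14, 14)
```

Since the kept band is a contiguous low-frequency block, the 1x1 convolution runs on a dense keep x keep tensor rather than a sparse one; with keep=7 on a 14x14 map it touches only a quarter of the spatial positions. The paper additionally learns each channel's threshold and adds a regularization term penalizing the number of retained frequencies, which this fixed-band sketch omits.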
Related papers
- Accelerating Inference of Networks in the Frequency Domain [8.125023712173686]
We propose performing network inference in the frequency domain to speed up networks whose frequency parameters are sparse.
In particular, we propose a frequency inference chain that is dual to the network inference in the spatial domain.
The proposed approach significantly improves accuracy at high speedup ratios (over 100x).
arXiv Detail & Related papers (2024-10-06T03:34:38Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that helps the neural network learn higher-degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to adaptively learn frequency information within a two-branch detection framework, dubbed AFD.
We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
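As a rough illustration of the idea above (our simplification, not the paper's coupled-channel metric), a Fisher-style importance score for a single channel can be computed from a layer's output activation and its gradient:

```python
import torch

def fisher_channel_importance(activation: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    # activation, grad: (N, C, H, W) -- a layer's output and dL/d(output),
    # e.g. captured with forward/backward hooks on the layer.
    # Scaling channel c by a mask m_c gives dL/dm_c = sum over that channel
    # of activation * grad; the Fisher approximation scores the loss change
    # from deleting the channel as the squared mask gradient, batch-averaged.
    dldm = (activation * grad).sum(dim=(2, 3))  # (N, C) per-sample mask gradients
    return 0.5 * (dldm ** 2).mean(dim=0)        # (C,) importance per channel
```

Channels with the smallest scores would be pruned first; the paper's metric additionally handles channels that are coupled across layers.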
- SubSpectral Normalization for Neural Audio Data Processing [11.97844299450951]
We introduce SubSpectral Normalization (SSN), which splits the input frequency dimension into several groups (sub-bands) and performs a different normalization for each group.
Our method removes inter-frequency deflection while the network learns frequency-aware characteristics.
In the experiments with audio data, we observed that SSN can efficiently improve the network's performance.
arXiv Detail & Related papers (2021-03-25T05:55:48Z)
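A minimal PyTorch sketch of the sub-band idea described above, assuming spectrogram-like input of shape (batch, channels, frequency, time); the class name and the choice of BatchNorm per sub-band are illustrative, not taken from the paper's code:

```python
import torch
import torch.nn as nn

class SubSpectralNorm(nn.Module):
    # Splits the frequency axis into `sub_bands` equal groups and normalizes
    # each group with its own statistics, so low and high frequency bands are
    # not forced to share one mean/variance.
    def __init__(self, channels: int, sub_bands: int):
        super().__init__()
        self.sub_bands = sub_bands
        # One BatchNorm over (channel, sub-band) pairs, applied jointly below.
        self.bn = nn.BatchNorm2d(channels * sub_bands)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, f, t = x.shape                  # (batch, channel, freq, time)
        assert f % self.sub_bands == 0
        x = x.reshape(n, c * self.sub_bands, f // self.sub_bands, t)
        x = self.bn(x)                        # separate statistics per sub-band
        return x.reshape(n, c, f, t)
```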
- Learning Frequency Domain Approximation for Binary Neural Networks [68.79904499480025]
We propose to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions when training BNNs.
Experiments on several benchmark datasets and neural architectures show that binary networks trained with our method achieve state-of-the-art accuracy.
arXiv Detail & Related papers (2021-03-01T08:25:26Z)
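A hedged sketch of the sine-series idea above: the forward pass keeps the exact sign, while the backward pass uses the derivative of a truncated Fourier (square-wave) series as the surrogate gradient. The number of terms and the clipping window are illustrative choices, not the paper's settings:

```python
import math
import torch

class FourierSign(torch.autograd.Function):
    # Forward: exact sign. Backward: derivative of the truncated square-wave
    # series sign(x) ~ (4/pi) * sum_i sin((2i+1)*pi*x) / (2i+1), i.e.
    # d/dx ~ 4 * sum_i cos((2i+1)*pi*x) on x in (-1, 1).
    TERMS = 10  # number of sine terms kept (illustrative)

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        k = torch.arange(FourierSign.TERMS, device=x.device, dtype=x.dtype)
        odd = 2.0 * k + 1.0                                # 1, 3, 5, ...
        surrogate = 4.0 * torch.cos(math.pi * x.unsqueeze(-1) * odd).sum(-1)
        surrogate = surrogate * (x.abs() < 1).to(x.dtype)  # zero outside (-1, 1)
        return grad_out * surrogate

# Usage in a binarized layer: w_bin = FourierSign.apply(w)
```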
- A Sub-band Approach to Deep Denoising Wavelet Networks and a Frequency-adaptive Loss for Perceptual Quality [0.0]
We show that our approach to using the discrete wavelet transform (DWT) in neural networks notably improves accuracy.
Our second contribution is a denoising loss based on the top k percent of errors in the frequency domain.
arXiv Detail & Related papers (2021-02-16T06:35:42Z)
- Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain [37.722450363816144]
We introduce a method, which we call Frequency Gating, to compute multiplicative weights for the kernels of the CNN.
Experiments with an autoencoder neural network with skip connections show that both local and frequency-wise gating outperform the baseline.
A loss function based on the extended short-time objective intelligibility score (ESTOI) is introduced, which we show to outperform the standard mean squared error (MSE) loss function.
arXiv Detail & Related papers (2020-11-08T22:04:00Z)
- Robust Learning with Frequency Domain Regularization [1.370633147306388]
We introduce a new regularization method that constrains the frequency spectra of the model's filters.
We demonstrate the effectiveness of our regularization by (1) defending against adversarial perturbations; (2) reducing the generalization gap across different architectures; and (3) improving generalization in transfer learning scenarios without fine-tuning.
arXiv Detail & Related papers (2020-07-07T07:29:20Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.