Acceleration of Convolutional Neural Network Using FFT-Based Split
Convolutions
- URL: http://arxiv.org/abs/2003.12621v2
- Date: Fri, 3 Apr 2020 21:14:18 GMT
- Title: Acceleration of Convolutional Neural Network Using FFT-Based Split
Convolutions
- Authors: Kamran Chitsaz, Mohsen Hajabdollahi, Nader Karimi, Shadrokh Samavi,
Shahram Shirani
- Abstract summary: Convolutional neural networks (CNNs) have a large number of variables and hence suffer from a complexity problem for their implementation.
Recent studies on Fast Fourier Transform (FFT) based CNNs aim at simplifying the computations required for the FFT.
In this paper, a new method for CNN processing in the FFT domain is proposed, which is based on input splitting.
- Score: 11.031841470875571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) have a large number of variables
and hence suffer from a complexity problem for their implementation. Different
methods and techniques have been developed to alleviate the problem of CNN
complexity, such as quantization and pruning. Among these simplification
methods, computation in the Fourier domain is regarded as a new paradigm for
the acceleration of CNNs. Recent studies on Fast Fourier Transform (FFT) based
CNNs aim at simplifying the computations required for the FFT. However, there
is still considerable room for reducing the computational complexity of the
FFT. In this paper, a new method for CNN processing in the FFT domain is
proposed, which is based on input splitting. FFT-based convolution is
problematic with the small kernels typical of CNNs, since the kernel must be
zero-padded to the transform size and much of the resulting computation is
redundant. Splitting the input is an effective solution to the issues raised
by small kernels: with splitting, the redundancy of schemes such as
overlap-and-add is reduced and efficiency is increased. A hardware
implementation of the proposed FFT method, as well as several analyses of its
complexity, are presented to demonstrate the performance of the proposed
method.
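The authors' specific split scheme and its hardware mapping are not reproduced here. For orientation only, the sketch below (plain NumPy, illustrative rather than the paper's method) shows the standard overlap-and-add form of FFT-based convolution that the abstract refers to: the input is split into blocks, each block is filtered with a short FFT, and the overlapping tails of the per-block results are summed.

```python
import numpy as np

def fft_conv_overlap_add(x, h, block_len=64):
    """Linear convolution of x with a small kernel h via overlap-and-add:
    split x into blocks, filter each block in the frequency domain with a
    short FFT, and sum the overlapping tails of the per-block results."""
    n, m = len(x), len(h)
    fft_len = block_len + m - 1                # length of each block's linear conv
    H = np.fft.rfft(h, fft_len)                # kernel spectrum, computed once
    y = np.zeros(n + m - 1)
    for start in range(0, n, block_len):
        block = x[start:start + block_len]
        seg = np.fft.irfft(np.fft.rfft(block, fft_len) * H, fft_len)
        end = min(start + fft_len, len(y))
        y[start:end] += seg[:end - start]      # overlapping tails add up here
    return y

# sanity check against direct convolution with a small, CNN-sized kernel
x, h = np.random.randn(1000), np.random.randn(7)
assert np.allclose(fft_conv_overlap_add(x, h), np.convolve(x, h))
```

The per-block zero-padding and the overlapping additions in this baseline are the kind of redundancy that the abstract says input splitting reduces.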
Related papers
- Multilevel CNNs for Parametric PDEs based on Adaptive Finite Elements [0.0]
A neural network architecture is presented that exploits the multilevel properties of high-dimensional parameter-dependent partial differential equations.
The network is trained with data on adaptively refined finite element meshes.
A complete convergence and complexity analysis is carried out for the adaptive multilevel scheme.
arXiv Detail & Related papers (2024-08-20T13:32:11Z)
- Domain Agnostic Fourier Neural Operators [15.29112632863168]
We introduce domain agnostic Fourier neural operator (DAFNO) for learning surrogates with irregular geometries and evolving domains.
The key idea is to incorporate a smoothed characteristic function in the integral layer architecture of FNOs.
DAFNO has achieved state-of-the-art accuracy as compared to baseline neural operator models.
arXiv Detail & Related papers (2023-04-30T13:29:06Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance reduction technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- Receptive Field-based Segmentation for Distributed CNN Inference Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing approach that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
arXiv Detail & Related papers (2022-07-22T18:38:11Z)
- Multi-objective Evolutionary Approach for Efficient Kernel Size and Shape for CNN [12.697368516837718]
State-of-the-art CNN topologies, such as VGGNet and ResNet, have become increasingly accurate.
These networks are computationally expensive, involving billions of arithmetic operations and parameters.
This paper considers optimising the computational resource consumption by reducing the size and number of kernels in convolutional layers.
arXiv Detail & Related papers (2021-06-28T14:47:29Z)
- Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks [52.32646357164739]
We propose a deep neural network (DNN) to solve the AC optimal power flow (AC-OPF) problem.
The proposed sensitivity-informed DNN (SIDNN) is compatible with a broad range of OPF schemes.
It can be seamlessly integrated in other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z)
- Fast Partial Fourier Transform [28.36925669222461]
Fast Fourier transform (FFT) is a widely used algorithm that computes the discrete Fourier transform in many machine learning applications.
Despite its pervasive use, known FFT algorithms do not give the user a fine-grained option to compute only the coefficients that are actually needed.
In this paper, we propose a fast Partial Fourier Transform (PFT), a careful modification of the Cooley-Tukey algorithm that enables one to specify an arbitrary consecutive range where the coefficients should be computed (see the partial-DFT sketch after this list).
arXiv Detail & Related papers (2020-08-28T10:01:49Z)
- ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing (see the kernel-separation sketch after this list).
Use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
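Partial-DFT sketch. The Fast Partial Fourier Transform entry above concerns computing only a consecutive range of DFT coefficients. The snippet below is merely the direct O(N·count) baseline for that interface, with a hypothetical helper name `partial_dft`; the actual PFT restructures the Cooley-Tukey recursion and is not reproduced here.

```python
import numpy as np

def partial_dft(x, start, count):
    """Directly compute the DFT coefficients X[start : start + count].

    Naive O(N * count) baseline for the interface that the Fast Partial
    Fourier Transform targets; the PFT algorithm itself is not shown."""
    N = len(x)
    n = np.arange(N)
    k = np.arange(start, start + count)[:, None]    # requested coefficient indices
    return np.exp(-2j * np.pi * k * n / N) @ x      # one row of twiddle factors per coefficient

# agrees with the corresponding slice of a full FFT
x = np.random.randn(256)
assert np.allclose(partial_dft(x, 10, 5), np.fft.fft(x)[10:15])
```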
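Kernel-separation sketch. The separated-filters entry above reduces complexity by factoring 2D kernels into 1D passes. The snippet below (NumPy, illustrative only, not the cited paper's transformation) shows the standard SVD route to such a separation; `separate_kernel` is a hypothetical helper name.

```python
import numpy as np

def separate_kernel(kernel, rank=1):
    """Approximate a 2D convolution kernel by `rank` separable (outer-product)
    terms via truncated SVD, so a KxK convolution can be replaced by cheaper
    Kx1 followed by 1xK passes per term."""
    u, s, vt = np.linalg.svd(kernel)
    cols = u[:, :rank] * np.sqrt(s[:rank])          # vertical 1-D filters, shape (K, rank)
    rows = np.sqrt(s[:rank])[:, None] * vt[:rank]   # horizontal 1-D filters, shape (rank, K)
    return cols, rows

# A Sobel kernel is exactly rank-1, so a single separable term reconstructs it.
sobel = np.array([[-1., 0., 1.],
                  [-2., 0., 2.],
                  [-1., 0., 1.]])
cols, rows = separate_kernel(sobel, rank=1)
assert np.allclose(cols @ rows, sobel)
```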