Knowledge Distillation Circumvents Nonlinearity for Optical
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2102.13323v1
- Date: Fri, 26 Feb 2021 06:35:34 GMT
- Title: Knowledge Distillation Circumvents Nonlinearity for Optical
Convolutional Neural Networks
- Authors: Jinlin Xiang, Shane Colburn, Arka Majumdar, Eli Shlizerman
- Abstract summary: We propose a Spectral CNN Linear Counterpart (SCLC) network architecture and develop a Knowledge Distillation (KD) approach to circumvent the need for a nonlinearity.
We show that the KD approach can achieve performance that easily surpasses the standard linear version of a CNN and could approach the performance of the nonlinear network.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Convolutional Neural Networks (CNNs) have enabled ubiquitous
image processing applications. As such, CNNs require fast runtime (forward
propagation) to process high-resolution visual streams in real time. This is
still a challenging task even with state-of-the-art graphics and tensor
processing units. The bottleneck in computational efficiency primarily occurs
in the convolutional layers. Performing operations in the Fourier domain is a
promising way to accelerate forward propagation since it transforms
convolutions into elementwise multiplications, which are considerably faster to
compute for large kernels. Furthermore, such computation could be implemented
using an optical 4f system with orders of magnitude faster operation. However,
a major challenge in using this spectral approach, as well as in an optical
implementation of CNNs, is the inclusion of a nonlinearity between each
convolutional layer, without which CNN performance drops dramatically. Here, we
propose a Spectral CNN Linear Counterpart (SCLC) network architecture and
develop a Knowledge Distillation (KD) approach to circumvent the need for a
nonlinearity and successfully train such networks. While the KD approach is
known in machine learning as an effective process for network pruning, we adapt
the approach to transfer the knowledge from a nonlinear network (teacher) to a
linear counterpart (student). We show that the KD approach can achieve
performance that easily surpasses the standard linear version of a CNN and
could approach the performance of the nonlinear network. Our simulations show
that increasing the resolution of the input image allows our proposed 4f
optical linear network to perform more efficiently than a nonlinear
network with the same accuracy on two fundamental image processing tasks: (i)
object classification and (ii) semantic segmentation.
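The abstract's central computational claim — that convolution becomes an elementwise product in the Fourier domain — can be checked numerically. The following is a minimal sketch, not the paper's code; NumPy's FFT stands in for the optical 4f system:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((8, 8))  # in practice a small kernel zero-padded to image size

def circular_conv2d(img, ker):
    """Direct spatial-domain circular convolution (O(N^2) multiplies per output pixel)."""
    n, m = img.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            out[i, j] = sum(
                img[u, v] * ker[(i - u) % n, (j - v) % m]
                for u in range(n)
                for v in range(m)
            )
    return out

# Convolution theorem: forward FFT, elementwise multiply, inverse FFT.
# A 4f system realizes the two transforms optically with lenses.
spectral = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))
spatial = circular_conv2d(image, kernel)
print(np.allclose(spatial, spectral))  # the two results agree
```

For an N x N image with a full-size kernel, the direct circular convolution costs O(N^4) multiplies while the FFT route costs O(N^2 log N), which is the speedup the spectral (and optical) implementation exploits.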
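The teacher-to-student transfer described in the abstract can be sketched with a standard distillation objective. The temperature, weighting, and loss form below are assumptions for illustration — the paper's exact objective may differ:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Weighted sum of a temperature-scaled KL term (soft targets from the
    nonlinear teacher) and a hard-label cross-entropy term for the linear student."""
    p_t = softmax(teacher_logits, T)  # teacher's softened distribution
    p_s = softmax(student_logits, T)  # student's softened distribution
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean() * T * T
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * kl + (1.0 - alpha) * ce

rng = np.random.default_rng(1)
teacher = rng.standard_normal((4, 10))  # logits of the nonlinear teacher
student = rng.standard_normal((4, 10))  # logits of the linear student
labels = np.array([3, 1, 4, 1])
loss = kd_loss(student, teacher, labels)
```

The T * T factor keeps the gradient magnitude of the soft-target term comparable across temperatures, a common convention in distillation.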
Related papers
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- Approximation analysis of CNNs from a feature extraction view [8.94250977764275]
We establish an approximation analysis for linear feature extraction by deep multi-channel convolutional neural networks (CNNs).
We give an exact construction presenting how linear features extraction can be conducted efficiently with multi-channel CNNs.
Rates of function approximation by such deep networks implemented with channels and followed by fully-connected layers are investigated as well.
arXiv Detail & Related papers (2022-10-14T04:09:01Z)
- GPU-Accelerated Machine Learning in Non-Orthogonal Multiple Access [71.58925117604039]
Non-orthogonal multiple access (NOMA) is a promising technology that enables massive connectivity as required in future 5G and 6G networks.
We propose a neural network architecture that combines the advantages of both linear and non-linear processing.
arXiv Detail & Related papers (2022-06-13T09:38:23Z)
- Monolithic Silicon Photonic Architecture for Training Deep Neural Networks with Direct Feedback Alignment [0.6501025489527172]
We propose on-chip training of neural networks enabled by a CMOS-compatible silicon photonic architecture.
Our scheme employs the direct feedback alignment training algorithm, which trains neural networks using error feedback rather than error backpropagation.
We experimentally demonstrate training a deep neural network with the MNIST dataset using on-chip MAC operation results.
arXiv Detail & Related papers (2021-11-12T18:31:51Z)
- The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks [43.860358308049044]
In this work, we show that these common perceptions can be completely false in the early phase of learning.
We argue that this surprising simplicity can persist in deeper networks with convolutional architectures.
arXiv Detail & Related papers (2020-06-25T17:42:49Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
Use of convolutional neural networks (CNNs) is the standard approach to image recognition despite the fact that they can be computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
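The separated-filters idea above can be sketched as a rank-1 factorization: a 2D kernel is split into an outer product of two 1D filters via SVD, so one K x K convolution pass becomes a K x 1 pass followed by a 1 x K pass. This is an illustrative sketch under that assumption, not the paper's code:

```python
import numpy as np

# A Gaussian kernel is exactly separable (rank 1), so the factorization is lossless here.
x = np.arange(5) - 2
g = np.exp(-x**2 / 2.0)
kernel = np.outer(g, g)            # 5x5 2D kernel

U, s, Vt = np.linalg.svd(kernel)
v_col = U[:, 0] * np.sqrt(s[0])    # vertical 1D filter (applied along rows)
h_row = Vt[0] * np.sqrt(s[0])      # horizontal 1D filter (applied along columns)

# Reconstruct the 2D kernel from the two 1D filters.
rank1 = np.outer(v_col, h_row)
print(np.allclose(rank1, kernel))  # exact because the kernel is rank 1

# Cost per output pixel drops from K*K to 2*K multiplies, e.g. 25 -> 10 for K = 5.
```

For general (non-separable) kernels, keeping only the leading singular component gives the best rank-1 approximation in the least-squares sense; additional components can be added to trade accuracy against speed.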
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.