Knowledge Distillation Circumvents Nonlinearity for Optical
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2102.13323v1
- Date: Fri, 26 Feb 2021 06:35:34 GMT
- Authors: Jinlin Xiang, Shane Colburn, Arka Majumdar, Eli Shlizerman
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Convolutional Neural Networks (CNNs) have enabled ubiquitous
image processing applications. As such, CNNs require fast runtime (forward
propagation) to process high-resolution visual streams in real time. This is
still a challenging task even with state-of-the-art graphics and tensor
processing units. The bottleneck in computational efficiency primarily occurs
in the convolutional layers. Performing operations in the Fourier domain is a
promising way to accelerate forward propagation since it transforms
convolutions into elementwise multiplications, which are considerably faster to
compute for large kernels. Furthermore, such computation could be implemented
using an optical 4f system with orders of magnitude faster operation. However,
a major challenge in using this spectral approach, as well as in an optical
implementation of CNNs, is the inclusion of a nonlinearity between each
convolutional layer, without which CNN performance drops dramatically. Here, we
propose a Spectral CNN Linear Counterpart (SCLC) network architecture and
develop a Knowledge Distillation (KD) approach to circumvent the need for a
nonlinearity and successfully train such networks. While the KD approach is
known in machine learning as an effective process for network pruning, we adapt
the approach to transfer the knowledge from a nonlinear network (teacher) to a
linear counterpart (student). We show that the KD approach can achieve
performance that easily surpasses the standard linear version of a CNN and
could approach the performance of the nonlinear network. Our simulations show
that increasing the resolution of the input image allows our proposed 4f
optical linear network to operate more efficiently than a nonlinear network of
the same accuracy on two fundamental image processing tasks: (i) object
classification and (ii) semantic segmentation.
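The abstract does not spell out the distillation objective, so the following is only a hedged sketch of the standard Hinton-style KD loss one would use to transfer knowledge from the nonlinear teacher to the linear student; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Hard term: cross-entropy between student predictions and true labels.
    n = len(labels)
    p_hard = softmax(student_logits)
    hard = -np.log(p_hard[np.arange(n), labels]).mean()
    # Soft term: cross-entropy between the teacher's and the student's
    # temperature-softened distributions; the T^2 factor keeps gradient
    # magnitudes comparable across temperatures.
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    soft = -(p_teacher * log_p_student).sum(axis=-1).mean() * T**2
    return alpha * hard + (1.0 - alpha) * soft

# Toy batch: 3 samples, 4 classes.
rng = np.random.default_rng(0)
student_logits = rng.standard_normal((3, 4))
teacher_logits = rng.standard_normal((3, 4))
labels = np.array([0, 2, 1])
loss = kd_loss(student_logits, teacher_logits, labels)
print("KD loss:", loss)
```

In training, the soft term pulls the linear student toward the teacher's full output distribution rather than only the hard labels, which is what lets it recover part of the nonlinear network's performance.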
Related papers
- Training Large-Scale Optical Neural Networks with Two-Pass Forward Propagation [0.0]
This paper addresses the limitations in Optical Neural Networks (ONNs) related to training efficiency, nonlinear function implementation, and large input data processing.
We introduce Two-Pass Forward Propagation, a novel training method that avoids specific nonlinear activation functions by modulating and re-entering error with random noise.
We propose a new way to implement convolutional neural networks using simple neural networks in integrated optical systems.
arXiv Detail & Related papers (2024-08-15T11:27:01Z)
- Algebraic Representations for Faster Predictions in Convolutional Neural Networks [0.0]
Convolutional neural networks (CNNs) are a popular choice of model for tasks in computer vision.
Skip connections may be added to create an easier gradient optimization problem.
We show that arbitrarily complex, trained, linear CNNs with skip connections can be simplified into a single-layer model.
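The collapse of trained linear CNNs into a single layer follows directly from the linearity of convolution. As a minimal 1-D sketch (assuming circular convolution, so the convolution theorem applies exactly), two linear conv layers plus a skip connection fold into one effective kernel:

```python
import numpy as np

def cconv(u, v):
    # Circular convolution via the convolution theorem:
    # conv(u, v) = IFFT(FFT(u) * FFT(v)), an elementwise product.
    return np.fft.ifft(np.fft.fft(u) * np.fft.fft(v)).real

rng = np.random.default_rng(1)
n = 16
x  = rng.standard_normal(n)   # input signal
k1 = rng.standard_normal(n)   # layer-1 kernel (zero-padded to length n)
k2 = rng.standard_normal(n)   # layer-2 kernel

# Two linear conv layers with a skip connection: y = k2 * (k1 * x) + x
y_layers = cconv(k2, cconv(k1, x)) + x

# Equivalent single-layer kernel: k_eff = k2 * k1 + delta,
# where delta (the unit impulse) is the identity for convolution.
delta = np.zeros(n)
delta[0] = 1.0
k_eff = cconv(k2, k1) + delta
y_single = cconv(k_eff, x)

print("max abs difference:", np.abs(y_layers - y_single).max())
```

The same elementwise-multiplication structure is what a 4f optical system implements physically, which is why a purely linear spectral CNN maps so naturally onto optics.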
arXiv Detail & Related papers (2024-08-14T21:10:05Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- Approximation analysis of CNNs from a feature extraction view [8.94250977764275]
We establish an analysis of linear feature extraction by deep multi-channel convolutional neural networks (CNNs).
We give an exact construction showing how linear feature extraction can be conducted efficiently with multi-channel CNNs.
Rates of function approximation by such deep networks implemented with channels and followed by fully-connected layers are investigated as well.
arXiv Detail & Related papers (2022-10-14T04:09:01Z)
- GPU-Accelerated Machine Learning in Non-Orthogonal Multiple Access [71.58925117604039]
Non-orthogonal multiple access (NOMA) is a promising technology that enables the massive connectivity required in future 5G and 6G networks.
We propose a neural network architecture that combines the advantages of both linear and non-linear processing.
arXiv Detail & Related papers (2022-06-13T09:38:23Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
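The curriculum idea above can be approximated with a simple annealed Gaussian low-pass filter on feature maps; this is only an illustrative sketch, and the separable filter, the `sigma` schedule, and the helper names are hypothetical, not taken from the paper.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    # Normalized 1-D Gaussian taps for separable low-pass filtering.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth_feature_map(fmap, sigma):
    # Apply the 1-D Gaussian along rows, then columns (separable 2-D blur).
    if sigma <= 0:
        return fmap  # sigma = 0 means the curriculum has finished: no smoothing
    k = gaussian_kernel1d(sigma, radius=max(1, int(3 * sigma)))
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, fmap)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out

# Curriculum: anneal sigma toward 0 so progressively more high-frequency
# information passes through the feature maps as training proceeds.
fmap = np.random.default_rng(0).standard_normal((16, 16))
for sigma in [2.0, 1.0, 0.5, 0.0]:
    smoothed = smooth_feature_map(fmap, sigma)
```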
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
Use of convolutional neural networks (CNNs) is the standard approach to image recognition despite the fact that they can be too computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.