A Sparse Coding Interpretation of Neural Networks and Theoretical
Implications
- URL: http://arxiv.org/abs/2108.06622v2
- Date: Wed, 18 Aug 2021 16:43:46 GMT
- Title: A Sparse Coding Interpretation of Neural Networks and Theoretical
Implications
- Authors: Joshua Bowren
- Abstract summary: Deep convolutional neural networks have achieved unprecedented performance in various computer vision tasks.
We propose a sparse coding interpretation of neural networks that have ReLU activation.
We derive a complete convolutional neural network without normalization and pooling.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks, specifically deep convolutional neural networks, have
achieved unprecedented performance in various computer vision tasks, but the
rationale for the computations and structures of successful neural networks is
not fully understood. Theories abound for the aptitude of convolutional neural
networks for image classification, but less is understood about why such models
would be capable of complex visual tasks such as inference and anomaly
identification. Here, we propose a sparse coding interpretation of neural
networks that have ReLU activation and of convolutional neural networks in
particular. In sparse coding, when the model's basis functions are assumed to
be orthogonal, the optimal coefficients are given by applying the
soft-threshold function to the projections of the input image onto the basis
functions. In a non-negative variant of sparse coding, the soft-threshold
function becomes a ReLU. Here, we derive these solutions for sparse coding with
basis functions assumed to be orthogonal, and then we derive the convolutional
neural network forward
transformation from a modified non-negative orthogonal sparse coding model with
an exponential prior parameter for each sparse coding coefficient. Next, we
derive a complete convolutional neural network without normalization and
pooling by adding logistic regression to a hierarchical sparse coding model.
Finally, we motivate potentially more robust forward transformations by
maintaining sparse priors in convolutional neural networks as well as by
performing a stronger nonlinear transformation.
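
As a concrete illustration of the closed-form solutions referenced in the abstract, the sketch below (Python with NumPy) shows that, for an orthonormal dictionary, the L1-penalized sparse coding coefficients are the soft-threshold of the input's projections onto the basis functions, and that adding a non-negativity constraint turns the soft-threshold into a shifted ReLU, i.e., a standard dense-layer forward pass. The dictionary Phi, the penalty lam, and the 64-dimensional input are illustrative assumptions, not quantities taken from the paper.

import numpy as np

def soft_threshold(z, lam):
    # Elementwise soft-threshold: shrink each coordinate toward zero by lam.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

# Hypothetical orthonormal dictionary Phi (columns are basis functions).
Phi, _ = np.linalg.qr(rng.standard_normal((64, 64)))
x = rng.standard_normal(64)   # input image, flattened
lam = 0.1                     # sparsity penalty (prior parameter)

# L1-penalized sparse coding, argmin_a 0.5*||x - Phi a||^2 + lam*||a||_1:
# with Phi orthonormal, the optimum is the soft-threshold of the projections.
a_l1 = soft_threshold(Phi.T @ x, lam)

# Non-negative variant (a >= 0): the soft-threshold becomes a shifted ReLU,
# i.e. a dense-layer forward pass relu(W x + b) with W = Phi.T and b = -lam;
# a convolutional layer follows by replacing Phi.T @ x with a convolution.
a_nn = relu(Phi.T @ x - lam)

# Sanity check: the two solutions agree wherever the L1 solution is positive.
assert np.allclose(a_l1[a_l1 > 0], a_nn[a_l1 > 0])
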
Related papers
- Erasure Coded Neural Network Inference via Fisher Averaging [28.243239815823205]
Erasure-coded computing has been successfully used in cloud systems to reduce tail latency caused by factors such as straggling servers and heterogeneous traffic variations.
We design a method to code over neural networks, that is, given two or more neural network models, how to construct a coded model whose output is a linear combination of the outputs of the given neural networks.
We conduct experiments to perform erasure coding over neural networks trained on real-world vision datasets and show that the accuracy of the decoded outputs using COIN is significantly higher than that of other baselines.
arXiv Detail & Related papers (2024-09-02T18:46:26Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal mini-max rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions [4.932130498861987]
We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions.
Under suitable conditions, KNNs induce a faster decay of the loss than that by the feed-forward networks.
arXiv Detail & Related papers (2021-05-20T04:54:57Z)
- Modeling the Nonsmoothness of Modern Neural Networks [35.93486244163653]
We quantify the nonsmoothness using a feature named the sum of the magnitude of peaks (SMP).
We envision that the nonsmoothness feature can potentially be used as a forensic tool for regression-based applications of neural networks.
arXiv Detail & Related papers (2021-03-26T20:55:19Z)