Improving the Expressive Power of Deep Neural Networks through Integral
Activation Transform
- URL: http://arxiv.org/abs/2312.12578v1
- Date: Tue, 19 Dec 2023 20:23:33 GMT
- Title: Improving the Expressive Power of Deep Neural Networks through Integral
Activation Transform
- Authors: Zezhong Zhang, Feng Bao, Guannan Zhang
- Abstract summary: We generalize the traditional fully connected deep neural network (DNN) through the concept of continuous width.
We show that IAT-ReLU exhibits a continuous activation pattern when continuous basis functions are employed.
Our numerical experiments demonstrate that IAT-ReLU outperforms regular ReLU in terms of trainability and smoothness.
- Score: 12.36064367319084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The impressive expressive power of deep neural networks (DNNs) underlies
their widespread applicability. However, while the theoretical capacity of deep
architectures is high, the practical expressive power achieved through
successful training often falls short. Building on the insights gained from
Neural ODEs, which explore the depth of DNNs as a continuous variable, in this
work, we generalize the traditional fully connected DNN through the concept of
continuous width. In the Generalized Deep Neural Network (GDNN), the
traditional notion of neurons in each layer is replaced by a continuous state
function. Using the finite rank parameterization of the weight integral kernel,
we establish that GDNN can be obtained by employing the Integral Activation
Transform (IAT) as activation layers within the traditional DNN framework. The
IAT maps the input vector to a function space using some basis functions,
followed by nonlinear activation in the function space, and then extracts
information through the integration with another collection of basis functions.
A specific variant, IAT-ReLU, featuring the ReLU nonlinearity, serves as a
smooth generalization of the scalar ReLU activation. Notably, IAT-ReLU exhibits
a continuous activation pattern when continuous basis functions are employed,
making it smooth and enhancing the trainability of the DNN. Our numerical
experiments demonstrate that IAT-ReLU outperforms regular ReLU in terms of
trainability and smoothness.
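
To make the transform above concrete, here is a minimal PyTorch sketch of one possible IAT-ReLU layer: the input vector is lifted to a function via a set of basis functions, ReLU is applied pointwise in function space, and the output is recovered by numerical integration against a second set of basis functions. The cosine bases, the integration domain [0, 1], the uniform quadrature grid, and the class name IATReLU are illustrative assumptions, not the authors' exact construction.

```python
import math
import torch


class IATReLU(torch.nn.Module):
    """Sketch of an Integral Activation Transform with ReLU nonlinearity.

    Assumes cosine bases on [0, 1] and a uniform quadrature grid; the
    paper's specific basis functions and parameterization may differ.
    """

    def __init__(self, dim_in: int, dim_out: int, n_quad: int = 128):
        super().__init__()
        s = torch.linspace(0.0, 1.0, n_quad)               # quadrature nodes
        k_in = torch.arange(1, dim_in + 1, dtype=torch.float32)
        k_out = torch.arange(1, dim_out + 1, dtype=torch.float32)
        # Input basis phi_i(s) and output basis psi_j(s), sampled on the grid.
        self.register_buffer("phi", torch.cos(math.pi * k_in[:, None] * s[None, :]))
        self.register_buffer("psi", torch.cos(math.pi * k_out[:, None] * s[None, :]))
        self.dw = 1.0 / n_quad                              # uniform quadrature weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1) Lift the input vector to a function u(s) = sum_i x_i * phi_i(s).
        u = x @ self.phi                   # (batch, n_quad)
        # 2) Apply the scalar nonlinearity pointwise in function space.
        a = torch.relu(u)                  # (batch, n_quad)
        # 3) Extract the output by integrating against the output basis psi_j.
        return (a @ self.psi.T) * self.dw  # (batch, dim_out)


# Usage: drop IAT-ReLU in place of an elementwise ReLU between linear layers.
layer = IATReLU(dim_in=16, dim_out=16)
y = layer(torch.randn(8, 16))
print(y.shape)  # torch.Size([8, 16])
```

Because the basis functions are continuous in s, the activation pattern (which parts of u(s) are positive) varies continuously with the input, which is the property the abstract credits for the improved smoothness and trainability relative to scalar ReLU.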
Related papers
- Sign Gradient Descent-based Neuronal Dynamics: ANN-to-SNN Conversion Beyond ReLU Network [10.760652747217668]
Spiking neural networks (SNNs) are studied in multidisciplinary domains to simulate neuro-scientific mechanisms.
The lack of a discrete theory obstructs the practical application of SNNs by limiting their performance and nonlinearity support.
We present a new optimization-theoretic perspective of the discrete dynamics of spiking neurons.
arXiv Detail & Related papers (2024-07-01T02:09:20Z) - ReLUs Are Sufficient for Learning Implicit Neural Representations [17.786058035763254]
We revisit the use of ReLU activation functions for learning implicit neural representations.
Inspired by second order B-spline wavelets, we incorporate a set of simple constraints on the ReLU neurons in each layer of a deep neural network (DNN).
We demonstrate that, contrary to popular belief, one can learn state-of-the-art INRs based on a DNN composed of only ReLU neurons.
arXiv Detail & Related papers (2024-06-04T17:51:08Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but it is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Statistical Mechanics of Deep Linear Neural Networks: The
Back-Propagating Renormalization Group [4.56877715768796]
We study the statistical mechanics of learning in Deep Linear Neural Networks (DLNNs) in which the input-output function of an individual unit is linear.
We solve exactly the network properties following supervised learning using an equilibrium Gibbs distribution in the weight space.
Our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks with modest depth.
arXiv Detail & Related papers (2020-12-07T20:08:31Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)