A Unified Weight Initialization Paradigm for Tensorial Convolutional
Neural Networks
- URL: http://arxiv.org/abs/2205.15307v1
- Date: Sat, 28 May 2022 13:31:24 GMT
- Title: A Unified Weight Initialization Paradigm for Tensorial Convolutional
Neural Networks
- Authors: Yu Pan, Zeyong Su, Ao Liu, Jingquan Wang, Nannan Li, Zenglin Xu
- Abstract summary: Tensorial Convolutional Neural Networks (TCNNs) have attracted much research attention for their power in reducing model parameters or enhancing the generalization ability.
However, the exploration of TCNNs is hindered even by weight initialization methods.
We propose a universal weight initialization paradigm, which generalizes the Xavier and Kaiming methods and is widely applicable to arbitrary TCNNs.
Our paradigm can stabilize the training of TCNNs, leading to faster convergence and better results.
- Score: 17.71332705005499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tensorial Convolutional Neural Networks (TCNNs) have attracted much research
attention for their power in reducing model parameters or enhancing the
generalization ability. However, the exploration of TCNNs is hindered even by
weight initialization methods. To be specific, general initialization methods,
such as Xavier or Kaiming initialization, usually fail to generate appropriate
weights for TCNNs. Meanwhile, although there are ad-hoc approaches for specific
architectures (e.g., Tensor Ring Nets), they are not applicable to TCNNs with
other tensor decomposition methods (e.g., CP or Tucker decomposition). To
address this problem, we propose a universal weight initialization paradigm,
which generalizes Xavier and Kaiming methods and can be widely applicable to
arbitrary TCNNs. Specifically, we first present the Reproducing Transformation
to convert the backward process in TCNNs to an equivalent convolution process.
Then, based on the convolution operators in the forward and backward processes,
we build a unified paradigm to control the variance of features and gradients
in TCNNs. Thus, we can derive fan-in and fan-out initialization for various
TCNNs. We demonstrate that our paradigm can stabilize the training of TCNNs,
leading to faster convergence and better results.
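As a rough illustration of the variance-control idea (not the paper's exact derivation), the sketch below initializes a CP-factorized convolution kernel so that the reconstructed kernel roughly matches a Kaiming-style fan-in variance target; the function names and the rank-splitting rule are illustrative assumptions.

```python
# A minimal numpy sketch (not the paper's exact formulas): for a CP-factorized
# convolution kernel W[o, i, h, w] = sum_r A[o, r] * B[i, r] * C[h, r] * D[w, r],
# choose each factor's variance so the reconstructed kernel matches the
# Kaiming fan-in target 2 / (C_in * k * k).
import numpy as np

def cp_fan_in_init(c_out, c_in, k, rank, rng=None):
    rng = np.random.default_rng(rng)
    target_var = 2.0 / (c_in * k * k)            # Kaiming fan-in target for ReLU
    n_factors = 4                                 # factors A, B, C, D
    # Var(W entry) ~ rank * prod(Var(factor)) for independent zero-mean factors,
    # so give every factor the variance (target_var / rank) ** (1 / n_factors).
    factor_std = (target_var / rank) ** (1.0 / (2 * n_factors))
    A = rng.normal(0.0, factor_std, (c_out, rank))
    B = rng.normal(0.0, factor_std, (c_in, rank))
    C = rng.normal(0.0, factor_std, (k, rank))
    D = rng.normal(0.0, factor_std, (k, rank))
    return A, B, C, D

def reconstruct_kernel(A, B, C, D):
    # W[o, i, h, w] = sum_r A[o, r] B[i, r] C[h, r] D[w, r]
    return np.einsum('or,ir,hr,wr->oihw', A, B, C, D)

A, B, C, D = cp_fan_in_init(c_out=64, c_in=32, k=3, rank=8, rng=0)
W = reconstruct_kernel(A, B, C, D)
print(W.shape, W.var(), 2.0 / (32 * 3 * 3))       # empirical vs. target variance
```

For Tucker or Tensor Ring layers the same variance-matching argument carries over, with the factor count and the rank-dependent term adjusted to the decomposition.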
Related papers
- Principled Weight Initialisation for Input-Convex Neural Networks [1.949679629562811]
Input-Convex Neural Networks (ICNNs) guarantee convexity in their input-output mapping.
Previous initialisation strategies, which implicitly assume centred weights, are not effective for ICNNs.
We show that our principled initialisation effectively accelerates learning in ICNNs and leads to better generalisation.
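As a loose illustration of the centred-weights issue (this is not the distribution proposed in the paper), non-negative weights can still be drawn with a controlled fan-in second moment, although their non-zero mean shifts pre-activations and needs a separate correction:

```python
# Illustrative sketch only: ICNNs constrain some weight matrices to be non-negative,
# so a zero-mean ("centred") Gaussian init is not directly usable. Here non-negative
# weights come from a half-normal scaled so that E[W^2] = 1 / fan_in.
import numpy as np

def nonnegative_init(fan_out, fan_in, rng=None):
    rng = np.random.default_rng(rng)
    target_second_moment = 1.0 / fan_in          # keeps the fan_in sum O(1)
    s = np.sqrt(target_second_moment)            # for W = |N(0, s^2)|, E[W^2] = s^2
    return np.abs(rng.normal(0.0, s, (fan_out, fan_in)))

W = nonnegative_init(256, 128, rng=0)
# Note: the non-zero mean of W shifts pre-activations; the paper's principled
# correction for this effect is not reproduced here.
print(W.min() >= 0, (W ** 2).mean(), 1.0 / 128)
```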
arXiv Detail & Related papers (2023-12-19T10:36:12Z) - On the Initialization of Graph Neural Networks [10.153841274798829]
We analyze the variance of forward and backward propagation across Graph Neural Network layers.
We propose a new method for Variance Instability Reduction within GNN Optimization (Virgo).
We conduct comprehensive experiments on 15 datasets to show that Virgo can lead to superior model performance.
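A small sketch of the quantity at stake, assuming a generic GCN-style propagation rather than Virgo itself: tracking how the activation variance drifts across layers under a standard fan-in initialization.

```python
# Tracks the per-layer activation variance of H_{l+1} = relu(A_hat @ H_l @ W_l),
# the quantity such initialization analyses aim to keep stable across depth.
import numpy as np

rng = np.random.default_rng(0)
n, d, layers = 200, 64, 6
A = (rng.random((n, n)) < 0.05).astype(float)        # random sparse adjacency
A = np.maximum(A, A.T) + np.eye(n)                   # symmetrize + self-loops
deg = A.sum(1)
A_hat = A / np.sqrt(np.outer(deg, deg))              # symmetric normalization
H = rng.normal(size=(n, d))
for l in range(layers):
    W = rng.normal(0.0, np.sqrt(2.0 / d), (d, d))    # Kaiming-style fan-in init
    H = np.maximum(A_hat @ H @ W, 0.0)               # GCN propagation + ReLU
    print(f"layer {l}: activation variance = {H.var():.4f}")
```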
arXiv Detail & Related papers (2023-12-05T09:55:49Z) - Transformed Low-Rank Parameterization Can Help Robust Generalization for
Tensor Neural Networks [32.87980654923361]
The tensor Singular Value Decomposition (t-SVD) has achieved extensive success in multi-channel data representation.
It remains unclear how t-SVD theoretically affects the learning behavior of t-NNs.
This paper is the first to answer this question by deriving the upper bounds of the generalization error of both standard and adversarially trained t-NNs.
arXiv Detail & Related papers (2023-03-01T03:05:40Z) - Quantum-Inspired Tensor Neural Networks for Option Pricing [4.3942901219301564]
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions.
A subset of such approaches to addressing the COD has led to solving high-dimensional PDEs.
This has opened the door to solving a variety of real-world problems, ranging from mathematical finance to industrial control applications.
arXiv Detail & Related papers (2022-12-28T19:39:55Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
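For reference, gradient flow (GF) here denotes the continuous-time limit of gradient descent:

```latex
% Gradient flow: gradient descent as the step size tends to zero.
\frac{\mathrm{d}\theta(t)}{\mathrm{d}t} = -\nabla_{\theta} L\bigl(\theta(t)\bigr)
```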
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Revisiting Transformation Invariant Geometric Deep Learning: Are Initial
Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that TinvNN can strictly guarantee transformation invariance and is general and flexible enough to be combined with existing neural networks.
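A generic classical-MDS sketch of the underlying invariance (not necessarily the paper's exact modification): embeddings computed from pairwise distances are unchanged, up to rotation and reflection, under rigid transformations of the input points.

```python
# Classical MDS from pairwise distances; rigidly transforming the input leaves
# the embedding geometry unchanged, which is the invariance exploited here.
import numpy as np

def classical_mds(X, dim=2):
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared pairwise distances
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n                    # centering matrix
    B = -0.5 * J @ D2 @ J                                  # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]                        # top eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))               # random rotation
Y = X @ R + 1.0                                            # rotate + translate
E1, E2 = classical_mds(X), classical_mds(Y)
# The embeddings' pairwise distances agree, confirming transformation invariance.
d1 = np.linalg.norm(E1[:, None] - E1[None], axis=-1)
d2 = np.linalg.norm(E2[:, None] - E2[None], axis=-1)
print(np.allclose(d1, d2, atol=1e-6))
```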
arXiv Detail & Related papers (2021-12-23T03:52:33Z) - Orthogonal Graph Neural Networks [53.466187667936026]
Graph neural networks (GNNs) have received tremendous attention due to their superiority in learning node representations.
However, stacking more convolutional layers significantly decreases the performance of GNNs.
We propose a novel Ortho-GConv, which can generally augment existing GNN backbones to stabilize model training and improve generalization performance.
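One common way to encourage orthogonal weight matrices, shown here only as a hedged illustration since Ortho-GConv's exact construction may differ, is a soft penalty on the Gram matrix of each weight matrix.

```python
# Soft orthogonality penalty ||W^T W - I||_F^2: zero exactly when the columns of W
# are orthonormal, large for generic random weights.
import numpy as np

def orthogonality_penalty(W):
    k = W.shape[1]
    G = W.T @ W                                        # Gram matrix of the columns
    return ((G - np.eye(k)) ** 2).sum()

rng = np.random.default_rng(0)
W_random = rng.normal(0.0, 0.1, (128, 64))
Q, _ = np.linalg.qr(rng.normal(size=(128, 64)))        # exactly orthonormal columns
print(orthogonality_penalty(W_random), orthogonality_penalty(Q))   # large vs. ~0
```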
arXiv Detail & Related papers (2021-09-23T12:39:01Z) - Optimal Conversion of Conventional Artificial Neural Networks to Spiking
Neural Networks [0.0]
Spiking neural networks (SNNs) are biology-inspired artificial neural networks (ANNs).
We propose a novel strategic pipeline that transfers the weights to the target SNN by combining threshold balancing and soft-reset mechanisms.
Our method is promising for deployment on embedded platforms, offering better support for SNNs under limited energy and memory constraints.
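A minimal sketch of the soft-reset mechanism mentioned above (illustrative only; the full conversion pipeline with threshold balancing is not reproduced): on a spike, the membrane potential is decremented by the threshold rather than zeroed, so residual charge carries over.

```python
# Integrate-and-fire neuron with either soft reset (subtract threshold) or
# hard reset (set to zero) after a spike.
def integrate_and_fire(inputs, threshold=1.0, soft_reset=True):
    v, spikes = 0.0, []
    for x in inputs:
        v += x                                         # integrate input current
        if v >= threshold:
            spikes.append(1)
            v = v - threshold if soft_reset else 0.0   # soft vs. hard reset
        else:
            spikes.append(0)
    return spikes

inputs = [0.6, 0.6, 0.6, 0.6, 0.6]
print(integrate_and_fire(inputs, soft_reset=True))     # [0, 1, 0, 1, 1]: rate 3/5 matches input 0.6
print(integrate_and_fire(inputs, soft_reset=False))    # [0, 1, 0, 1, 0]: residual charge is lost
```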
arXiv Detail & Related papers (2021-02-28T12:04:22Z) - Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthening continual learning in graph neural networks (GNNs).
At the heart of our approach is a generic module, termed topology-aware weight preserving (TWP).
arXiv Detail & Related papers (2020-12-10T22:30:25Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
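One standard way to estimate such a Hessian norm, shown as a sketch rather than the paper's exact estimator, is power iteration over Hessian-vector products computed with automatic differentiation; the toy model below is illustrative.

```python
# Estimates the Hessian spectral norm of a loss w.r.t. the parameters via power
# iteration over Hessian-vector products. Requires PyTorch.
import torch

def hessian_spectral_norm(loss_fn, params, iters=20):
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        loss = loss_fn()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        hv = torch.autograd.grad(grads, params, grad_outputs=v)    # Hessian-vector product
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    return norm.item()

# Tiny example on random data (model and data below are illustrative).
torch.manual_seed(0)
x, y = torch.randn(32, 10), torch.randn(32, 1)
model = torch.nn.Linear(10, 1)
loss_fn = lambda: torch.nn.functional.mse_loss(model(x), y)
print(hessian_spectral_norm(loss_fn, list(model.parameters())))
```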
arXiv Detail & Related papers (2020-04-20T18:12:56Z)