Transformed Low-Rank Parameterization Can Help Robust Generalization for
Tensor Neural Networks
- URL: http://arxiv.org/abs/2303.00196v3
- Date: Wed, 20 Dec 2023 08:57:18 GMT
- Authors: Andong Wang, Chao Li, Mingyuan Bai, Zhong Jin, Guoxu Zhou, Qibin Zhao
- Abstract summary: Tensor Singular Value Decomposition (t-SVD) has achieved extensive success in multi-channel data representation.
It remains unclear how t-SVD theoretically affects the learning behavior of t-NNs.
This paper is the first to answer this question by deriving upper bounds on the generalization error of both standard and adversarially trained t-NNs.
- Score: 32.87980654923361
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving efficient and robust multi-channel data learning is a challenging
task in data science. By exploiting low-rankness in the transformed domain,
i.e., transformed low-rankness, tensor Singular Value Decomposition (t-SVD) has
achieved extensive success in multi-channel data representation and has
recently been extended to function representation such as Neural Networks with
t-product layers (t-NNs). However, it remains unclear how t-SVD theoretically
affects the learning behavior of t-NNs. This paper is the first to answer this
question by deriving upper bounds on the generalization error of both standard
and adversarially trained t-NNs. It reveals that
t-NNs compressed by exact transformed low-rank parameterization can achieve a
sharper adversarial generalization bound. In practice, although t-NNs rarely
have exactly transformed low-rank weights, our analysis further shows that by
adversarial training with gradient flow (GF), the over-parameterized t-NNs with
ReLU activations are trained with implicit regularization towards transformed
low-rank parameterization under certain conditions. We also establish
adversarial generalization bounds for t-NNs with approximately transformed
low-rank weights. Our analysis indicates that transformed low-rank
parameterization is a promising way to enhance robust generalization for t-NNs.
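
To make the central object concrete, the sketch below shows a t-product layer whose weight tensor is parameterized in transformed low-rank form W = U * V, together with a tubal-rank check in the transform domain. This is a minimal illustrative NumPy example, not the authors' implementation; the shapes, the choice of the DFT as the transform, and names such as `LowTubalRankTLayer` and `tubal_rank` are assumptions made here for clarity.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (p x q x C) and B (q x s x C): DFT along the channel
    mode, frontal-slice-wise matrix products, then inverse DFT."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum('pqc,qsc->psc', A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

class LowTubalRankTLayer:
    """A single t-product layer whose weight tensor W = U * V has tubal rank <= r."""
    def __init__(self, m, n, channels, r, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n * channels)
        self.U = rng.normal(0.0, scale, size=(m, r, channels))  # m x r x C factor
        self.V = rng.normal(0.0, scale, size=(r, n, channels))  # r x n x C factor

    def forward(self, X):
        """X: n x batch x C  ->  ReLU(W * X), with W = U * V (tubal rank <= r)."""
        W = t_product(self.U, self.V)
        return np.maximum(t_product(W, X), 0.0)  # ReLU activation, as in t-NNs

def tubal_rank(W, tol=1e-8):
    """Tubal rank: the largest rank among the frontal slices of W in the DFT domain."""
    W_hat = np.fft.fft(W, axis=2)
    return max(np.linalg.matrix_rank(W_hat[:, :, c], tol=tol)
               for c in range(W.shape[2]))

# Toy usage: a rank-2 layer applied to a 3-channel input batch.
layer = LowTubalRankTLayer(m=8, n=16, channels=3, r=2)
X = np.random.default_rng(1).normal(size=(16, 5, 3))
print(layer.forward(X).shape)                    # (8, 5, 3)
print(tubal_rank(t_product(layer.U, layer.V)))   # 2, i.e. <= r by construction
```

Here the DFT along the channel mode is used as the transform, the canonical choice in t-SVD; exact transformed low-rank weights correspond to every frontal slice of the weight tensor being low rank in that domain. In the same spirit, one could monitor this tubal rank during adversarial training to observe the implicit bias toward transformed low-rank weights that the paper's analysis predicts.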
Related papers
- An Efficient Approach to Regression Problems with Tensor Neural Networks [5.345144592056051]
This paper introduces a tensor neural network (TNN) to address nonparametric regression problems.
The TNN demonstrates superior performance compared to conventional Feed-Forward Networks (FFNs) and Radial Basis Function Networks (RBNs).
A significant innovation in our approach is the integration of statistical regression and numerical integration within the TNN framework.
arXiv Detail & Related papers (2024-06-14T03:38:40Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Converting Artificial Neural Networks to Spiking Neural Networks via
Parameter Calibration [21.117214351356765]
Spiking Neural Networks (SNNs) are recognized as one of the next-generation families of neural networks.
In this work, we argue that simply copying the weights of an ANN to an SNN inevitably results in activation mismatch.
We propose a set of layer-wise parameter calibration algorithms that adjust the parameters to minimize the activation mismatch.
arXiv Detail & Related papers (2022-05-06T18:22:09Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Revisiting Transformation Invariant Geometric Deep Learning: Are Initial
Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that TinvNN can strictly guarantee transformation invariance and is general and flexible enough to be combined with existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z) - CAP: Co-Adversarial Perturbation on Weights and Features for Improving
Generalization of Graph Neural Networks [59.692017490560275]
Adversarial training has been widely demonstrated to improve a model's robustness against adversarial attacks.
However, it remains unclear how adversarial training could improve the generalization ability of GNNs in graph analytics problems.
We construct the co-adversarial perturbation (CAP) optimization problem in terms of weights and features, and design the alternating adversarial perturbation algorithm to flatten the weight and feature loss landscapes alternately.
arXiv Detail & Related papers (2021-10-28T02:28:13Z) - Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs.
BT-layers in CNNs and RNNs can achieve a very large compression ratio in the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z) - Understanding Why Neural Networks Generalize Well Through GSNR of
Parameters [11.208337921488207]
We study the gradient signal-to-noise ratio (GSNR) of parameters during the training of deep neural networks (DNNs).
We show that a larger GSNR during training leads to better generalization performance.
arXiv Detail & Related papers (2020-01-21T08:33:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.