Transformed Low-Rank Parameterization Can Help Robust Generalization for
Tensor Neural Networks
- URL: http://arxiv.org/abs/2303.00196v3
- Date: Wed, 20 Dec 2023 08:57:18 GMT
- Authors: Andong Wang, Chao Li, Mingyuan Bai, Zhong Jin, Guoxu Zhou, Qibin Zhao
- Abstract summary: Tensor Singular Value Decomposition (t-SVD) has achieved extensive success in multi-channel data representation.
It remains unclear how t-SVD theoretically affects the learning behavior of t-NNs.
This paper is the first to answer this question by deriving upper bounds on the generalization error of both standard and adversarially trained t-NNs.
- Score: 32.87980654923361
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving efficient and robust multi-channel data learning is a challenging
task in data science. By exploiting low-rankness in the transformed domain,
i.e., transformed low-rankness, tensor Singular Value Decomposition (t-SVD) has
achieved extensive success in multi-channel data representation and has
recently been extended to function representation such as Neural Networks with
t-product layers (t-NNs). However, it remains unclear how t-SVD theoretically
affects the learning behavior of t-NNs. This paper is the first to answer this
question by deriving upper bounds on the generalization error of both standard
and adversarially trained t-NNs. It reveals that
t-NNs compressed by exact transformed low-rank parameterization can achieve a
sharper adversarial generalization bound. In practice, although t-NNs rarely
have exactly transformed low-rank weights, our analysis further shows that by
adversarial training with gradient flow (GF), the over-parameterized t-NNs with
ReLU activations are trained with implicit regularization towards transformed
low-rank parameterization under certain conditions. We also establish
adversarial generalization bounds for t-NNs with approximately transformed
low-rank weights. Our analysis indicates that transformed low-rank
parameterization is a promising way to enhance robust generalization for t-NNs.
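
To make the central object concrete, the sketch below shows a t-product layer whose weight tensor is parameterized in transformed low-rank form W = U * V, together with a tubal-rank check in the transform domain. This is a minimal illustrative NumPy example, not the authors' implementation; the shapes, the choice of the DFT as the transform, and names such as `LowTubalRankTLayer` and `tubal_rank` are assumptions made here for clarity.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (p x q x C) and B (q x s x C): DFT along the channel
    mode, frontal-slice-wise matrix products, then inverse DFT."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum('pqc,qsc->psc', A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

class LowTubalRankTLayer:
    """A single t-product layer whose weight tensor W = U * V has tubal rank <= r."""
    def __init__(self, m, n, channels, r, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n * channels)
        self.U = rng.normal(0.0, scale, size=(m, r, channels))  # m x r x C factor
        self.V = rng.normal(0.0, scale, size=(r, n, channels))  # r x n x C factor

    def forward(self, X):
        """X: n x batch x C  ->  ReLU(W * X), with W = U * V (tubal rank <= r)."""
        W = t_product(self.U, self.V)
        return np.maximum(t_product(W, X), 0.0)  # ReLU activation, as in t-NNs

def tubal_rank(W, tol=1e-8):
    """Tubal rank: the largest rank among the frontal slices of W in the DFT domain."""
    W_hat = np.fft.fft(W, axis=2)
    return max(np.linalg.matrix_rank(W_hat[:, :, c], tol=tol)
               for c in range(W.shape[2]))

# Toy usage: a rank-2 layer applied to a 3-channel input batch.
layer = LowTubalRankTLayer(m=8, n=16, channels=3, r=2)
X = np.random.default_rng(1).normal(size=(16, 5, 3))
print(layer.forward(X).shape)                    # (8, 5, 3)
print(tubal_rank(t_product(layer.U, layer.V)))   # 2, i.e. <= r by construction
```

Here the DFT along the channel mode is used as the transform, the canonical choice in t-SVD; exact transformed low-rank weights correspond to every frontal slice of the weight tensor being low rank in that domain. In the same spirit, one could monitor this tubal rank during adversarial training to observe the implicit bias toward transformed low-rank weights that the paper's analysis predicts.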
Related papers
- An Efficient Approach to Regression Problems with Tensor Neural Networks [5.345144592056051]
This paper introduces a tensor neural network (TNN) to address nonparametric regression problems.
The TNN demonstrates superior performance compared to conventional Feed-Forward Networks (FFNs) and Radial Basis Function Networks (RBNs).
A significant innovation in our approach is the integration of statistical regression and numerical integration within the TNN framework.
arXiv Detail & Related papers (2024-06-14T03:38:40Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Converting Artificial Neural Networks to Spiking Neural Networks via
Parameter Calibration [21.117214351356765]
Spiking Neural Networks (SNNs) are recognized as one of the next-generation families of neural networks.
In this work, we argue that simply copying the weights of an ANN to an SNN inevitably results in activation mismatch.
We propose a set of layer-wise parameter calibration algorithms that adjust the parameters to minimize the activation mismatch.
arXiv Detail & Related papers (2022-05-06T18:22:09Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Revisiting Transformation Invariant Geometric Deep Learning: Are Initial
Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that TinvNN can strictly guarantee transformation invariance and is general and flexible enough to be combined with existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z) - CAP: Co-Adversarial Perturbation on Weights and Features for Improving
Generalization of Graph Neural Networks [59.692017490560275]
Adversarial training has been widely demonstrated to improve a model's robustness against adversarial attacks.
However, it remains unclear how adversarial training could improve the generalization ability of GNNs in graph analytics problems.
We construct the co-adversarial perturbation (CAP) optimization problem in terms of weights and features, and design the alternating adversarial perturbation algorithm to flatten the weight and feature loss landscapes alternately.
arXiv Detail & Related papers (2021-10-28T02:28:13Z) - Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs.
BT-layers in CNNs and RNNs can achieve a very large compression ratio in the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z) - Understanding Why Neural Networks Generalize Well Through GSNR of
Parameters [11.208337921488207]
We study the gradient signal-to-noise ratio (GSNR) of parameters during the training of deep neural networks (DNNs).
We show that a larger GSNR during training leads to better generalization performance.
arXiv Detail & Related papers (2020-01-21T08:33:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.