Related papers: Efficient Finite Initialization for Tensorized Neural Networks

Efficient Finite Initialization for Tensorized Neural Networks

URL: http://arxiv.org/abs/2309.06577v3
Date: Fri, 04 Oct 2024 11:25:21 GMT
Title: Efficient Finite Initialization for Tensorized Neural Networks
Authors: Alejandro Mata Ali, Iñigo Perez Delgado, Marina Ristol Roura, Aitor Moreno Fdez. de Leceta,
Abstract summary: We present a novel method for initializing layers of tensorized neural networks in a way that avoids the explosion of the parameters of the matrix it emulates. We create a Python function to run it on an arbitrary layer, available in a Jupyter Notebook in the i3BQuantum repository.
Score: 41.94295877935867
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a novel method for initializing layers of tensorized neural networks in a way that avoids the explosion of the parameters of the matrix it emulates. The method is intended for layers with a high number of nodes in which there is a connection to the input or output of all or most of the nodes, we cannot or do not want to store/calculate all the elements of the represented layer and they follow a smooth distribution. This method is equally applicable to normalize general tensor networks in which we want to avoid overflows. The core of this method is the use of the Frobenius norm and the partial lineal entrywise norm of reduced forms of the layer in an iterative partial form, so that it has to be finite and within a certain range. These norms are efficient to compute, fully or partially for most cases of interest. In addition, the method benefits from the reuse of intermediate calculations. We apply the method to different layers and check its performance. We create a Python function to run it on an arbitrary layer, available in a Jupyter Notebook in the i3BQuantum repository: https://github.com/i3BQuantumTeam/Q4Real/blob/e07c827651ef16bcf74590ab965ea3985143f891/Quantum-Inspi red%20Variational%20Methods/TN_Normalizer.ipynb

Related papers

Prime Factorization Equation from a Tensor Network Perspective [37.037023521034925]
This paper presents an exact and explicit equation for prime factorization, along with an algorithm for its computation.<n>The proposed method is based on the MeLoCoToN approach, which addresses optimization problems through classical tensor networks.
arXiv Detail & Related papers (2025-07-29T10:38:51Z)
BALI: Learning Neural Networks via Bayesian Layerwise Inference [6.7819070167076045]
We introduce a new method for learning Bayesian neural networks, treating them as a stack of multivariate Bayesian linear regression models. The main idea is to infer the layerwise posterior exactly if we know the target outputs of each layer. We define these pseudo-targets as the layer outputs from the forward pass, updated by the backpropagated of the objective function.
arXiv Detail & Related papers (2024-11-18T22:18:34Z)
Old Optimizer, New Norm: An Anthology [3.471637998699967]
We argue that each method can instead be understood as a squarely first-order method without convexity assumptions. By generalizing this observation, we chart a new design space for training algorithms. We hope that this idea of carefully metrizing the neural architecture might lead to more stable, scalable and indeed faster training.
arXiv Detail & Related papers (2024-09-30T14:26:12Z)
Matrix Completion via Nonsmooth Regularization of Fully Connected Neural Networks [7.349727826230864]
It has been shown that enhanced performance could be attained by using nonlinear estimators such as deep neural networks. In this paper, we control over-fitting by regularizing FCNN model in terms of norm intermediate representations. Our simulations indicate the superiority of the proposed algorithm in comparison with existing linear and nonlinear algorithms.
arXiv Detail & Related papers (2024-03-15T12:00:37Z)
Spectral Norm of Convolutional Layers with Circular and Zero Paddings [55.233197272316275]
We generalize the use of the Gram iteration to zero padding convolutional layers and prove its quadratic convergence. We also provide theorems for bridging the gap between circular and zero padding convolution's spectral norm.
arXiv Detail & Related papers (2024-01-31T23:48:48Z)
Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks. We introduce a related embedded network and show that the embedded network can be used to provide an $ell_infty$-norm box over-approximation of the reachable sets of the original network. We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
A Mini-Block Natural Gradient Method for Deep Neural Networks [12.48022619079224]
We propose and analyze the convergence of an approximate natural gradient method, mini-block Fisher (MBF) Our novel approach utilizes the parallelism of generalization to efficiently perform on the large number of matrices in each layer.
arXiv Detail & Related papers (2022-02-08T20:01:48Z)
Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies. We achieve the desiderata viaak-Lojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z)
Tensor-based framework for training flexible neural networks [9.176056742068813]
We propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem. The proposed algorithm can handle different bases decomposition. The goal of this method is to compress large pretrained NN models, by replacing tensorworks, em i.e., one or multiple layers of the original network, by a new flexible layer.
arXiv Detail & Related papers (2021-06-25T10:26:48Z)
Preprint: Norm Loss: An efficient yet effective regularization method for deep neural networks [7.214681039134488]
We propose a weight soft-regularization method based on the oblique manifold. We evaluate our method on the popular CIFAR-10, CIFAR-100 and ImageNet 2012 datasets.
arXiv Detail & Related papers (2021-03-11T10:24:49Z)
Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data [0.0]
A promising strategy is based on tensor networks, which have been very successful in physical and chemical applications. We show that the weights of a multidimensional regression model can be learned by means of tensor networks with the aim of performing a powerful compact representation. An algorithm based on alternating least squares has been proposed for approximating the weights in TT-format with a reduction of computational power.
arXiv Detail & Related papers (2021-01-22T16:14:38Z)
Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata(WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks. We introduce the first provable learning algorithm for linear 2-RNN defined over sequences of continuous vectors input.
arXiv Detail & Related papers (2020-10-19T15:28:00Z)
Pooling Methods in Deep Neural Networks, a Review [6.1678491628787455]
pooling layer is an important layer that executes the down-sampling on the feature maps coming from the previous layer. In this paper, we reviewed some of the famous and useful pooling methods.
arXiv Detail & Related papers (2020-09-16T06:11:40Z)
Neural Subdivision [58.97214948753937]
This paper introduces Neural Subdivision, a novel framework for data-driven coarseto-fine geometry modeling. We optimize for the same set of network weights across all local mesh patches, thus providing an architecture that is not constrained to a specific input mesh, fixed genus, or category. We demonstrate that even when trained on a single high-resolution mesh our method generates reasonable subdivisions for novel shapes.
arXiv Detail & Related papers (2020-05-04T20:03:21Z)
Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix. Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
Evolving Normalization-Activation Layers [100.82879448303805]
We develop efficient rejection protocols to quickly filter out candidate layers that do not work well. Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures. Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
arXiv Detail & Related papers (2020-04-06T19:52:48Z)
Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1. This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI) We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction. We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization (
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective. We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.