Improvements to Gradient Descent Methods for Quantum Tensor Network
Machine Learning
- URL: http://arxiv.org/abs/2203.03366v1
- Date: Thu, 3 Mar 2022 19:00:40 GMT
- Title: Improvements to Gradient Descent Methods for Quantum Tensor Network
Machine Learning
- Authors: Fergus Barratt, James Dborin, Lewis Wright
- Abstract summary: We introduce a `copy node' method that successfully initializes arbitrary tensor networks.
We present numerical results showing that the combination of these techniques produces quantum-inspired tensor network models with far fewer parameters and improved generalization performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tensor networks have demonstrated significant value for machine learning in a
myriad of different applications. However, optimizing tensor networks using
standard gradient descent has proven to be difficult in practice. Tensor
networks suffer from initialization problems resulting in exploding or
vanishing gradients and require extensive hyperparameter tuning. Efforts to
overcome these problems usually depend on specific network architectures, or ad
hoc prescriptions. In this paper we address the problems of initialization and
hyperparameter tuning, making it possible to train tensor networks using
established machine learning techniques. We introduce a `copy node' method that
successfully initializes arbitrary tensor networks, in addition to a gradient
based regularization technique for bond dimensions. We present numerical
results that show that the combination of techniques presented here produces
quantum inspired tensor network models with far fewer parameters, while
improving generalization performance.
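A hedged illustration (not from the paper): the abstract names two ingredients, a `copy node' initialization for arbitrary tensor networks and a gradient-based regularizer for bond dimensions, but gives no implementation detail. The NumPy sketch below builds a toy matrix product state (MPS), uses a near-identity core initialization as a rough stand-in for the copy-node idea (identity-like structure keeps the full contraction from exploding or vanishing), and adds a nuclear-norm-style penalty on each core as a stand-in for the bond-dimension regularizer. All function names, sizes, and the specific penalty are assumptions for illustration only.

    # Toy sketch only; the paper's actual copy-node initialization and bond
    # regularizer are not reproduced here.
    import numpy as np

    rng = np.random.default_rng(0)
    n_sites, phys_dim, bond_dim = 10, 2, 8   # illustrative sizes

    def init_core(D_left, d, D_right, noise=1e-2):
        """Near-identity core: the identity on the bond indices, repeated over the
        physical index, plus small noise. For feature vectors whose entries sum to
        about 1, each per-site transfer matrix is then close to the identity, so
        the full contraction neither explodes nor vanishes at initialization."""
        core = np.zeros((D_left, d, D_right))
        eye = np.eye(D_left, D_right)
        for s in range(d):
            core[:, s, :] = eye
        return core + noise * rng.standard_normal(core.shape)

    cores = [init_core(1 if i == 0 else bond_dim, phys_dim,
                       1 if i == n_sites - 1 else bond_dim)
             for i in range(n_sites)]

    def contract(cores, x):
        """Contract the MPS with one feature vector of length phys_dim per site."""
        v = np.tensordot(x[0], cores[0], axes=([0], [1]))             # shape (1, D)
        for i in range(1, len(cores)):
            transfer = np.tensordot(x[i], cores[i], axes=([0], [1]))  # (D, D')
            v = v @ transfer
        return float(v.squeeze())

    def bond_penalty(cores):
        """Nuclear-norm-style proxy for a bond-dimension regularizer: the sum of
        singular values of each core's matricization. Added to a training loss,
        it pushes singular values toward zero, shrinking the effective bond
        dimension."""
        total = 0.0
        for core in cores:
            D_l, d, D_r = core.shape
            total += np.linalg.svd(core.reshape(D_l * d, D_r),
                                   compute_uv=False).sum()
        return total

    x = [np.array([1.0, 0.0]) for _ in range(n_sites)]   # trivial one-hot inputs
    print("contraction:", round(contract(cores, x), 4),
          "| bond penalty:", round(bond_penalty(cores), 3))

With this initialization the printed contraction stays of order one even for long chains, whereas i.i.d. Gaussian cores typically give values that grow or shrink exponentially with the number of sites, which is the initialization pathology the abstract describes.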
Related papers
- Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis [5.016205338484259]
The proposed method is more robust to network size variations than the existing method.
When applied to Physics-Informed Neural Networks, the method exhibits faster convergence and robustness to variations of the network size.
arXiv Detail & Related papers (2024-10-03T06:30:27Z)
- Quick design of feasible tensor networks for constrained combinatorial optimization [1.8775413720750924]
In recent years, tensor networks have been applied to constrained optimization problems for practical applications.
One approach is to construct tensor networks with nilpotent-matrix manipulation.
The proposed method is expected to facilitate the discovery of feasible tensor networks for constrained optimization problems.
arXiv Detail & Related papers (2024-09-03T08:36:23Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Training Thinner and Deeper Neural Networks: Jumpstart Regularization [2.8348950186890467]
We use regularization to prevent neurons from dying or becoming linear.
In comparison to conventional training, we obtain neural networks that are thinner, deeper, and - most importantly - more parameter-efficient.
arXiv Detail & Related papers (2022-01-30T12:11:24Z)
- Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data [0.0]
A promising strategy is based on tensor networks, which have been very successful in physical and chemical applications.
We show that the weights of a multidimensional regression model can be learned by means of tensor networks, with the aim of obtaining a powerful, compact representation.
An algorithm based on alternating least squares has been proposed for approximating the weights in TT format at reduced computational cost (a toy sketch of a TT-format prediction appears after this list).
arXiv Detail & Related papers (2021-01-22T16:14:38Z)
- Anomaly Detection with Tensor Networks [2.3895981099137535]
We exploit the memory and computational efficiency of tensor networks to learn a linear transformation over a space with a dimension exponential in the number of original features.
We produce competitive results on image datasets, despite not exploiting the locality of images.
arXiv Detail & Related papers (2020-06-03T20:41:30Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The proposed feature map distortion is analyzed and shown to produce deep neural networks with higher test performance.
arXiv Detail & Related papers (2020-02-23T13:59:13Z)
- Molecule Property Prediction and Classification with Graph Hypernetworks [113.38181979662288]
We show that the replacement of the underlying networks with hypernetworks leads to a boost in performance.
A major difficulty in the application of hypernetworks is their lack of stability.
A recent work has tackled the training instability of hypernetworks in the context of error correcting codes.
arXiv Detail & Related papers (2020-02-01T16:44:34Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity renders learning susceptible to initialization problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
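As a follow-up to the Tensor-Train entry above, and only as a generic illustration (the blurb gives no implementation detail, and the ALS training step it mentions is not shown), the sketch below stores a regression weight tensor in TT format and computes a prediction by contracting it with a rank-one feature tensor, without ever forming the dense weight tensor. All sizes and names are assumptions.

    # Toy sketch: TT-format weights for multilinear regression. Not the paper's code.
    import numpy as np

    rng = np.random.default_rng(1)
    n_features, d, r = 12, 4, 5     # number of sites, local feature dim, TT rank

    # TT cores G[i] of shape (r_left, d, r_right); boundary ranks are 1.
    cores = [rng.standard_normal((1 if i == 0 else r, d,
                                  1 if i == n_features - 1 else r)) * 0.1
             for i in range(n_features)]

    dense_params = d ** n_features          # a dense weight tensor needs this many
    tt_params = sum(c.size for c in cores)  # the TT representation needs only this many
    print(f"dense parameters: {dense_params:,}  TT parameters: {tt_params:,}")

    def predict(cores, feats):
        """y = <W, phi(x)> with phi(x) = x_1 (x) ... (x) x_n: contract each core
        with its local feature vector, then multiply the resulting matrices left
        to right."""
        result = np.ones((1, 1))
        for core, f in zip(cores, feats):
            transfer = np.tensordot(f, core, axes=([0], [1]))   # (r_left, r_right)
            result = result @ transfer
        return float(result.squeeze())

    feats = [rng.standard_normal(d) for _ in range(n_features)]
    print("prediction:", round(predict(cores, feats), 4))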
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.