The Presence and Absence of Barren Plateaus in Tensor-network Based
Machine Learning
- URL: http://arxiv.org/abs/2108.08312v1
- Date: Wed, 18 Aug 2021 18:00:03 GMT
- Title: The Presence and Absence of Barren Plateaus in Tensor-network Based
Machine Learning
- Authors: Zidu Liu, Li-Wei Yu, L.-M. Duan, and Dong-Ling Deng
- Abstract summary: We study the trainability of tensor-network based machine learning models by exploring the landscapes of different loss functions.
We rigorously prove that barren plateaus prevail in the training process of such machine learning algorithms when global loss functions are used.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tensor networks are efficient representations of high-dimensional tensors
with widespread applications in quantum many-body physics. Recently, they have
been adapted to the field of machine learning, giving rise to an emergent
research frontier that has attracted considerable attention. Here, we study the
trainability of tensor-network based machine learning models by exploring the
landscapes of different loss functions, with a focus on the matrix product
states (also called tensor trains) architecture. In particular, we rigorously
prove that barren plateaus (i.e., exponentially vanishing gradients) prevail in
the training process of the machine learning algorithms with global loss
functions. In contrast, for local loss functions the gradients with respect to
variational parameters near the local observables do not vanish as the system
size increases. Barren plateaus are therefore absent in this case, and the
corresponding models can be trained efficiently. Our results reveal a
crucial aspect of tensor-network based machine learning in a rigorous fashion,
providing a valuable guide for both practical applications and future
theoretical studies.
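To make the contrast between global and local losses concrete, here is a minimal numerical sketch. It is not the paper's construction or proof: the bond dimension, the Gaussian initialization of the site tensors, the specific projector losses, the finite-difference gradients, and the sample counts are all illustrative assumptions. The sketch builds random matrix product states, normalizes them, and compares the sample variance of the gradient of a global loss (overlap with the all-zeros product state) against that of a local loss (a single-site projector), taken with respect to one parameter of the first site tensor; the former should shrink rapidly with system size while the latter stays roughly constant, mirroring the claim above.
```python
# Illustrative sketch only: random MPS, normalized state vector, and
# finite-difference gradients of a global vs. a local loss.  All constants
# below (bond dimension, sizes, sample counts) are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
CHI = 4   # bond dimension (assumption)
D = 2     # physical dimension (qubits)

def random_mps(n_sites):
    """Random open-boundary MPS tensors of shape (chi_left, D, chi_right)."""
    shapes = [(1 if i == 0 else CHI, D, 1 if i == n_sites - 1 else CHI)
              for i in range(n_sites)]
    return [rng.normal(size=s) / np.sqrt(CHI) for s in shapes]

def mps_to_vector(tensors):
    """Contract the MPS into a full state vector (feasible for small n only)."""
    state = tensors[0]                                  # shape (1, D, chi)
    for t in tensors[1:]:
        state = np.tensordot(state, t, axes=(state.ndim - 1, 0))
    return state.reshape(-1)                            # boundary bonds are 1

def losses(tensors):
    """Global loss: projector onto |00...0>.  Local loss: |0><0| on site 1."""
    psi = mps_to_vector(tensors)
    psi = psi / np.linalg.norm(psi)
    global_loss = psi[0] ** 2                           # <psi|0..0><0..0|psi>
    local_loss = np.sum(psi.reshape(D, -1)[0] ** 2)     # <psi| |0><0|_1 x I |psi>
    return global_loss, local_loss

def loss_gradients(tensors, eps=1e-6):
    """Central finite-difference gradients w.r.t. one entry of the first tensor."""
    plus = [t.copy() for t in tensors]
    minus = [t.copy() for t in tensors]
    plus[0][0, 0, 0] += eps
    minus[0][0, 0, 0] -= eps
    g_plus, l_plus = losses(plus)
    g_minus, l_minus = losses(minus)
    return (g_plus - g_minus) / (2 * eps), (l_plus - l_minus) / (2 * eps)

for n in (4, 6, 8, 10, 12):
    grads = np.array([loss_gradients(random_mps(n)) for _ in range(200)])
    var_global, var_local = grads.var(axis=0)
    print(f"n={n:2d}  Var[grad, global loss] = {var_global:.3e}  "
          f"Var[grad, local loss] = {var_local:.3e}")
```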
Related papers
- Mechanistic Neural Networks for Scientific Machine Learning [58.99592521721158]
We present Mechanistic Neural Networks, a neural network design for machine learning applications in the sciences.
It incorporates a new Mechanistic Block in standard architectures to explicitly learn governing differential equations as representations.
Central to our approach is a novel Relaxed Linear Programming solver (NeuRLP) inspired by a technique that reduces solving linear ODEs to solving linear programs.
arXiv Detail & Related papers (2024-02-20T15:23:24Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains [11.13191969085042]
We present a dynamic distillation framework based on multi-layer perceptron (MLP) for fault detection of freight trains.
We propose a dynamic teacher that can effectively eliminate the semantic discrepancy with the student model.
Our approach outperforms the current state-of-the-art detectors and achieves the highest accuracy with real-time detection at a lower computational cost.
arXiv Detail & Related papers (2023-12-10T09:18:24Z)
- Theory on variational high-dimensional tensor networks [2.0307382542339485]
We investigate the emergent statistical properties of random high-dimensional tensor-network states and the trainability of such networks.
We prove that variational high-dimensional tensor networks suffer from barren plateaus for global loss functions.
Our results pave the way for their future theoretical studies and practical applications.
arXiv Detail & Related papers (2023-03-30T15:26:30Z)
- Low-Rank Tensor Function Representation for Multi-Dimensional Data Recovery [52.21846313876592]
Low-rank tensor function representation (LRTFR) can continuously represent data beyond meshgrid with infinite resolution.
We develop two fundamental concepts for tensor functions, i.e., the tensor function rank and low-rank tensor function factorization.
Experiments substantiate the superiority and versatility of our method as compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-01T04:00:38Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Tensor Methods in Computer Vision and Deep Learning [120.3881619902096]
Tensors, or multidimensional arrays, are data structures that can naturally represent visual data of multiple dimensions.
With the advent of the deep learning paradigm shift in computer vision, tensors have become even more fundamental.
This article provides an in-depth and practical review of tensors and tensor methods in the context of representation learning and deep learning.
arXiv Detail & Related papers (2021-07-07T18:42:45Z)
- Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data [0.0]
A promising strategy is based on tensor networks, which have been very successful in physical and chemical applications.
We show that the weights of a multidimensional regression model can be learned by means of tensor networks, yielding a powerful and compact representation.
An algorithm based on alternating least squares has been proposed for approximating the weights in TT-format with reduced computational cost (a hedged sketch of this general idea appears after this list).
arXiv Detail & Related papers (2021-01-22T16:14:38Z)
- Anomaly Detection with Tensor Networks [2.3895981099137535]
We exploit the memory and computational efficiency of tensor networks to learn a linear transformation over a space with a dimension exponential in the number of original features.
We produce competitive results on image datasets, despite not exploiting the locality of images.
arXiv Detail & Related papers (2020-06-03T20:41:30Z)
- Supervised Learning in the Presence of Concept Drift: A modelling framework [5.22609266390809]
We present a modelling framework for the investigation of supervised learning in non-stationary environments.
We model two example types of learning systems: prototype-based Learning Vector Quantization (LVQ) for classification and shallow, layered neural networks for regression tasks.
arXiv Detail & Related papers (2020-05-21T09:13:58Z)
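One of the related entries above, Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data, learns regression weights in TT format with alternating least squares. The sketch below is a hedged illustration of that general idea rather than the algorithm from that paper: the local feature map [1, x], the TT ranks, the synthetic target, and the number of sweeps are assumptions chosen only for the example. Because the model is linear in any single TT core once the others are fixed, each core can be updated with an ordinary least-squares solve, and a few sweeps over the cores typically reduce the training error.
```python
# Illustrative sketch only: tensor-train (TT) regression fitted with
# alternating least squares (ALS).  Feature map, ranks, data and sweep count
# are assumptions made for this example, not details from the cited paper.
import numpy as np

rng = np.random.default_rng(1)
N_SITES, D, RANK = 4, 2, 3          # inputs, local feature dim, TT rank

def local_features(x):
    """Map each scalar input x_k to the feature vector [1, x_k]."""
    return np.stack([np.ones_like(x), x], axis=-1)       # shape (n, N_SITES, D)

def tt_predict(cores, phi):
    """Contract the TT cores with per-site features phi of shape (n, N_SITES, D)."""
    out = np.einsum('nd,adb->nab', phi[:, 0], cores[0])[:, 0]
    for k in range(1, len(cores)):
        out = np.einsum('na,nd,adb->nb', out, phi[:, k], cores[k])
    return out[:, 0]                                      # trailing TT rank is 1

def als_sweep(cores, phi, y):
    """One left-to-right ALS sweep: each core is solved by ordinary least squares."""
    n = phi.shape[0]
    for k in range(len(cores)):
        left = np.ones((n, 1))                            # contraction of cores 0..k-1
        for j in range(k):
            left = np.einsum('na,nd,adb->nb', left, phi[:, j], cores[j])
        right = np.ones((n, 1))                           # contraction of cores k+1..end
        for j in range(len(cores) - 1, k, -1):
            right = np.einsum('adb,nd,nb->na', cores[j], phi[:, j], right)
        # The prediction is linear in core k, so build its design matrix and solve.
        design = np.einsum('na,nd,nb->nadb', left, phi[:, k], right).reshape(n, -1)
        sol, *_ = np.linalg.lstsq(design, y, rcond=None)
        cores[k] = sol.reshape(cores[k].shape)
    return cores

# Synthetic multilinear target on four inputs (representable at low TT rank).
X = rng.uniform(-1.0, 1.0, size=(500, N_SITES))
y = X[:, 0] * X[:, 1] - 0.5 * X[:, 2] + 0.2 * X[:, 3] * X[:, 0]
phi = local_features(X)

ranks = [1] + [RANK] * (N_SITES - 1) + [1]
cores = [0.1 * rng.normal(size=(ranks[k], D, ranks[k + 1])) for k in range(N_SITES)]
for sweep in range(5):
    cores = als_sweep(cores, phi, y)
    mse = np.mean((tt_predict(cores, phi) - y) ** 2)
    print(f"sweep {sweep}: train MSE = {mse:.2e}")
```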