Related papers: Initialization and training of matrix product state probabilistic models

URL: http://arxiv.org/abs/2505.06419v1
Date: Fri, 09 May 2025 20:39:25 GMT
Title: Initialization and training of matrix product state probabilistic models
Authors: Xun Tang, Yuehaw Khoo, Lexing Ying
Abstract summary: We investigate a common failure mode in training randomly initialized matrix product states using gradient descent. The trained MPS models do not accurately predict the strong interactions between boundary sites. We propose two complementary strategies to overcome the training failure.
Score: 10.391338066539237
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modeling probability distributions via the wave function of a quantum state is central to quantum-inspired generative modeling and quantum state tomography (QST). We investigate a common failure mode in training randomly initialized matrix product states (MPS) using gradient descent. The results show that the trained MPS models do not accurately predict the strong interactions between boundary sites in periodic spin chain models. In the case of the Born machine algorithm, we further identify a causality trap, where the trained MPS models resemble causal models that ignore the non-local correlations in the true distribution. We propose two complementary strategies to overcome the training failure -- one through optimization and one through initialization. First, we develop a natural gradient descent (NGD) method, which approximately simulates the gradient flow on tensor manifolds and significantly enhances training efficiency. Numerical experiments show that NGD avoids local minima in both Born machines and in general MPS tomography. Remarkably, we show that NGD with line search can converge to the global minimum in only a few iterations. Second, for the BM algorithm, we introduce a warm-start initialization based on the TTNS-Sketch algorithm. We show that gradient descent under a warm initialization does not encounter the causality trap and admits rapid convergence to the ground truth.

Related papers

Universality and kernel-adaptive training for classically trained, quantum-deployed generative models [7.192684088403013]
The instantaneous quantum (IQP) quantum circuit Born machine (QCBM) has been proposed as a promising quantum generative model over bitstrings. Recent works have shown that the training of IQP-QCBM is classically tractable w.r.t. the so-called Gaussian kernel maximum mean discrepancy (MMD) loss function. We show that in the kernel-adaptive method, the convergence of the MMD value implies weak convergence in distribution of the generator.
arXiv Detail & Related papers (2025-10-09T17:17:34Z)
Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z)
Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation [8.973965016201822]
Finding the right initialisation for neural networks is crucial to ensure smooth training and good performance. In transformers, the wrong initialisation can lead to one of two failure modes of self-attention layers: rank collapse, where all tokens collapse into similar representations, and entropy collapse, where highly concentrated attention scores lead to instability. Here, we provide an analytical theory of signal propagation through deep transformers with self-attention, layer normalisation, skip connections and gradients.
arXiv Detail & Related papers (2025-05-30T08:18:23Z)
MILP initialization for solving parabolic PDEs with PINNs [2.5932373010465364]
Physics-Informed Neural Networks (PINNs) are a powerful deep learning method capable of providing solutions and parameter estimations of physical systems. Given the complexity of their neural network structure, the convergence speed is still limited compared to numerical methods.
arXiv Detail & Related papers (2025-01-27T15:46:38Z)
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes. We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation. We use the score-based diffusion model to generate labeled data. Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in a supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z)
Max-affine regression via first-order methods [7.12511675782289]
The max-affine model arises ubiquitously in applications in signal processing and statistics. We present a non-asymptotic convergence analysis of gradient descent (GD) and mini-batch stochastic gradient descent (SGD) for max-affine regression.
arXiv Detail & Related papers (2023-08-15T23:46:44Z)
Predicting the Initial Conditions of the Universe using a Deterministic Neural Network [10.158552381785078]
Finding the initial conditions that led to the current state of the universe is challenging because it involves searching over an intractable input space of initial conditions. Deep learning has emerged as a surrogate for N-body simulations by directly learning the mapping between the linear input of an N-body simulation and the final nonlinear output from the simulation. In this work, we pioneer the use of a deterministic convolutional neural network for learning the reverse mapping and show that it accurately recovers the initial linear displacement field over a wide range of scales.
arXiv Detail & Related papers (2023-03-23T06:04:36Z)
CoopInit: Initializing Generative Adversarial Networks via Cooperative Learning [50.90384817689249]
CoopInit is a cooperative learning-based strategy that can quickly learn a good starting point for GANs. We demonstrate the effectiveness of the proposed approach on image generation and one-sided unpaired image-to-image translation tasks.
arXiv Detail & Related papers (2023-03-21T07:49:32Z)
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems. PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features. In this paper, we propose employing the implicit stochastic gradient descent (ISGD) method to train PINNs to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
Losing momentum in continuous-time stochastic optimisation [42.617042045455506]
Momentum-based optimisation algorithms have become particularly widespread. In this work, we analyse a continuous-time model for gradient descent with momentum. We also train a convolutional neural network in an image classification problem.
arXiv Detail & Related papers (2022-09-08T10:46:05Z)
Simple lessons from complex learning: what a neural network model learns about cosmic structure formation [7.270598539996841]
We train a neural network model to predict the full phase space evolution of cosmological N-body simulations. Our model achieves percent level accuracy at nonlinear scales of $k \sim 1\ \mathrm{Mpc}^{-1}\, h$, representing a significant improvement over COLA.
arXiv Detail & Related papers (2022-06-09T15:41:09Z)
Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference. It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models [38.17499046781131]
We propose a novel approach towards estimating uncertain neural ODEs, avoiding the numerical integration bottleneck. Our algorithm - distributional gradient matching (DGM) - jointly trains a smoother and a dynamics model and matches their gradients via minimizing a Wasserstein loss. Our experiments show that, compared to traditional approximate inference methods based on numerical integration, our approach is faster to train, faster at predicting previously unseen trajectories, and in the context of neural ODEs, significantly more accurate.
arXiv Detail & Related papers (2021-06-22T08:40:51Z)
Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs). We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation. Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.