Related papers: Interpretable Scaling Behavior in Sparse Subnetwork Representations of Quantum States

URL: http://arxiv.org/abs/2505.22734v1
Date: Wed, 28 May 2025 18:00:08 GMT
Title: Interpretable Scaling Behavior in Sparse Subnetwork Representations of Quantum States
Authors: Brandon Barton, Juan Carrasquilla, Christopher Roth, Agnes Valenti,
Abstract summary: We show that sparse neural networks can reach accuracies comparable to their dense counterparts, even when pruned by more than an order of magnitude in parameter count. We identify universal scaling behavior that persists across network sizes and physical models, where the boundaries of scaling regions are determined by the underlying Hamiltonian.
Score: 0.46603287532620735
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Lottery Ticket Hypothesis (LTH) posits that within overparametrized neural networks, there exist sparse subnetworks that are capable of matching the performance of the original model when trained in isolation from the original initialization. We extend this hypothesis to the unsupervised task of approximating the ground state of quantum many-body Hamiltonians, a problem equivalent to finding a neural-network compression of the lowest-lying eigenvector of an exponentially large matrix. Focusing on two representative quantum Hamiltonians, the transverse field Ising model (TFIM) and the toric code (TC), we demonstrate that sparse neural networks can reach accuracies comparable to their dense counterparts, even when pruned by more than an order of magnitude in parameter count. Crucially, and unlike the original LTH, we find that performance depends only on the structure of the sparse subnetwork, not on the specific initialization, when trained in isolation. Moreover, we identify universal scaling behavior that persists across network sizes and physical models, where the boundaries of scaling regions are determined by the underlying Hamiltonian. At the onset of high-error scaling, we observe signatures of a sparsity-induced quantum phase transition that is first-order in shallow networks. Finally, we demonstrate that pruning enhances interpretability by linking the structure of sparse subnetworks to the underlying physics of the Hamiltonian.
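The abstract contrasts the original LTH procedure (prune a trained network, then retrain the surviving weights from their original initialization) with the finding that, for these Hamiltonians, only the sparse structure matters. The following is a minimal sketch under stated assumptions, not the authors' code: it performs one round of global unstructured magnitude pruning on a flat weight list, replaces training with a random perturbation, and applies the resulting mask both to the original initialization (LTH rewinding) and to a fresh random initialization. The function names `magnitude_prune_mask` and `apply_mask` are hypothetical.

```python
import random


def magnitude_prune_mask(weights, sparsity):
    """Return a 0/1 mask that removes the `sparsity` fraction of smallest-|w| entries.

    Global unstructured magnitude pruning over a flat list of weights;
    this criterion is an illustrative assumption, not necessarily the paper's.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by increasing magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights, mask):
    """Element-wise product: pruned weights are fixed at zero."""
    return [w * m for w, m in zip(weights, mask)]


rng = random.Random(0)
w_init = [rng.gauss(0.0, 1.0) for _ in range(10)]      # original initialization
w_trained = [w + rng.gauss(0.0, 0.5) for w in w_init]  # hypothetical stand-in for training
mask = magnitude_prune_mask(w_trained, sparsity=0.9)   # keep the top 10% by |w|

# Original LTH: retrain the sparse subnetwork from the original initialization.
ticket_rewound = apply_mask(w_init, mask)
# Same mask (structure) with a fresh random initialization.
w_fresh = [rng.gauss(0.0, 1.0) for _ in range(10)]
ticket_reinit = apply_mask(w_fresh, mask)
```

Comparing how `ticket_rewound` and `ticket_reinit` train would test the abstract's claim that performance depends on the mask rather than on the specific initialization; an iterative scheme would repeat the prune/retrain step at gradually increasing sparsity.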

Related papers

Quantization vs Pruning: Insights from the Strong Lottery Ticket Hypothesis [5.494111035517599]
Quantization is an essential technique for making neural networks more efficient, yet our theoretical understanding of it remains limited. Previous works demonstrated that extremely low-precision networks, such as binary networks, can be constructed by pruning large, randomly initialized networks. We build on foundational results by Borgs et al. on the Number Partitioning Problem to derive new theoretical results for the Random Subset Sum Problem in a quantized setting.
arXiv Detail & Related papers (2025-08-14T18:51:34Z)
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization [5.982922468400901]
We prove full-network error bounds under an infinite alphabet and minimal assumptions on the input data. We further show that error bounds equivalent to those of the infinite-alphabet case can be achieved using on the order of only $\log\log N$ bits per weight, where $N$ is the largest number of neurons in a layer.
arXiv Detail & Related papers (2023-09-20T00:35:16Z)
Spike-and-slab shrinkage priors for structurally sparse Bayesian neural networks [0.16385815610837165]
Sparse deep learning addresses network complexity and computational cost by recovering a sparse representation of the underlying target function. Deep neural architectures compressed via structured sparsity provide low-latency inference, higher data throughput, and reduced energy consumption. We propose structurally sparse Bayesian neural networks which prune excessive nodes with (i) Spike-and-Slab Group Lasso (SS-GL) and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors.
arXiv Detail & Related papers (2023-08-17T17:14:18Z)
How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series. We find that the relation between input periodicity and activation periodicity is key for the performance of large-kernel convolutional neural network (LKCNN) models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights. We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons. Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK). In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm. We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees. We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z)
Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
The edge of chaos: quantum field theory and deep neural networks [0.0]
We explicitly construct the quantum field theory corresponding to a general class of deep neural networks. We compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$. Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.
arXiv Detail & Related papers (2021-09-27T18:00:00Z)
Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers. We show that GCHP can significantly reduce training time, and that the likelihood ratio loss with interarrival-time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
Fixed points of nonnegative neural networks [6.7113569772720565]
We first show that nonnegative neural networks with nonnegative weights and biases can be recognized as monotonic and (weakly) scalable mappings. We prove that the shape of the fixed point set of nonnegative neural networks with nonnegative weights and biases is an interval, which under mild conditions degenerates to a point.
arXiv Detail & Related papers (2021-06-30T17:49:55Z)
Quantum-inspired event reconstruction with Tensor Networks: Matrix Product States [0.0]
We show that tensor networks are ideal vehicles to connect quantum mechanical concepts to machine learning techniques. We also show that entanglement entropy can be used to interpret what a network learns.
arXiv Detail & Related papers (2021-06-15T18:00:02Z)
Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data [0.0]
A promising strategy is based on tensor networks, which have been very successful in physical and chemical applications. We show that the weights of a multidimensional regression model can be learned by means of tensor networks, yielding a powerful compact representation. An algorithm based on alternating least squares has been proposed for approximating the weights in TT format at reduced computational cost.
arXiv Detail & Related papers (2021-01-22T16:14:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.