Pruning at Initialisation through the lens of Graphon Limit: Convergence, Expressivity, and Generalisation
- URL: http://arxiv.org/abs/2602.06675v1
- Date: Fri, 06 Feb 2026 13:02:47 GMT
- Title: Pruning at Initialisation through the lens of Graphon Limit: Convergence, Expressivity, and Generalisation
- Authors: Hoang Pham, The-Anh Ta, Long Tran-Thanh
- Abstract summary: Pruning at Initialisation (PaI) methods discover sparse, trainable subnetworks before training, but their theoretical mechanisms remain elusive. In this work, we connect discrete pruning heuristics to graph limit theory via graphons, establishing the graphon limit of PaI masks. We derive two fundamental theoretical results: (i) a Universal Approximation Theorem for sparse networks that depends only on the intrinsic dimension of active coordinate subspaces; and (ii) a Graphon-NTK generalisation bound demonstrating how the limit graphon modulates the kernel geometry to align with informative features.
- Score: 16.628325681877556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning at Initialisation (PaI) methods discover sparse, trainable subnetworks before training, but their theoretical mechanisms remain elusive. Existing analyses are often limited to finite-width statistics, lacking a rigorous characterisation of the global sparsity patterns that emerge as networks grow large. In this work, we connect discrete pruning heuristics to graph limit theory via graphons, establishing the graphon limit of PaI masks. We introduce a Factorised Saliency Model that encompasses popular pruning criteria and prove that, under regularity conditions, the discrete masks generated by these algorithms converge to deterministic bipartite graphons. This limit framework establishes a novel topological taxonomy for sparse networks: while unstructured methods (e.g., Random, Magnitude) converge to homogeneous graphons representing uniform connectivity, data-driven methods (e.g., SNIP, GraSP) converge to heterogeneous graphons that encode implicit feature selection. Leveraging this continuous characterisation, we derive two fundamental theoretical results: (i) a Universal Approximation Theorem for sparse networks that depends only on the intrinsic dimension of active coordinate subspaces; and (ii) a Graphon-NTK generalisation bound demonstrating how the limit graphon modulates the kernel geometry to align with informative features. Our results transform the study of sparse neural networks from combinatorial graph problems into a rigorous framework of continuous operators, offering a new mechanism for analysing expressivity and generalisation in sparse neural networks.
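To make the taxonomy concrete, here is a minimal, self-contained Python sketch (not the paper's code; the toy data, the single-layer SNIP-style saliency, and helper names such as `block_average` are illustrative assumptions). It prunes one weight matrix at initialisation with a random criterion and with a gradient-based, SNIP-like criterion, then block-averages each mask to obtain a step-function (empirical graphon) estimate: the random mask is roughly uniform, matching the homogeneous limit, while the data-driven mask concentrates density on the informative input coordinates, matching the heterogeneous limit.

```python
# Minimal sketch (not the paper's code): estimate the empirical "graphon" of a
# pruning-at-initialisation mask by block-averaging the bipartite connectivity
# matrix of a single weight layer. The toy data, the SNIP-style saliency, and
# the helper names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_samples, sparsity = 256, 256, 512, 0.9

# Toy task in which only the first 32 input coordinates are informative.
X = rng.normal(size=(n_samples, d_in))
y = np.sign(X[:, :32].sum(axis=1))

W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)  # weights at initialisation

def snip_saliency(W, X, y):
    """SNIP-style score |w * dL/dw| for a linear layer under squared loss."""
    z = X @ W.T                      # (n, d_out) pre-activations
    err = z - y[:, None]             # residuals against the labels
    grad = err.T @ X / len(X)        # dL/dW, shape (d_out, d_in)
    return np.abs(W * grad)

def top_k_mask(score, sparsity):
    """Keep the (1 - sparsity) fraction of entries with the largest scores."""
    k = int(round((1.0 - sparsity) * score.size))
    threshold = np.partition(score.ravel(), -k)[-k]
    return (score >= threshold).astype(float)

def block_average(mask, blocks=8):
    """Step-function (empirical graphon) estimate: mean mask density per block."""
    h, w = mask.shape[0] // blocks, mask.shape[1] // blocks
    return mask[:blocks * h, :blocks * w].reshape(blocks, h, blocks, w).mean(axis=(1, 3))

mask_random = top_k_mask(rng.random(W.shape), sparsity)     # unstructured pruning
mask_snip = top_k_mask(snip_saliency(W, X, y), sparsity)    # data-driven pruning

print(np.round(block_average(mask_random), 2))  # roughly uniform: homogeneous limit
print(np.round(block_average(mask_snip), 2))    # density piles onto informative inputs
```

Refining the block partition as the layer widths grow is, informally, the discrete-to-continuous passage from PaI masks to their limit graphons that the paper makes rigorous.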
Related papers
- Manifold limit for the training of shallow graph convolutional neural networks [1.2744523252873352]
We study the consistency of the training of shallow graph convolutional neural networks (GCNNs) on proximity graphs of sampled point clouds. We prove $\Gamma$-convergence of regularized empirical risk minimization functionals and corresponding convergence of their global minimizers.
arXiv Detail & Related papers (2026-01-09T18:59:20Z) - The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis [36.176112635780406]
We study the training dynamics of sparse networks in the infinite width limit. Our framework provides theoretical insights into the impact of connectivity patterns on the trainability of various sparse network architectures.
arXiv Detail & Related papers (2025-10-20T13:13:35Z) - On the Convergence and Size Transferability of Continuous-depth Graph Neural Networks [3.3390650922528184]
Continuous-depth graph neural networks combine the structural inductive bias of graph neural networks (GNNs) with the continuous-depth architecture of neural ordinary differential equations (ODEs). We present a rigorous convergence analysis of graph neural ODEs with time-varying parameters in the infinite-node limit, providing theoretical insights into their size transferability.
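For context, a continuous-depth GNN (graph neural ODE) is typically written as an ODE on the node-feature matrix; the schematic form below is an assumption about the general setup, not necessarily the exact parameterisation analysed in the cited paper.

```latex
% Schematic graph neural ODE (assumed general form, not the paper's exact model):
% node features X(t) evolve under a graph shift operator \hat{A} and
% time-dependent weights W(t); size transferability asks how solutions behave
% as the number of nodes grows and \hat{A} approaches a limit operator.
\[
  \frac{\mathrm{d}X(t)}{\mathrm{d}t} \;=\; \sigma\!\big(\hat{A}\,X(t)\,W(t)\big),
  \qquad X(0) = X_{\mathrm{in}}, \qquad t \in [0, T].
\]
```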
arXiv Detail & Related papers (2025-10-04T19:59:21Z) - Genus expansion for non-linear random matrix ensembles with applications to neural networks [3.801509221714223]
We present a unified approach to studying certain non-linear random matrix ensembles and associated random neural networks. We use a novel series expansion for neural networks which generalizes Faà di Bruno's formula to an arbitrary number of compositions. As an application, we prove several results about neural networks with random weights.
arXiv Detail & Related papers (2024-07-11T12:58:07Z) - What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z) - Revealing Decurve Flows for Generalized Graph Propagation [108.80758541147418]
This study addresses the limitations of the traditional analysis of message-passing, central to graph learning, by defining generalized propagation with directed and weighted graphs.
We include a preliminary exploration of learned propagation patterns in datasets, a first in the field.
arXiv Detail & Related papers (2024-02-13T14:13:17Z) - Supercharging Graph Transformers with Advective Diffusion [28.40109111316014]
This paper proposes Advective Diffusion Transformer (AdvDIFFormer), a physics-inspired graph Transformer model designed to address generalization under topological shifts. We show that AdvDIFFormer has provable capability for controlling generalization error under topological shifts. Empirically, the model demonstrates superiority in various predictive tasks across information networks, molecular screening and protein interactions.
arXiv Detail & Related papers (2023-10-10T08:40:47Z) - Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold [3.6052935394000234]
The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data.
No estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold.
We prove that with an arbitrary fraction of the labels revealed, detection is feasible throughout the parameter domain.
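For reference, the unnamed threshold here is presumably the detectability (Kesten–Stigum) threshold of the symmetric two-community sparse stochastic block model; the statement below is a standard formulation, included as an assumption about which threshold is meant.

```latex
% Detectability threshold for the symmetric two-community sparse SBM with
% within-/between-community mean degrees a and b (assumed to be the threshold
% the summary refers to). Below it, no estimator using the graph alone beats
% chance; the revealed labels are what allow crossing it.
\[
  (a - b)^2 \;>\; 2\,(a + b).
\]
```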
arXiv Detail & Related papers (2022-05-24T00:03:25Z) - Dist2Cycle: A Simplicial Neural Network for Homology Localization [66.15805004725809]
Simplicial complexes can be viewed as high-dimensional generalizations of graphs that explicitly encode multi-way ordered relations.
We propose a graph convolutional model for learning functions parametrized by the $k$-homological features of simplicial complexes.
arXiv Detail & Related papers (2021-10-28T14:59:41Z) - Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantees.
arXiv Detail & Related papers (2021-03-05T04:42:32Z) - Tractably Modelling Dependence in Networks Beyond Exchangeability [0.0]
We study the estimation, clustering and degree behavior of the network in our setting.
This work explores why, and under which general conditions, non-exchangeable network data can be described by a block model.
arXiv Detail & Related papers (2020-07-28T17:13:59Z) - Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in a structure suited to neural networks.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximation framework for such non-trivial ERGs that results in dyadic independence (i.e., edge-independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.