Related papers: A Graph Sufficiency Perspective for Neural Networks

A Graph Sufficiency Perspective for Neural Networks

URL: http://arxiv.org/abs/2507.10215v2
Date: Fri, 08 Aug 2025 12:39:33 GMT
Title: A Graph Sufficiency Perspective for Neural Networks
Authors: Cencheng Shen, Yuexiao Dong,
Abstract summary: This paper analyzes neural networks through graph variables and statistical sufficiency.<n>We interpret neural network layers as graph-based transformations, where neurons act as pairwise functions between inputs and learned anchor points.<n>Our framework covers fully connected layers, general pairwise functions, ReLU and sigmoid activations, and convolutional neural networks.
Score: 4.872570541276082
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper analyzes neural networks through graph variables and statistical sufficiency. We interpret neural network layers as graph-based transformations, where neurons act as pairwise functions between inputs and learned anchor points. Within this formulation, we establish conditions under which layer outputs are sufficient for the layer inputs, that is, each layer preserves the conditional distribution of the target variable given the input variable. We explore two theoretical paths under this graph-based view. The first path assumes dense anchor points and shows that asymptotic sufficiency holds in the infinite-width limit and is preserved throughout training. The second path, more aligned with practical architectures, proves exact or approximate sufficiency in finite-width networks by assuming region-separated input distributions and constructing appropriate anchor points. This path can ensure the sufficiency property for an infinite number of layers, and provide error bounds on the optimal loss for both regression and classification tasks using standard neural networks. Our framework covers fully connected layers, general pairwise functions, ReLU and sigmoid activations, and convolutional neural networks. Overall, this work bridges statistical sufficiency, graph-theoretic representations, and deep learning, providing a new statistical understanding of neural networks.

Related papers

An unsupervised tour through the hidden pathways of deep neural networks [6.063903439185316]
This thesis focuses on characterizing semantic content of hidden representations with unsupervised learning tools.<n>In Chapter 3, we study the evolution of the probability density across the hidden layers in some state-of-the-art deep neural networks.<n>In Chapter 4, we study the problem of generalization in deep neural networks.
arXiv Detail & Related papers (2025-10-24T15:50:31Z)
Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor gradient program (SGD) framework.<n>We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values.<n>This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks. We show that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
Half-Space Feature Learning in Neural Networks [2.3249139042158853]
There currently exist two extreme viewpoints for neural network feature learning. We argue neither interpretation is likely to be correct based on a novel viewpoint. We use this alternate interpretation to motivate a model, called the Deep Linearly Gated Network (DLGN)
arXiv Detail & Related papers (2024-04-05T12:03:19Z)
Generalization Error of Graph Neural Networks in the Mean-field Regime [10.35214360391282]
We explore two widely utilized types of graph neural networks: graph convolutional neural networks and message passing graph neural networks. Our novel approach involves deriving upper bounds within the mean-field regime for evaluating the generalization error of these graph neural networks.
arXiv Detail & Related papers (2024-02-10T19:12:31Z)
Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers. This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons [50.33260238739837]
Graph networks (GNNs) have demonstrated remarkable capabilities in learning from graph-structured data.<n>There remains a lack of analysis comparing GNNs and generalizations from an optimization and generalization perspective.
arXiv Detail & Related papers (2023-06-24T10:21:11Z)
Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective. We show how to compute this efficiently for tractable circuits. We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z)
Graph Neural Operators for Classification of Spatial Transcriptomics Data [1.408706290287121]
We propose a study incorporating various graph neural network approaches to validate the efficacy of applying neural operators towards prediction of brain regions in mouse brain tissue samples. We were able to achieve an F1 score of nearly 72% for the graph neural operator approach which outperformed all baseline and other graph network approaches.
arXiv Detail & Related papers (2023-02-01T18:32:06Z)
On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons. Our result may already hold for mild over- parameterization, where the width is $tildemathcalO(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks. We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal mini-max rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z)
Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent. We show that SGD is biased towards a simple solution. We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules. inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
Subspace Clustering Based Analysis of Neural Networks [7.451579925406617]
We learn affinity graphs from the latent structure of a given neural network layer trained over a set of inputs. We then use tools from Community Detection to quantify structures present in the input. We analyze the learned affinity graphs of the final convolutional layer of the network and demonstrate how an input's local neighbourhood affects its classification by the network.
arXiv Detail & Related papers (2021-07-02T22:46:40Z)
Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow. We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis. By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner. This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
Graph Structure of Neural Networks [104.33754950606298]
We show how the graph structure of neural networks affect their predictive performance. A "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance. Top-performing neural networks have graph structure surprisingly similar to those of real biological neural networks.
arXiv Detail & Related papers (2020-07-13T17:59:31Z)
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss [0.0]
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. We analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations.
arXiv Detail & Related papers (2020-02-11T15:42:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.