Related papers: A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization

A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization

URL: http://arxiv.org/abs/2504.14762v1
Date: Sun, 20 Apr 2025 23:09:20 GMT
Title: A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization
Authors: Sahil Rajesh Dhayalkar,
Abstract summary: We propose a generalization and graph-theoretic theory of dropout by modeling a random walk over a high-dimensional graph of binaryworks.<n>We prove that generalizingworks form large, connected, low-resistance clusters, and that their number grows exponentially with network width.<n>This reveals dropout as a mechanism for sampling from a robust, structured ensemble of well-generalizingworks with built-in redundancy.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a combinatorial and graph-theoretic theory of dropout by modeling training as a random walk over a high-dimensional graph of binary subnetworks. Each node represents a masked version of the network, and dropout induces stochastic traversal across this space. We define a subnetwork contribution score that quantifies generalization and show that it varies smoothly over the graph. Using tools from spectral graph theory, PAC-Bayes analysis, and combinatorics, we prove that generalizing subnetworks form large, connected, low-resistance clusters, and that their number grows exponentially with network width. This reveals dropout as a mechanism for sampling from a robust, structured ensemble of well-generalizing subnetworks with built-in redundancy. Extensive experiments validate every theoretical claim across diverse architectures. Together, our results offer a unified foundation for understanding dropout and suggest new directions for mask-guided regularization and subnetwork optimization.

Related papers

What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks. This paper introduces first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z)
T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining. Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
Fitting Low-rank Models on Egocentrically Sampled Partial Networks [4.111899441919165]
We propose an approach to fit general low-rank models for egocentrically sampled networks. This method offers the first theoretical guarantee for egocentric partial network estimation. We evaluate the technique on several synthetic and real-world networks and show that it delivers competitive performance in link prediction tasks.
arXiv Detail & Related papers (2023-03-09T03:20:44Z)
Learning Coherent Clusters in Weakly-Connected Network Systems [7.766921168069532]
We propose a structure-preserving model methodology for large-scale dynamic networks with tightly-connected components. We provide an upper bound on the approximation error when the network graph is randomly generated from a weight block model.
arXiv Detail & Related papers (2022-11-28T13:32:25Z)
Graph Auto-Encoders for Network Completion [6.1074304332419675]
We propose a model to use the learned pattern of connections from the observed part of the network to complete the whole graph. Our proposed model achieved competitive performance with less information needed.
arXiv Detail & Related papers (2022-04-25T05:24:45Z)
The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance. We explain how these effectively-deep networks learn nontrivial representations from training. We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z)
Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals. We propose a general graph estimator based on a novel structured fusion regularization. We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks. Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair. A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
Spectral Embedding of Graph Networks [76.27138343125985]
We introduce an unsupervised graph embedding that trades off local node similarity and connectivity, and global structure. The embedding is based on a generalized graph Laplacian, whose eigenvectors compactly capture both network structure and neighborhood proximity in a single representation.
arXiv Detail & Related papers (2020-09-30T04:59:10Z)
Consistency of Spectral Clustering on Hierarchical Stochastic Block Models [5.983753938303726]
We study the hierarchy of communities in real-world networks under a generic block model. We prove the strong consistency of this method under a wide range of model parameters. Unlike most of existing work, our theory covers multiscale networks where the connection probabilities may differ by orders of magnitude.
arXiv Detail & Related papers (2020-04-30T01:08:59Z)
Progressive Graph Convolutional Networks for Semi-Supervised Node Classification [97.14064057840089]
Graph convolutional networks have been successful in addressing graph-based tasks such as semi-supervised node classification. We propose a method to automatically build compact and task-specific graph convolutional networks.
arXiv Detail & Related papers (2020-03-27T08:32:16Z)
Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms [67.88675386638043]
The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. We introduce windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts.
arXiv Detail & Related papers (2019-11-14T17:23:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.