Wide Neural Networks as a Baseline for the Computational No-Coincidence Conjecture
- URL: http://arxiv.org/abs/2510.06527v1
- Date: Wed, 08 Oct 2025 00:02:22 GMT
- Title: Wide Neural Networks as a Baseline for the Computational No-Coincidence Conjecture
- Authors: John Dunbar, Scott Aaronson,
- Abstract summary: We show that randomly neural networks have nearly independent outputs exactly when their activation function is nonlinear with zero mean under the Gaussian measure: $mathbbE_z sim mathcalN(0,1)[sigma(z)]=0$.<n>Because of their nearly independent outputs, we propose neural networks with zero-mean activation functions as a promising candidate for the Alignment Research Center's computational no-coincidence conjecture -- a conjecture that aims to measure the limits of AI interpretability.
- Score: 2.2201528765499416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We establish that randomly initialized neural networks, with large width and a natural choice of hyperparameters, have nearly independent outputs exactly when their activation function is nonlinear with zero mean under the Gaussian measure: $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$. For example, this includes ReLU and GeLU with an additive shift, as well as tanh, but not ReLU or GeLU by themselves. Because of their nearly independent outputs, we propose neural networks with zero-mean activation functions as a promising candidate for the Alignment Research Center's computational no-coincidence conjecture -- a conjecture that aims to measure the limits of AI interpretability.
Related papers
- N-ReLU: Zero-Mean Stochastic Extension of ReLU [5.691710068675227]
N-ReLU (Noise-ReLU) is a zero-mean extension of the standard rectified linear unit (ReLU)<n>It replaces negative activations with Gaussian noise while preserving the same expected output expectation.<n>N-ReLU achieves robustness accuracy comparable to or slightly exceeding that of ReLU, LeakyReLU, PReLU, GELU, and RReLU.
arXiv Detail & Related papers (2025-11-10T19:07:10Z) - Nonlinear Dynamics In Optimization Landscape of Shallow Neural Networks with Tunable Leaky ReLU [3.2063364612188416]
We study the nonlinear dynamics of a shallow neural network trained with mean-squared loss and leaky ReLU activation.<n>We detect bifurcation of critical points with associated symmetries from global minimum as leaky parameter $alpha$ varies.
arXiv Detail & Related papers (2025-10-29T01:00:07Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Points of non-linearity of functions generated by random neural networks [0.0]
We consider functions from the real numbers to the real numbers, output by a neural network with 1 hidden activation layer, arbitrary width, and ReLU activation function.
We compute the expected distribution of the points of non-linearity.
arXiv Detail & Related papers (2023-04-19T17:40:19Z) - Spectral Bias Outside the Training Set for Deep Networks in the Kernel
Regime [0.0]
We show that the network is biased to learn the top eigenfunctions of the Neural Kernel not just on the training set but over the entire input space.
This bias depends on the model architecture and input distribution alone.
We conclude that local capacity control from the low effective rank of the Fisher Information Matrix is still underexplored theoretically.
arXiv Detail & Related papers (2022-06-06T22:09:15Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over- parameterization, where the width is $tildemathcalO(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural kernel (NTK)
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - A global convergence theory for deep ReLU implicit networks via
over-parameterization [26.19122384935622]
Implicit deep learning has received increasing attention recently.
This paper analyzes the gradient flow of Rectified Linear Unit (ReLU) activated implicit neural networks.
arXiv Detail & Related papers (2021-10-11T23:22:50Z) - Deformed semicircle law and concentration of nonlinear random matrices
for ultra-wide neural networks [29.03095282348978]
We study the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$.
We show that random feature regression induced by the empirical kernel achieves the same performance as its limiting kernel regression under the ultra-wide regime.
arXiv Detail & Related papers (2021-09-20T05:25:52Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Gaussian Error Linear Units (GELUs) [58.195342948092964]
We propose a neural network activation function that weights inputs by their value, rather than gates by their sign.
We find performance improvements across all considered computer vision, natural language processing, and speech tasks.
arXiv Detail & Related papers (2016-06-27T19:20:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.