Estimating the Probability of Sampling a Trained Neural Network at Random
- URL: http://arxiv.org/abs/2501.18812v2
- Date: Tue, 08 Apr 2025 00:36:01 GMT
- Title: Estimating the Probability of Sampling a Trained Neural Network at Random
- Authors: Adam Scherlis, Nora Belrose
- Abstract summary: We analyze an algorithm for estimating the size of a localized neighborhood in neural network parameter space. The fast basin-volume estimator we adapt often provides only a lower bound; we show that this lower bound can be improved with an importance-sampling method. We find that overfit, badly-generalizing neighborhoods are smaller, indicating more complex learned behavior.
- Score: 1.9358739203360091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present and analyze an algorithm for estimating the size, under a Gaussian or uniform measure, of a localized neighborhood in neural network parameter space with behavior similar to an "anchor" point. We refer to this as the "local volume" of the anchor. We adapt an existing basin-volume estimator, which is very fast but in many cases only provides a lower bound. We show that this lower bound can be improved with an importance-sampling method using gradient information that is already provided by popular optimizers. The negative logarithm of local volume can also be interpreted as a measure of the anchor network's information content. As expected for a measure of complexity, this quantity increases during language model training. We find that overfit, badly-generalizing neighborhoods are smaller, indicating a more complex learned behavior. This smaller volume can also be interpreted in an MDL sense as suboptimal compression. Our results are consistent with a picture of generalization we call the "volume hypothesis": that neural net training produces good generalization primarily because the architecture gives simple functions more volume in parameter space, and the optimizer samples from the low-loss manifold in a volume-sensitive way. We believe that fast local-volume estimators are a promising practical metric of network complexity and architectural inductive bias for interpretability purposes.
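For intuition, here is a naive Monte Carlo sketch of the quantity being estimated. It is not the paper's importance-sampling estimator; the loss-threshold neighborhood criterion, the `loss_fn(model, data)` signature, and the values of `sigma` and `epsilon` are illustrative assumptions.

```python
import math
import torch

def naive_local_volume(model, loss_fn, data, sigma=0.01, epsilon=0.05, n_samples=200):
    """Crude Monte Carlo estimate of the Gaussian-measure volume of the
    neighborhood around the current weights (the "anchor") whose loss stays
    within `epsilon` of the anchor loss. Returns (volume, -log volume)."""
    anchor = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        base_loss = loss_fn(model, data).item()
        hits = 0
        for _ in range(n_samples):
            # Perturb every parameter with isotropic Gaussian noise around the anchor.
            for p, a in zip(model.parameters(), anchor):
                p.copy_(a + sigma * torch.randn_like(a))
            # Count the perturbation as "inside the neighborhood" if the loss stays close.
            if loss_fn(model, data).item() <= base_loss + epsilon:
                hits += 1
        # Restore the anchor weights.
        for p, a in zip(model.parameters(), anchor):
            p.copy_(a)
    volume = max(hits, 1) / n_samples      # crude estimate; saturates for tiny volumes
    return volume, -math.log(volume)       # volume and its information content in nats
```

The negative log of this fraction is the "information content" reading of local volume mentioned in the abstract; the paper's estimator replaces the naive hit-counting with importance sampling using optimizer gradients.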
Related papers
- Quantification via Gaussian Latent Space Representations [3.2198127675295036]
Quantification is the task of predicting the prevalence of each class within an unknown bag of examples. We present an end-to-end neural network that uses Gaussian distributions in latent spaces to obtain invariant representations of bags of examples.
arXiv Detail & Related papers (2025-01-23T13:13:46Z) - Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable by our training procedure, including the optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Sampling weights of deep neural networks [1.2370077627846041]
We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks.
In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed.
We prove that sampled networks are universal approximators.
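As a rough illustration of training-free weight sampling, the sketch below draws hidden weights and biases at random and solves only the linear readout in closed form. This is a generic random-features construction, not the data-dependent sampling distribution proposed in the paper; the `tanh` activation, weight scales, and ridge parameter are illustrative assumptions.

```python
import numpy as np

def sample_network(X_train, y_train, hidden=512, seed=0):
    """One-hidden-layer regressor with no gradient steps: hidden weights and
    biases are sampled at random and the linear readout is solved by
    ridge-regularized least squares. Returns a prediction function."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, hidden))
    b = rng.normal(scale=0.1, size=hidden)
    H = np.tanh(X_train @ W + b)                    # sampled hidden features
    lam = 1e-3                                      # ridge regularization
    beta = np.linalg.solve(H.T @ H + lam * np.eye(hidden), H.T @ y_train)
    return lambda X: np.tanh(X @ W + b) @ beta
```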
arXiv Detail & Related papers (2023-06-29T10:13:36Z) - Improved uncertainty quantification for neural networks with Bayesian last layer [0.0]
Uncertainty quantification is an important task in machine learning.
We present a reformulation of the log-marginal likelihood of a neural network with a Bayesian last layer (BLL) that allows for efficient training using backpropagation.
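For background, a standard closed-form Bayesian last layer (ordinary Bayesian linear regression on frozen penultimate-layer features) is sketched below; this is not the paper's reformulated marginal likelihood, and `alpha` and `noise_var` are assumed hyperparameters.

```python
import numpy as np

def bll_predict(Phi_train, y_train, Phi_test, alpha=1.0, noise_var=0.1):
    """Bayesian linear regression on fixed last-layer features Phi.
    Returns the predictive mean and variance for each test point."""
    d = Phi_train.shape[1]
    A = Phi_train.T @ Phi_train / noise_var + alpha * np.eye(d)   # posterior precision
    Sigma = np.linalg.inv(A)                                      # posterior covariance
    mu = Sigma @ Phi_train.T @ y_train / noise_var                # posterior mean
    mean = Phi_test @ mu
    var = noise_var + np.einsum('ij,jk,ik->i', Phi_test, Sigma, Phi_test)
    return mean, var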
arXiv Detail & Related papers (2023-02-21T20:23:56Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Learning Optimal Flows for Non-Equilibrium Importance Sampling [13.469239537683299]
We develop a method to perform calculations based on generating samples from a simple base distribution, transporting them along the flow generated by a velocity field, and performing averages along these flowlines.
On the theory side we discuss how to tailor the velocity field to the target and establish general conditions under which the proposed estimator is a perfect estimator.
On the computational side we show how to use deep learning to represent the velocity field by a neural network and train it towards the zero variance optimum.
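A toy one-dimensional version of the transport-and-reweight recipe is sketched below; the hand-picked velocity field, target density, and Euler discretization are illustrative assumptions, whereas the paper learns the velocity field with a neural network and trains it toward the zero-variance optimum.

```python
import numpy as np

def flow_is_estimate(f, n=200_000, steps=50, seed=0):
    """Estimate E_p[f(x)] for a target p = N(1, 1.5^2): draw samples from a
    base N(0,1), push them along a toy velocity field v(x) = 1 + 0.5*x with
    Euler steps, track the exact density change of each affine step, and
    reweight against the target. Self-normalized importance sampling."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                                  # samples from the base q0
    logq = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)          # log q0(x)
    dt = 1.0 / steps
    for _ in range(steps):
        x = x + dt * (1.0 + 0.5 * x)                        # Euler step along the flowline
        logq -= np.log(1.0 + 0.5 * dt)                      # change of variables for this affine step
    logp = -0.5 * ((x - 1.0) / 1.5) ** 2 - np.log(1.5) - 0.5 * np.log(2.0 * np.pi)
    w = np.exp(logp - logq)                                 # importance weights
    return np.sum(w * f(x)) / np.sum(w)

# Example: the target mean should come out near 1.0.
# print(flow_is_estimate(lambda x: x))
```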
arXiv Detail & Related papers (2022-06-20T17:25:26Z) - Rethinking Spatial Invariance of Convolutional Networks for Object
Counting [119.83017534355842]
We use locally connected Gaussian kernels in place of the original convolution filters to estimate spatial positions in the density map.
Inspired by previous work, we propose a low-rank approximation accompanied with translation invariance to favorably implement the approximation of massive Gaussian convolution.
Our methods significantly outperform other state-of-the-art methods and achieve promising learning of the spatial position of objects.
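For context, a common baseline construction of the density map targeted by such counting methods places a unit-mass Gaussian at each annotated object location; the sketch below shows that baseline (not the paper's locally connected Gaussian-kernel layer), with `sigma` an assumed bandwidth.

```python
import numpy as np

def density_map(points, shape, sigma=4.0):
    """Build a counting density map by placing a normalized Gaussian at each
    annotated (x, y) object location. dmap.sum() approximates the object count."""
    H, W = shape
    yy, xx = np.mgrid[0:H, 0:W]
    dmap = np.zeros(shape)
    for (x, y) in points:
        g = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()                 # each object contributes unit mass
    return dmap
```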
arXiv Detail & Related papers (2022-06-10T17:51:25Z) - Learning Distributions by Generative Adversarial Networks: Approximation and Generalization [0.6768558752130311]
We study how well generative adversarial networks learn from finite samples by analyzing the convergence rates of these models.
Our analysis is based on a new oracle inequality that decomposes the estimation error of the GAN into the discriminator and generator approximation errors.
For the generator approximation error, we show that a neural network can approximately transform a low-dimensional source distribution into a high-dimensional target distribution.
arXiv Detail & Related papers (2022-05-25T09:26:17Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these results to robust mean estimation, second-moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that a particle filter whose sampling distribution is learned without supervision yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z) - Learning from Small Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales [11.210266084524998]
This paper shows how to incorporate into support-vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful.
arXiv Detail & Related papers (2021-09-27T04:02:43Z) - Probabilistic partition of unity networks: clustering based deep approximation [0.0]
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs.
We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss.
We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space.
arXiv Detail & Related papers (2021-07-07T08:02:00Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - SignalNet: A Low Resolution Sinusoid Decomposition and Estimation Network [79.04274563889548]
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples.
We introduce a worst-case learning threshold for comparing the results of our network relative to the underlying data distributions.
In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
arXiv Detail & Related papers (2021-06-10T04:21:20Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - A Low-Complexity MIMO Channel Estimator with Implicit Structure of a Convolutional Neural Network [0.0]
We propose a low-complexity convolutional neural network estimator which learns the minimum mean squared error channel estimator for single-antenna users.
We derive a high-level description of the estimator for arbitrary choices of the pilot sequence.
Numerical results demonstrate performance gains compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2021-04-26T15:52:29Z) - Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Distributionally Robust Parametric Maximum Likelihood Estimation [13.09499764232737]
We propose a distributionally robust maximum likelihood estimator that minimizes the worst-case expected log-loss uniformly over a parametric nominal distribution.
Our novel robust estimator also enjoys statistical consistency and delivers promising empirical results in both regression and classification tasks.
arXiv Detail & Related papers (2020-10-11T19:05:49Z) - Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.