Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality
- URL: http://arxiv.org/abs/2506.19144v1
- Date: Mon, 23 Jun 2025 21:29:40 GMT
- Title: Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality
- Authors: Kyeongwon Lee, Lizhen Lin, Jaewoo Park, Seonghyun Jeong
- Abstract summary: This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. We show that sparse and continuous shrinkage priors enable rate adaptation, allowing the posterior to contract at the optimal rate even when the smoothness level of the true function is unknown.
- Score: 8.411295657303324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our analysis shows that Bayesian neural networks equipped with either sparse or continuous shrinkage priors attain the optimal rates which are dependent on the intrinsic dimension of the true structures. Moreover, we show that these priors enable rate adaptation, allowing the posterior to contract at the optimal rate even when the smoothness level of the true function is unknown. The proposed framework accommodates a broad class of functions, including additive and multiplicative Besov functions as special cases. These results advance the theoretical foundations of Bayesian neural networks and provide rigorous justification for their practical effectiveness in high-dimensional, structured estimation problems.
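As a worked illustration of how an optimal rate can depend on intrinsic dimension, the sketch below computes the exponent of the minimax rate for anisotropic Besov spaces as it is commonly stated in the nonparametric literature, $\epsilon_n \asymp n^{-\tilde s/(2\tilde s+1)}$ with $\tilde s = (\sum_i 1/s_i)^{-1}$, where $s_i$ is the smoothness along coordinate $i$. This formula is an assumption for illustration, not a restatement of the paper's theorems: logarithmic factors, integrability conditions, and the hierarchical-composition case are ignored, and the function names are hypothetical. With all $s_i = s$ in $d$ dimensions it reduces to the familiar $n^{-s/(2s+d)}$.

```python
import math
from fractions import Fraction

# Hypothetical helpers for illustration only. They assume the commonly stated
# anisotropic Besov minimax rate eps_n ~ n^{-s~/(2 s~ + 1)} and drop log factors.


def effective_smoothness(s):
    """Return s~ = (sum_i 1/s_i)^{-1} for per-coordinate smoothness s_i > 0."""
    if len(s) == 0 or any(si <= 0 for si in s):
        raise ValueError("need at least one positive smoothness index")
    return 1 / sum(Fraction(1) / Fraction(si) for si in s)


def rate_exponent(s):
    """Exponent a in eps_n = n^{-a}; algebraically equal to 1 / (2 + sum_i 1/s_i)."""
    t = effective_smoothness(s)
    return t / (2 * t + 1)


def contraction_rate(n, s):
    """Numerical value of n^{-a} (constants and log factors dropped)."""
    return math.pow(n, -float(rate_exponent(s)))


if __name__ == "__main__":
    # Isotropic: s = 2 in d = 4 gives the classical exponent s / (2s + d) = 1/4.
    print(rate_exponent([2, 2, 2, 2]))     # prints 1/4
    # Anisotropic: smoothness 2 in one direction, 50 in the other three.
    print(rate_exponent([2, 50, 50, 50]))  # prints 25/64
    # One-dimensional reference with smoothness 2.
    print(rate_exponent([2]))              # prints 2/5
```

Under this assumed formula, adding coordinates with fixed smoothness drives the exponent toward zero (the curse of dimensionality), whereas adding highly smooth directions barely changes it: the anisotropic exponent 25/64 stays close to the one-dimensional value 2/5, which is the sense in which anisotropy lowers the effective dimension.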
Related papers
- Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks [78.11734286268455]
We study the performance of ConvResNeXts, trained with weight decay, from the perspective of nonparametric classification.
Our analysis allows for infinitely many building blocks in ConvResNeXts and shows that weight decay implicitly enforces sparsity on these blocks.
(2023-07-04T11:08:03Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
(2023-05-30T19:37:44Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
(2023-04-17T14:23:43Z)
- TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization [69.80141512683254]
We introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS), a novel framework for regularization in the tabular setting built on latent unit attributions.
We demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods.
(2023-03-09T18:57:13Z)
- Optimal Approximation Complexity of High-Dimensional Functions with Neural Networks [3.222802562733787]
We investigate properties of neural networks that use both ReLU and $x^2$ as activation functions.
We show how to leverage low local dimensionality in some contexts to overcome the curse of dimensionality, obtaining approximation rates that are optimal for unknown lower-dimensional subspaces.
(2023-01-30T17:29:19Z)
- Asymptotic Properties for Bayesian Neural Network in Besov Space [1.90365714903665]
We show that the Bayesian neural network with a spike-and-slab prior is consistent, with a nearly minimax convergence rate, when the true regression function lies in the Besov space.
We propose a practical neural network with guaranteed properties.
(2022-06-01T05:47:06Z)
- Pricing options on flow forwards by neural networks in Hilbert space [0.0]
We recast the pricing problem as an optimization problem in a Hilbert space of real-valued functions on the positive real line.
This optimization problem is solved using a novel feedforward neural network architecture.
(2022-02-17T18:03:51Z)
- Bayesian neural network priors for edge-preserving inversion [3.2046720177804646]
A class of prior distributions based on the output of neural networks with heavy-tailed weights is introduced.
We show theoretically that samples from such priors have desirable discontinuous-like properties even when the network width is finite.
(2021-12-20T16:39:05Z)
- Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical Guarantees and Implementation Details [0.5156484100374059]
Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies.
We propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for node selection during training.
We establish the fundamental result of variational posterior consistency together with the characterization of prior parameters.
(2021-08-25T00:48:07Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
(2020-04-20T18:12:56Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relationship between a network's architecture and its generalizability from a compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
(2020-01-14T22:26:57Z)