Towards Optimal Neural Networks: the Role of Sample Splitting in
Hyperparameter Selection
- URL: http://arxiv.org/abs/2307.07726v2
- Date: Thu, 5 Oct 2023 10:00:30 GMT
- Authors: Shijin Gong and Xinyu Zhang
- Abstract summary: We construct a novel theory for understanding the effectiveness of neural networks.
Specifically, we explore the rationale underlying a common practice during the construction of neural network models.
- Score: 10.083181657981292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While artificial neural networks have demonstrated exceptional practical
success across a variety of domains, investigations into their theoretical
characteristics, such as their approximation power, statistical properties, and
generalization performance, have concurrently made significant strides. In this
paper, we construct a novel theory for understanding the effectiveness of
neural networks, which offers a perspective distinct from prior research.
Specifically, we explore the rationale underlying a common practice during the
construction of neural network models: sample splitting. Our findings indicate
that the optimal hyperparameters derived from sample splitting can enable a
neural network model that asymptotically minimizes the prediction risk. We
conduct extensive experiments across different application scenarios and
network architectures, and the results demonstrate the effectiveness of our theory.
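The sample-splitting practice the abstract refers to can be sketched in a minimal form: hold out part of the data, fit a model for each hyperparameter candidate on the remaining part, and select the candidate that minimizes the held-out prediction risk. The data-generating process, the tiny tanh network, and the width grid below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative): y = sin(3x) + noise
X = rng.uniform(-1, 1, size=(400, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=400)

# Sample splitting: hold out part of the data for hyperparameter selection
n_train = 300
X_tr, y_tr = X[:n_train], y[:n_train]
X_val, y_val = X[n_train:], y[n_train:]

def train_mlp(X, y, width, steps=2000, lr=0.05, seed=0):
    """Train a one-hidden-layer tanh network with full-batch gradient descent."""
    r = np.random.default_rng(seed)
    W1 = r.normal(size=(X.shape[1], width))
    b1 = np.zeros(width)
    W2 = r.normal(scale=1.0 / np.sqrt(width), size=width)
    b2 = 0.0
    n = len(y)
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        err = H @ W2 + b2 - y               # residuals (gradient of MSE up to 2/n)
        gW2 = H.T @ err / n
        gb2 = err.mean()
        dH = np.outer(err, W2) * (1 - H**2)  # backprop through tanh
        gW1 = X.T @ dH / n
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def mse(params, X, y):
    W1, b1, W2, b2 = params
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.mean((pred - y) ** 2))

# Hypothetical hyperparameter grid: hidden width, chosen by held-out risk
candidates = [2, 8, 32]
val_losses = {w: mse(train_mlp(X_tr, y_tr, w), X_val, y_val) for w in candidates}
best_width = min(val_losses, key=val_losses.get)
print(best_width, val_losses)
```

In the paper's framing, the selected hyperparameter is the one whose held-out risk is smallest; the theoretical claim is that, asymptotically, this selection minimizes the prediction risk over the candidate set.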
Related papers
- Expressivity of Neural Networks with Random Weights and Learned Biases [44.02417750529102]
Recent work has pushed the bounds of universal approximation by showing that arbitrary functions can be learned by tuning only small subsets of a network's parameters.
We provide theoretical and numerical evidence demonstrating that feedforward neural networks with fixed random weights can be trained to perform multiple tasks by learning biases only.
Our results are relevant to neuroscience, where they demonstrate the potential for behaviourally relevant changes in dynamics without modifying synaptic weights.
arXiv Detail & Related papers (2024-07-01T04:25:49Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow the construction of model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - Statistical Physics of Deep Neural Networks: Initialization toward
Optimal Channels [6.144858413112823]
In deep learning, neural networks serve as noisy channels between input data and its representation.
We study the frequently overlooked possibility that neural networks can intrinsically converge toward optimal channels.
arXiv Detail & Related papers (2022-12-04T05:13:01Z) - On the Approximation and Complexity of Deep Neural Networks to Invariant
Functions [0.0]
We study the approximation and complexity of deep neural networks to invariant functions.
We show that a broad range of invariant functions can be approximated by various types of neural network models.
We provide a feasible application that connects the parameter estimation and forecasting of high-resolution signals with our theoretical conclusions.
arXiv Detail & Related papers (2022-10-27T09:19:19Z) - Stochastic Neural Networks with Infinite Width are Deterministic [7.07065078444922]
We study stochastic neural networks, a widely used class of neural networks.
We prove that as the width of an optimized neural network tends to infinity, its predictive variance on the training set decreases to zero.
arXiv Detail & Related papers (2022-01-30T04:52:31Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Formalizing Generalization and Robustness of Neural Networks to Weight
Perturbations [58.731070632586594]
We provide the first formal analysis for feed-forward neural networks with non-negative monotone activation functions against weight perturbations.
We also design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations.
arXiv Detail & Related papers (2021-03-03T06:17:03Z) - Topological Insights into Sparse Neural Networks [16.515620374178535]
We introduce an approach to understand and compare sparse neural network topologies from the perspective of graph theory.
We first propose Neural Network Sparse Topology Distance (NNSTD) to measure the distance between different sparse neural networks.
We show that adaptive sparse connectivity can consistently reveal a wealth of sparse sub-networks with very different topologies that outperform the dense model.
arXiv Detail & Related papers (2020-06-24T22:27:21Z) - Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.