Function Forms of Simple ReLU Networks with Random Hidden Weights
- URL: http://arxiv.org/abs/2505.17907v1
- Date: Fri, 23 May 2025 13:53:02 GMT
- Title: Function Forms of Simple ReLU Networks with Random Hidden Weights
- Authors: Ka Long Keith Ho, Yoshinari Takeishi, Junichi Takeuchi
- Abstract summary: We investigate the function space dynamics of a two-layer ReLU neural network in the infinite-width limit. We highlight the Fisher information matrix's role in steering learning. This work offers a robust foundation for understanding wide neural networks.
- Score: 1.2289361708127877
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the function space dynamics of a two-layer ReLU neural network in the infinite-width limit, highlighting the Fisher information matrix (FIM)'s role in steering learning. Extending seminal works on approximate eigendecomposition of the FIM, we derive the asymptotic behavior of basis functions ($f_v(x) = X^{\top} v $) for four groups of approximate eigenvectors, showing their convergence to distinct function forms. These functions, prioritized by gradient descent, exhibit FIM-induced inner products that approximate orthogonality in the function space, forging a novel connection between parameter and function spaces. Simulations validate the accuracy of these theoretical approximations, confirming their practical relevance. By refining the function space inner product's role, we advance the theoretical framework for ReLU networks, illuminating their optimization and expressivity. Overall, this work offers a robust foundation for understanding wide neural networks and enhances insights into scalable deep learning architectures, paving the way for improved design and analysis of neural networks.
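The link between the FIM and function-space inner products mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code. It assumes that only the output weights are trained, that inputs are standard Gaussian, and that the FIM is (up to a noise-variance factor) F = E[X X^T], where X = relu(W x) is the vector of hidden activations; the sizes `d`, `m`, `n` and the helper `empirical_fim` are hypothetical names chosen for illustration. Under these assumptions the inner product of f_u and f_v under the input distribution equals u^T F v, so eigenvectors of F yield orthogonal basis functions.

```python
import numpy as np

def empirical_fim(W, xs):
    """Monte Carlo estimate of F = E[X X^T] with X = relu(W x).

    Assumption: only the output weights are trained, so the gradient of
    the network output with respect to the parameters is X itself.
    """
    X = np.maximum(xs @ W.T, 0.0)   # hidden activations, shape (n, m)
    return X.T @ X / len(xs), X

rng = np.random.default_rng(0)
d, m, n = 3, 20, 5000                           # illustrative sizes (assumed)
W = rng.standard_normal((m, d)) / np.sqrt(d)    # fixed random hidden weights
xs = rng.standard_normal((n, d))                # Gaussian inputs (assumed)

F, X = empirical_fim(W, xs)
eigvals, eigvecs = np.linalg.eigh(F)            # eigenvalues in ascending order

# Basis functions f_v(x) = X^T v for the two leading eigenvectors.
u, v = eigvecs[:, -1], eigvecs[:, -2]
f_u, f_v = X @ u, X @ v

# Empirical function-space inner products under the input distribution.
inner_uv = np.mean(f_u * f_v)   # equals u^T F v, zero for distinct eigenvectors
inner_uu = np.mean(f_u * f_u)   # equals the leading eigenvalue of F
```

Here the orthogonality is exact for the empirical FIM because the same samples define both F and the inner product. The paper's results concern approximate eigenvectors in the infinite-width limit, which this finite sketch does not reproduce.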
Related papers
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor gradient program (SGD) framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z) - Extension of Symmetrized Neural Network Operators with Fractional and Mixed Activation Functions [0.0]
We propose a novel extension to symmetrized neural network operators by incorporating fractional and mixed activation functions. Our framework introduces a fractional exponent in the activation functions, allowing adaptive non-linear approximations with improved accuracy.
arXiv Detail & Related papers (2025-01-17T14:24:25Z) - Approximation of RKHS Functionals by Neural Networks [30.42446856477086]
We study the approximation of functionals on reproducing kernel Hilbert spaces (RKHSs) using neural networks.
We derive explicit error bounds for those induced by inverse multiquadric, Gaussian, and Sobolev kernels.
We apply our findings to functional regression, proving that neural networks can accurately approximate the regression maps.
arXiv Detail & Related papers (2024-03-18T18:58:23Z) - Nonlinear functional regression by functional deep neural network with kernel embedding [18.927592350748682]
We introduce a functional deep neural network with an adaptive and discretization-invariant dimension reduction method. Explicit rates of approximating nonlinear smooth functionals across various input function spaces are derived. We conduct numerical experiments on both simulated and real datasets to demonstrate the effectiveness and benefits of our functional net.
arXiv Detail & Related papers (2024-01-05T16:43:39Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions to solving a given task. Our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Optimal Approximation Complexity of High-Dimensional Functions with Neural Networks [3.222802562733787]
We investigate properties of neural networks that use both ReLU and $x^2$ as activation functions.
We show how to leverage low local dimensionality in some contexts to overcome the curse of dimensionality, obtaining approximation rates that are optimal for unknown lower-dimensional subspaces.
arXiv Detail & Related papers (2023-01-30T17:29:19Z) - Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - Modern Non-Linear Function-on-Function Regression [8.231050911072755]
We introduce a new class of non-linear function-on-function regression models for functional data using neural networks.
We give two model fitting strategies, Functional Direct Neural Network (FDNN) and Functional Basis Neural Network (FBNN).
arXiv Detail & Related papers (2021-07-29T16:19:59Z) - Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z) - UNIPoint: Universally Approximating Point Processes Intensities [125.08205865536577]
We provide a proof that a class of learnable functions can universally approximate any valid intensity function.
We implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis functions upon each event.
arXiv Detail & Related papers (2020-07-28T09:31:56Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)