A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks
- URL: http://arxiv.org/abs/2402.15984v2
- Date: Thu, 18 Apr 2024 19:10:58 GMT
- Title: A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks
- Authors: Sho Sonoda, Isao Ishikawa, Masahiro Ikeda
- Abstract summary: The ridgelet transform is a pseudo-inverse operator that maps a given function $f$ to the parameter distribution $\gamma$.
For depth-2 fully-connected networks on a Euclidean space, a closed-form expression for the ridgelet transform has been discovered.
We derive transforms for a variety of modern networks such as networks on finite fields $\mathbb{F}_p$, group convolutional networks on abstract Hilbert space $\mathcal{H}$, fully-connected networks on noncompact symmetric spaces $G/K$, and pooling layers.
- Score: 14.45619075342763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To investigate neural network parameters, it is easier to study the distribution of parameters than to study the parameters in each neuron. The ridgelet transform is a pseudo-inverse operator that maps a given function $f$ to the parameter distribution $\gamma$ so that a network $\mathtt{NN}[\gamma]$ reproduces $f$, i.e. $\mathtt{NN}[\gamma]=f$. For depth-2 fully-connected networks on a Euclidean space, the ridgelet transform has been discovered up to the closed-form expression, thus we could describe how the parameters are distributed. However, for a variety of modern neural network architectures, the closed-form expression has not been known. In this paper, we explain a systematic method using Fourier expressions to derive ridgelet transforms for a variety of modern networks such as networks on finite fields $\mathbb{F}_p$, group convolutional networks on abstract Hilbert space $\mathcal{H}$, fully-connected networks on noncompact symmetric spaces $G/K$, and pooling layers, or the $d$-plane ridgelet transform.
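For orientation, in the Euclidean depth-2 case the network and its ridgelet transform can be sketched as follows (a standard form from the ridgelet literature; here $\sigma$ is the activation, $\psi$ an admissible function, $c_{\sigma,\psi}$ a constant depending on the pair, and the symbol $\mathtt{R}$ for the transform is notation assumed for this sketch):
$$\mathtt{NN}[\gamma](x)=\int_{\mathbb{R}^d\times\mathbb{R}}\gamma(a,b)\,\sigma(a\cdot x-b)\,\mathrm{d}a\,\mathrm{d}b, \qquad \mathtt{R}[f](a,b)=\int_{\mathbb{R}^d}f(x)\,\overline{\psi(a\cdot x-b)}\,\mathrm{d}x,$$
so that the reconstruction formula $\mathtt{NN}[\mathtt{R}[f]]=c_{\sigma,\psi}\,f$ expresses the pseudo-inverse property $\mathtt{NN}[\gamma]=f$ with $\gamma=\mathtt{R}[f]$, up to the constant.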
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Fourier Circuits in Neural Networks and Transformers: A Case Study of Modular Arithmetic with Multiple Inputs [35.212818841550835]
One-hidden layer neural networks and one-layer Transformers are studied.
One-hidden layer neural networks attain a maximum $L_{2,k+1}$-margin on a dataset.
We observe similar computational mechanisms in attention of one-layer Transformers.
arXiv Detail & Related papers (2024-02-12T05:52:06Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z) - Universality of group convolutional neural networks based on ridgelet
analysis on groups [10.05944106581306]
We investigate the approximation property of group convolutional neural networks (GCNNs) based on the ridgelet theory.
We formulate a versatile GCNN as a nonlinear mapping between group representations (a generic group-convolution sketch is given after this list).
arXiv Detail & Related papers (2022-05-30T02:52:22Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Function approximation by deep neural networks with parameters $\{0,\pm
\frac{1}{2}, \pm 1, 2\}$ [91.3755431537592]
It is shown that $C_\beta$-smooth functions can be approximated by neural networks with parameters $\{0, \pm\frac{1}{2}, \pm 1, 2\}$.
The depth, width and the number of active parameters of constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as the networks with parameters in $[-1,1]$.
arXiv Detail & Related papers (2021-03-15T19:10:02Z) - Sample Complexity and Overparameterization Bounds for Projection-Free
Neural TD Learning [38.730333068555275]
Existing analysis of neural TD learning relies on either infinite-width analysis or constraining the network parameters to a (random) compact set.
We show that the projection-free TD learning equipped with a two-layer ReLU network of any width exceeding $\mathrm{poly}(\overline{\nu},1/\epsilon)$ converges to the true value function with error $\epsilon$ given $\mathrm{poly}(\overline{\nu},1/\epsilon)$ iterations or samples.
arXiv Detail & Related papers (2021-03-02T01:05:19Z) - On the Banach spaces associated with multi-layer ReLU networks: Function
representation, approximation theory and gradient descent dynamics [8.160343645537106]
We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width.
The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under the natural path-norm.
Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable properties.
arXiv Detail & Related papers (2020-07-30T17:47:05Z) - Neural Networks are Convex Regularizers: Exact Polynomial-time Convex
Optimization Formulations for Two-layer Networks [70.15611146583068]
We develop exact representations of training two-layer neural networks with rectified linear units (ReLUs).
Our theory utilizes semi-infinite duality and minimum norm regularization.
arXiv Detail & Related papers (2020-02-24T21:32:41Z) - A Corrective View of Neural Networks: Representation, Memorization and
Learning [26.87238691716307]
We develop a corrective mechanism for neural network approximation.
We show that two-layer neural networks in the random features regime (RF) can memorize arbitrary labels.
We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions.
arXiv Detail & Related papers (2020-02-01T20:51:09Z)
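As background for the group convolutional networks referenced above (in both the main abstract and the GCNN entry), a generic group convolution and its equivariance property can be written as follows; this is textbook notation over a group $G$ with left Haar measure $\mathrm{d}h$, not necessarily the formulation used in the cited papers:
$$(f\star w)(g)=\int_G f(h)\,w(h^{-1}g)\,\mathrm{d}h, \qquad (L_u f)\star w = L_u(f\star w), \qquad (L_u f)(g):=f(u^{-1}g),$$
and a GCNN layer applies a pointwise nonlinearity to such convolutions, which preserves the left-translation equivariance.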
This list is automatically generated from the titles and abstracts of the papers on this site.