Expressivity and Approximation Properties of Deep Neural Networks with
ReLU$^k$ Activation
- URL: http://arxiv.org/abs/2312.16483v2
- Date: Thu, 11 Jan 2024 04:48:47 GMT
- Title: Expressivity and Approximation Properties of Deep Neural Networks with
ReLU$^k$ Activation
- Authors: Juncai He, Tong Mao, Jinchao Xu
- Abstract summary: We investigate the expressivity and approximation properties of deep neural networks employing the ReLU$^k$ activation function for $k \geq 2$.
Although deep ReLU networks can approximate polynomials effectively, deep ReLU$^k$ networks have the capability to represent higher-degree polynomials precisely.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the expressivity and approximation properties
of deep neural networks employing the ReLU$^k$ activation function for $k \geq
2$. Although deep ReLU networks can approximate polynomials effectively, deep
ReLU$^k$ networks have the capability to represent higher-degree polynomials
precisely. Our initial contribution is a comprehensive, constructive proof for
polynomial representation using deep ReLU$^k$ networks. This allows us to
establish an upper bound on both the size and count of network parameters.
Consequently, we are able to demonstrate a suboptimal approximation rate for
functions from Sobolev spaces as well as for analytic functions. Additionally,
through an exploration of the representation power of deep ReLU$^k$ networks
for shallow networks, we reveal that deep ReLU$^k$ networks can approximate
functions from a range of variation spaces, extending beyond those generated
solely by the ReLU$^k$ activation function. This finding demonstrates the
adaptability of deep ReLU$^k$ networks in approximating functions within
various variation spaces.
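As a rough illustration of the kind of exact polynomial representation discussed above (for the case $k = 2$), the sketch below uses the elementary identities $x^2 = \mathrm{ReLU}(x)^2 + \mathrm{ReLU}(-x)^2$ and $xy = \frac{1}{2}\left((x+y)^2 - x^2 - y^2\right)$. The function names and the two-neuron squaring gadget are illustrative assumptions for this note, not the paper's construction, which builds deep ReLU$^k$ networks with explicit bounds on depth, width, and parameter count.

```python
import numpy as np

def relu_k(x, k=2):
    """ReLU^k activation: max(0, x) raised to the k-th power."""
    return np.maximum(0.0, x) ** k

def square_via_relu2(x):
    """Exact identity x^2 = ReLU(x)^2 + ReLU(-x)^2:
    a two-neuron ReLU^2 layer with output weights (1, 1)."""
    return relu_k(x, 2) + relu_k(-x, 2)

def product_via_relu2(x, y):
    """Exact product via polarization: xy = ((x + y)^2 - x^2 - y^2) / 2,
    with each square realized by the ReLU^2 gadget above."""
    return 0.5 * (square_via_relu2(x + y) - square_via_relu2(x) - square_via_relu2(y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(1000), rng.standard_normal(1000)
    assert np.allclose(square_via_relu2(x), x ** 2)
    assert np.allclose(product_via_relu2(x, y), x * y)
    print("ReLU^2 gadgets reproduce x^2 and xy exactly (up to floating point).")
```

Composing such product gadgets across layers yields higher-degree monomials, which gives the intuition, though not the paper's precise argument, for why depth combined with the ReLU$^k$ nonlinearity permits exact representation of higher-degree polynomials.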
Related papers
- Piecewise Linear Functions Representable with Infinite Width Shallow
ReLU Neural Networks [0.0]
We prove a conjecture of Ongie et al. that every continuous piecewise linear function expressible with this kind of infinite width neural network is expressible as a finite width shallow ReLU neural network.
arXiv Detail & Related papers (2023-07-25T15:38:18Z) - Polynomial Width is Sufficient for Set Representation with
High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE).
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Benefits of Overparameterized Convolutional Residual Networks: Function
Approximation under Smoothness Constraint [48.25573695787407]
We prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness.
Our theory partially justifies the benefits of using deep and wide networks in practice.
arXiv Detail & Related papers (2022-06-09T15:35:22Z) - Most Activation Functions Can Win the Lottery Without Excessive Depth [6.68999512375737]
The lottery ticket hypothesis has highlighted the potential for training deep neural networks by pruning.
For networks with ReLU activation functions, it has been proven that a target network of depth $L$ can be approximated by a subnetwork of a randomly initialized neural network that has twice the target's depth, $2L$, and is wider by a logarithmic factor.
arXiv Detail & Related papers (2022-05-04T20:51:30Z) - Theory of Deep Convolutional Neural Networks III: Approximating Radial
Functions [7.943024117353317]
We consider a family of deep neural networks consisting of two groups of convolutional layers, a down operator, and a fully connected layer.
The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer.
arXiv Detail & Related papers (2021-07-02T08:22:12Z) - Adversarial Examples in Multi-Layer Random ReLU Networks [39.797621513256026]
Adversarial examples arise in ReLU networks with independent Gaussian parameters.
Bottleneck layers in the network play a key role: the minimal width up to some point determines scales and sensitivities of mappings computed up to that point.
arXiv Detail & Related papers (2021-06-23T18:16:34Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Size and Depth Separation in Approximating Natural Functions with Neural
Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z) - Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$Nets are a new class of function approximators based on polynomial expansions.
$\Pi$Nets produce state-of-the-art results in three challenging tasks, i.e., image generation, face verification, and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z) - Sharp Representation Theorems for ReLU Networks with Precise Dependence
on Depth [26.87238691716307]
We prove sharp, dimension-free representation results for neural networks with $D$ ReLU layers under square loss.
Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions.
arXiv Detail & Related papers (2020-06-07T05:25:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.