Related papers: Towards Lower Bounds on the Depth of ReLU Neural Networks

Towards Lower Bounds on the Depth of ReLU Neural Networks

URL: http://arxiv.org/abs/2105.14835v5
Date: Wed, 17 Jul 2024 16:15:49 GMT
Title: Towards Lower Bounds on the Depth of ReLU Neural Networks
Authors: Christoph Hertrich, Amitabh Basu, Marco Di Summa, Martin Skutella,
Abstract summary: We investigate whether the class of exactly representable functions strictly increases by adding more layers. We settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
Score: 7.355977594790584
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.

Related papers

Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor gradient program (SGD) framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study [8.183509993010983]
We study the neural scaling laws for deep operator networks using the Chen and Chen style architecture. We quantify the neural scaling laws by analyzing its approximation and generalization errors. Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.
arXiv Detail & Related papers (2024-10-01T03:06:55Z)
Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set. This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure. We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z)
A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks. We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition. Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalization of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z)
Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint [48.25573695787407]
We prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness. Our theory partially justifies the benefits of using deep and wide networks in practice.
arXiv Detail & Related papers (2022-06-09T15:35:22Z)
Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks. We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal mini-max rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z)
Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class. We also give precise bounds on the sizes of the neural networks required to represent any function in the class. We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
Optimal Approximation with Sparse Neural Networks and Applications [0.0]
We use deep sparsely connected neural networks to measure the complexity of a function class in $L(mathbb Rd)$. We also introduce representation system - a countable collection of functions to guide neural networks. We then analyse the complexity of a class called $beta$ cartoon-like functions using rate-distortion theory and wedgelets construction.
arXiv Detail & Related papers (2021-08-14T05:14:13Z)
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning. In particular, we prove that the complexity of the function class $mathcalF$ characterizes the complexity of the function. Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
Theory of Deep Convolutional Neural Networks II: Spherical Analysis [9.099589602551573]
We consider a family of deep convolutional neural networks applied to approximate functions on the unit sphere $mathbbSd-1$ of $mathbbRd$. Our analysis presents rates of uniform approximation when the approximated function lies in the Sobolev space $Wr_infty (mathbbSd-1)$ with $r>0$ or takes an additive ridge form.
arXiv Detail & Related papers (2020-07-28T14:54:30Z)
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory [110.99247009159726]
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks. In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise.
arXiv Detail & Related papers (2020-06-08T17:25:22Z)
Geometrically Principled Connections in Graph Neural Networks [66.51286736506658]
We argue geometry should remain the primary driving force behind innovation in the emerging field of geometric deep learning. We relate graph neural networks to widely successful computer graphics and data approximation models: radial basis functions (RBFs) We introduce affine skip connections, a novel building block formed by combining a fully connected layer with any graph convolution operator.
arXiv Detail & Related papers (2020-04-06T13:25:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.