Bounding The Number of Linear Regions in Local Area for Neural Networks
with ReLU Activations
- URL: http://arxiv.org/abs/2007.06803v1
- Date: Tue, 14 Jul 2020 04:06:00 GMT
- Title: Bounding The Number of Linear Regions in Local Area for Neural Networks
with ReLU Activations
- Authors: Rui Zhu, Bo Lin, Haixu Tang
- Abstract summary: We present the first method to estimate the upper bound of the number of linear regions in any sphere in the input space of a given ReLU neural network.
Our experiments showed that, while training a neural network, the boundaries of the linear regions tend to move away from the training data points.
- Score: 6.4817648240626005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The number of linear regions is a distinctive property of neural
networks that use piecewise linear activation functions such as ReLU, compared
with conventional networks that use other activation functions. Previous studies
showed that this property reflects the expressivity of a neural network family
([14]); as a result, it can be used to characterize how the structural
complexity of a neural network model affects the function it aims to compute.
Nonetheless, directly computing the number of linear regions is challenging;
therefore, many researchers have focused on estimating bounds (in particular the
upper bound) on the number of linear regions for deep neural networks using
ReLU. These methods, however, estimate the upper bound over the entire input
space. Theoretical methods are still lacking for estimating the number of linear
regions within a specific area of the input space, e.g., a sphere centered at a
training data point such as an adversarial example or a backdoor trigger. In
this paper, we present the first method to estimate the upper bound of the
number of linear regions in any sphere in the input space of a given ReLU neural
network. We implemented the method and computed the bounds for deep neural
networks using piecewise linear activation functions. Our experiments showed
that, while training a neural network, the boundaries of the linear regions tend
to move away from the training data points. In addition, we observe that spheres
centered at training data points tend to contain more linear regions than
spheres centered at arbitrary points in the input space. To the best of our
knowledge, this is the first study of bounding linear regions around a specific
data point. We consider our work a first step toward investigating the
structural complexity of deep neural networks in a specific input area.
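A minimal illustrative sketch (not the paper's bounding algorithm): since every distinct ReLU activation pattern corresponds to a distinct linear region, one can empirically probe how many regions intersect a sphere around a point by sampling inputs in the ball and counting unique patterns. The network, its dimensions, the center point, and the radius below are hypothetical placeholders; the sampled count is only a lower estimate of the number of regions inside the ball, whereas the paper derives an analytical upper bound.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-16-16-1 ReLU network with random weights (a stand-in for a trained model).
dims = [2, 16, 16, 1]
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]

def activation_pattern(x):
    """Return the on/off state of every hidden ReLU unit for input x."""
    states = []
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):   # hidden layers only
        pre = W @ h + b
        states.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return tuple(np.concatenate(states).tolist())

def sample_in_ball(center, radius, n):
    """Draw n points uniformly from the ball of the given radius around center."""
    d = center.shape[0]
    directions = rng.standard_normal((n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.random(n) ** (1.0 / d)   # uniform-in-ball radial distribution
    return center + directions * radii[:, None]

x0 = np.zeros(2)                                  # e.g. a training data point
samples = sample_in_ball(x0, radius=0.5, n=20000)
patterns = {activation_pattern(x) for x in samples}
print(f"distinct linear regions observed in the ball: {len(patterns)}")

Swapping the random weights for a trained model's weights and re-running the count before and after training would loosely mirror the paper's observation that region boundaries move away from training points, though only as a sampling-based approximation.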
Related papers
- Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations [54.17275171325324]
We present a counterexample to the Linear Representation Hypothesis (LRH)
When trained to repeat an input token sequence, neural networks learn to represent the token at each position with a particular order of magnitude, rather than a direction.
These findings strongly indicate that interpretability research should not be confined to the LRH.
arXiv Detail & Related papers (2024-08-20T15:04:37Z) - The Evolution of the Interplay Between Input Distributions and Linear
Regions in Networks [20.97553518108504]
We count the number of convex linear regions in deep neural networks with ReLU activations.
In particular, we prove that for any one-dimensional input, there exists a minimum threshold for the number of neurons required to express it.
We also unveil the iterative refinement process of decision boundaries in ReLU networks during training.
arXiv Detail & Related papers (2023-10-28T15:04:53Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Effects of Data Geometry in Early Deep Learning [16.967930721746672]
Deep neural networks can approximate functions on different types of data, from images to graphs, with varied underlying structure.
We study how a randomly initialized neural network with piecewise linear activations splits the data manifold into regions where the neural network behaves as a linear function.
arXiv Detail & Related papers (2022-12-29T17:32:05Z) - Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs)
Due to the complex non-linear characteristics of samples, the objective of these activation functions is to project samples from their original feature space into a linearly separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z) - DISCO Verification: Division of Input Space into COnvex polytopes for
neural network verification [0.0]
The impressive results of modern neural networks partly come from their non-linear behaviour.
We propose a method to simplify the verification problem by partitioning the input space into multiple linear subproblems.
We also present the impact of a technique aiming at reducing the number of linear regions during training.
arXiv Detail & Related papers (2021-05-17T12:40:51Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs)
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z) - Piecewise linear activations substantially shape the loss surfaces of
neural networks [95.73230376153872]
This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks.
We first prove that the loss surfaces of many neural networks have infinitely many spurious local minima, which are defined as local minima with higher empirical risks than the global minima.
For one-hidden-layer networks, we prove that all local minima in a cell constitute an equivalence class; they are concentrated in a valley; and they are all global minima in the cell.
arXiv Detail & Related papers (2020-03-27T04:59:34Z) - Empirical Studies on the Properties of Linear Regions in Deep Neural
Networks [34.08593191989188]
A deep neural network (DNN) with piecewise linear activations can partition the input space into numerous small linear regions.
It is believed that the number of these regions represents the expressivity of the DNN.
We study their local properties, such as the inspheres, the directions of the corresponding hyperplanes, the decision boundaries, and the relevance of the surrounding regions.
arXiv Detail & Related papers (2020-01-04T12:47:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.