Measuring Model Complexity of Neural Networks with Curve Activation Functions
- URL: http://arxiv.org/abs/2006.08962v1
- Date: Tue, 16 Jun 2020 07:38:06 GMT
- Title: Measuring Model Complexity of Neural Networks with Curve Activation Functions
- Authors: Xia Hu, Weiqing Liu, Jiang Bian, Jian Pei
- Abstract summary: We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L^1$ and $L^2$ regularizations suppress the increase of model complexity.
- Score: 100.98319505253797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is fundamental to measure model complexity of deep neural networks. The
existing literature on model complexity mainly focuses on neural networks with
piecewise linear activation functions. Model complexity of neural networks with
general curve activation functions remains an open problem. To tackle the
challenge, in this paper, we first propose the linear approximation neural
network (LANN for short), a piecewise linear framework to approximate a given
deep model with curve activation functions. LANN constructs an individual
piecewise linear approximation of the activation function of each neuron, and minimizes
the number of linear regions to satisfy a required approximation degree. Then,
we analyze the upper bound of the number of linear regions formed by LANNs, and
derive the complexity measure based on the upper bound. To examine the
usefulness of the complexity measure, we experimentally explore the training
process of neural networks and detect overfitting. Our results demonstrate that
the occurrence of overfitting is positively correlated with the increase of
model complexity during training. We find that the $L^1$ and $L^2$
regularizations suppress the increase of model complexity. Finally, we propose
two approaches to prevent overfitting by directly constraining model
complexity, namely neuron pruning and customized $L^1$ regularization.
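
To make the core LANN idea concrete, here is a minimal sketch of piecewise linear approximation of a curve activation function, assuming tanh, uniform knots, and a max-deviation tolerance as a stand-in for the paper's required approximation degree. The function names and tolerance are illustrative; the paper's actual construction places breakpoints adaptively to minimize the number of linear regions.

```python
# A minimal sketch, not the paper's LANN construction: uniform knots are used
# for brevity, whereas LANN chooses breakpoints adaptively per neuron.
import numpy as np

def piecewise_linear(f, lo, hi, n_segments):
    """Return knots and a callable piecewise linear interpolant of f."""
    knots = np.linspace(lo, hi, n_segments + 1)
    values = f(knots)
    return knots, lambda x: np.interp(x, knots, values)

def min_segments(f, lo, hi, tol, max_segments=4096):
    """Smallest segment count whose interpolant stays within tol of f
    (a stand-in for the paper's 'required approximation degree')."""
    grid = np.linspace(lo, hi, 10_000)
    target = f(grid)
    for k in range(1, max_segments + 1):
        _, approx = piecewise_linear(f, lo, hi, k)
        if np.max(np.abs(approx(grid) - target)) <= tol:
            return k
    raise ValueError("tolerance not reached within max_segments")

# Fewer segments needed -> 'flatter' neuron; the segment counts across
# neurons feed the upper bound on linear regions in the paper's measure.
k = min_segments(np.tanh, -4.0, 4.0, tol=1e-2)
print(f"tanh on [-4, 4] needs {k} linear segments at tolerance 1e-2")
```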
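The complexity-constraining regularization can be sketched similarly. The snippet below, assuming a small PyTorch MLP with tanh activations, applies a plain $L^1$ weight penalty; the paper's customized variant reweights each term by the neuron's contribution to the complexity measure, which the placeholder comment marks as an assumption.

```python
# A minimal sketch, assuming a toy regression setup; lam and the network
# shape are hypothetical values, not taken from the paper.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 1e-4  # regularization strength (hypothetical)

def l1_penalty(model):
    # Plain L1 over all weights; the paper's customized variant would scale
    # each neuron's term by how strongly it drives the complexity bound.
    return sum(p.abs().sum() for p in model.parameters())

x, y = torch.randn(64, 16), torch.randn(64, 1)
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y) + lam * l1_penalty(model)
    loss.backward()
    optimizer.step()
```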
Related papers
- On the Trade-off Between Efficiency and Precision of Neural Abstraction [62.046646433536104]
Neural abstractions have been recently introduced as formal approximations of complex, nonlinear dynamical models.
We employ formal inductive synthesis procedures to generate neural abstractions that result in dynamical models with the desired semantics.
arXiv Detail & Related papers (2023-07-28T13:22:32Z) - Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification [22.361338848134025]
We present a fully connected two-layer neural network with shifted ReLU activation that enables activated neuron identification in sublinear time via geometric search.
We also prove that our algorithm can converge in $O(M^2/\epsilon^2)$ time with network size quadratic in the coefficient norm upper bound $M$ and error term $\epsilon$.
arXiv Detail & Related papers (2023-07-13T05:33:44Z) - Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO Regularization [15.517787031620864]
The territory of LASSO is extended to two-layer ReLU neural networks, a fashionable and powerful nonlinear regression model.
We show that the LASSO estimator can stably reconstruct the neural network and identify $\mathcal{S}^{\star}$ when the number of samples scales logarithmically.
Our theory lies in an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks.
arXiv Detail & Related papers (2023-05-07T13:05:09Z) - Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations [2.15145758970292]
We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms.
The latter feature is essential to control generalization errors in many statistical and machine learning applications.
arXiv Detail & Related papers (2022-06-20T01:18:29Z) - Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z) - Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL).
arXiv Detail & Related papers (2021-02-08T12:41:56Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent (a minimal sketch appears after this list).
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
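
For the min-max formulation in the SEM entry above, here is a toy sketch: a scalar structural equation is fit by alternating gradient steps between an estimator network f and an adversarial test-function network g. The objective, network sizes, and step counts are illustrative assumptions, not the paper's exact estimator.

```python
# A toy sketch of adversarial (min-max) estimation; all names and the
# quadratic stabilizer term are assumptions, not the paper's construction.
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))  # min player
g = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))  # max player
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)

x = torch.rand(256, 1) * 4 - 2
y = torch.sin(2 * x) + 0.1 * torch.randn_like(x)  # toy structural equation

def game_value():
    # Residual correlated with the test function, minus a quadratic term
    # that keeps the max player bounded.
    return ((y - f(x)) * g(x)).mean() - 0.5 * g(x).pow(2).mean()

for _ in range(200):
    opt_g.zero_grad()
    (-game_value()).backward()  # ascent step for g
    opt_g.step()
    opt_f.zero_grad()
    game_value().backward()     # descent step for f
    opt_f.step()
```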
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.