NN2Poly: A polynomial representation for deep feed-forward artificial
neural networks
- URL: http://arxiv.org/abs/2112.11397v4
- Date: Mon, 25 Sep 2023 18:33:01 GMT
- Title: NN2Poly: A polynomial representation for deep feed-forward artificial
neural networks
- Authors: Pablo Morala (1 and 2), Jenny Alexandra Cifuentes (3), Rosa E. Lillo
(1 and 2), Iñaki Ucar (1 and 2) ((1) uc3m-Santander Big Data Institute,
Universidad Carlos III de Madrid. Spain., (2) Department of Statistics,
Universidad Carlos III de Madrid. Spain., (3) ICADE, Department of
Quantitative Methods, Faculty of Economics and Business Administration,
Universidad Pontificia Comillas. Spain.)
- Abstract summary: NN2Poly is a theoretical approach to obtain an explicit model of an already trained fully-connected feed-forward artificial neural network.
This approach extends a previous idea proposed in the literature, which was limited to single hidden layer networks.
- Score: 0.6502001911298337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpretability of neural networks and their underlying theoretical behavior
remain an open field of study even after the great success of their practical
applications, particularly with the emergence of deep learning. In this work,
NN2Poly is proposed: a theoretical approach to obtain an explicit polynomial
model that provides an accurate representation of an already trained
fully-connected feed-forward artificial neural network (a multilayer perceptron
or MLP). This approach extends a previous idea proposed in the literature,
which was limited to single hidden layer networks, to work with arbitrarily
deep MLPs in both regression and classification tasks. NN2Poly uses a Taylor
expansion on the activation function, at each layer, and then applies several
combinatorial properties to calculate the coefficients of the desired
polynomials. Discussion is presented on the main computational challenges of
this method, and the way to overcome them by imposing certain constraints
during the training phase. Finally, simulation experiments as well as
applications to real tabular data sets are presented to demonstrate the
effectiveness of the proposed method.
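To make the construction concrete, the following is a minimal sketch of the single-hidden-layer case that NN2Poly generalizes. It is not the authors' implementation: the function name, the choice of tanh as activation, the truncation order, and the use of symbolic expansion (instead of the paper's combinatorial coefficient formulas) are illustrative assumptions. The activation is replaced by its truncated Taylor expansion around zero, and the result is expanded into an explicit polynomial in the inputs.

    # Hedged sketch: polynomial surrogate of a single-hidden-layer MLP.
    # Not the NN2Poly implementation; tanh, order=3, and all names are assumptions.
    import numpy as np
    import sympy as sp

    def mlp_to_polynomial(W, b, v, c, order=3):
        """Polynomial surrogate of y(x) = c + sum_j v_j * tanh(W[j] @ x + b[j]).

        W: (hidden, inputs) weights, b: (hidden,) biases,
        v: (hidden,) output weights, c: output bias,
        order: truncation degree of the Taylor expansion of tanh at 0.
        """
        n_inputs = W.shape[1]
        x = sp.symbols(f"x0:{n_inputs}")
        z = sp.symbols("z")
        # Truncated Taylor series of the activation around 0.
        taylor = sp.series(sp.tanh(z), z, 0, order + 1).removeO()
        poly = sp.Float(c)
        for w_row, b_j, v_j in zip(W, b, v):
            pre_activation = sum(float(wi) * xi for wi, xi in zip(w_row, x)) + float(b_j)
            poly += float(v_j) * taylor.subs(z, pre_activation)
        return sp.Poly(sp.expand(poly), *x)

    # Tiny usage example with random weights standing in for a trained MLP.
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(4, 2)), rng.normal(size=4)
    v, c = rng.normal(size=4), 0.1
    print(mlp_to_polynomial(W, b, v, c).as_expr())  # explicit polynomial in x0, x1

For deeper MLPs, NN2Poly applies the same Taylor-expansion idea at each layer and relies on combinatorial properties, rather than symbolic expansion, to compute the polynomial coefficients.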
Related papers
- Defining Neural Network Architecture through Polytope Structures of Dataset [53.512432492636236]
This paper defines upper and lower bounds for neural network widths, which are informed by the polytope structure of the dataset in question.
We develop an algorithm to investigate a converse situation where the polytope structure of a dataset can be inferred from its corresponding trained neural networks.
It is established that popular datasets such as MNIST, Fashion-MNIST, and CIFAR10 can be efficiently encapsulated using no more than two polytopes with a small number of faces.
arXiv Detail & Related papers (2024-02-04T08:57:42Z) - A practical existence theorem for reduced order models based on convolutional autoencoders [0.4604003661048266]
Deep learning has gained increasing popularity in the fields of Partial Differential Equations (PDEs) and Reduced Order Modeling (ROM).
CNN-based autoencoders have proven extremely effective, outperforming established techniques, such as the reduced basis method, when dealing with complex nonlinear problems.
We provide a new practical existence theorem for CNN-based autoencoders when the parameter-to-solution map is holomorphic.
arXiv Detail & Related papers (2024-02-01T09:01:58Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used in predictive modeling.
In this paper, we examine the use of convex recovery models for neural network training.
We show that all stationary points of the nonconvex training objective can be characterized as the global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of Deep Matching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight Bipartite Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - [Experiments & Analysis] Evaluating the Feasibility of Sampling-Based Techniques for Training Multilayer Perceptrons [10.145355763143218]
Several sampling-based techniques have been proposed for speeding up the training time of deep neural networks.
These techniques fall under two categories: (i) sampling a subset of nodes in every hidden layer as active at every iteration and (ii) sampling a subset of nodes from the previous layer to approximate the current layer's activations.
In this paper, we evaluate the feasibility of these approaches on CPU machines with limited computational resources.
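As a hedged illustration of category (i) above (not the paper's code; the layer sizes, the rescaling choice, and all names are assumptions), a forward pass can activate only a sampled subset of hidden nodes at each iteration:

    # Minimal numpy sketch of node sampling: only `keep` of the hidden nodes
    # are active in this forward pass; the output is rescaled so that it
    # matches the full layer in expectation.
    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, keep = 8, 32, 8

    W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
    w2 = rng.normal(scale=0.1, size=n_hidden)

    def forward_sampled(x):
        active = rng.choice(n_hidden, size=keep, replace=False)  # sampled active nodes
        h = np.maximum(W1[active] @ x, 0.0)                      # ReLU on sampled nodes only
        return (n_hidden / keep) * (w2[active] @ h), active

    y_hat, active = forward_sampled(rng.normal(size=n_in))
    print(y_hat, active)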
arXiv Detail & Related papers (2023-06-15T17:19:48Z) - When Deep Learning Meets Polyhedral Theory: A Survey [6.899761345257773]
In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks.
Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions.
arXiv Detail & Related papers (2023-04-29T11:46:53Z) - Limitations of neural network training due to numerical instability of
backpropagation [2.255961793913651]
We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute gradients.
It is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers.
We conclude that approximating sequences of ReLU neural networks resulting from gradient descent in practice differ substantially from theoretically constructed sequences.
arXiv Detail & Related papers (2022-10-03T10:34:38Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Towards a mathematical framework to inform Neural Network modelling via
Polynomial Regression [0.0]
It is shown that almost identical predictions can be obtained when certain conditions are met locally.
When learning from generated data, the proposed method produces polynomials that approximate the data correctly in local regions.
arXiv Detail & Related papers (2021-02-07T17:56:16Z) - How Neural Networks Extrapolate: From Feedforward to Graph Neural
Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.