Optimized classification with neural ODEs via separability
- URL: http://arxiv.org/abs/2312.13807v1
- Date: Thu, 21 Dec 2023 12:56:40 GMT
- Title: Optimized classification with neural ODEs via separability
- Authors: Antonio Álvarez-López, Rafael Orive-Illera, Enrique Zuazua
- Abstract summary: Classification of $N$ points becomes a simultaneous control problem when viewed through the lens of neural ordinary differential equations (neural ODEs).
In this study, we focus on estimating the number of neurons required for efficient cluster-based classification.
We propose a new constructive algorithm that simultaneously classifies clusters of $d$ points from any initial configuration.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classification of $N$ points becomes a simultaneous control problem when
viewed through the lens of neural ordinary differential equations (neural
ODEs), which represent the time-continuous limit of residual networks. For the
narrow model, with one neuron per hidden layer, it has been shown that the task
can be achieved using $O(N)$ neurons. In this study, we focus on estimating the
number of neurons required for efficient cluster-based classification,
particularly in the worst-case scenario where points are independently and
uniformly distributed in $[0,1]^d$. Our analysis provides a novel method for
quantifying the probability of requiring fewer than $O(N)$ neurons, emphasizing
the asymptotic behavior as both $d$ and $N$ increase. Additionally, under the
sole assumption that the data are in general position, we propose a new
constructive algorithm that simultaneously classifies clusters of $d$ points
from any initial configuration, effectively reducing the maximal complexity to
$O(N/d)$ neurons.
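The abstract does not spell out the dynamics, but a minimal sketch helps fix ideas. Assuming the standard narrow neural ODE formulation $\dot{x}(t) = w(t)\,\sigma(\langle a(t), x(t)\rangle + b(t))$ with piecewise-constant controls, each constant-control interval plays the role of one hidden neuron; the control values, horizon, and step size below are hypothetical:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def narrow_node_flow(X, controls, dt=1e-2):
    """Integrate x'(t) = w * relu(<a, x(t)> + b) with piecewise-constant
    controls by forward Euler; each (w, a, b) triple acts on its own unit
    time interval and plays the role of one hidden neuron."""
    X = X.copy()
    for w, a, b in controls:                # one control triple == one "neuron"
        for _ in range(int(1.0 / dt)):      # unit time horizon per neuron
            X += dt * np.outer(relu(X @ a + b), w)
    return X

# Toy usage: N points drawn uniformly in [0,1]^d, pushed by two hypothetical neurons.
rng = np.random.default_rng(0)
d, N = 2, 8
X0 = rng.uniform(0.0, 1.0, size=(N, d))
controls = [
    (np.array([1.0, 0.0]), np.array([1.0, 1.0]), -1.0),   # hypothetical control values
    (np.array([0.0, -1.0]), np.array([0.0, 1.0]), -0.5),
]
X1 = narrow_node_flow(X0, controls)
print(X1.round(3))   # final positions; classification thresholds one coordinate
```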
Related papers
- Dimension-independent learning rates for high-dimensional classification
problems [53.622581586464634]
We show that every $RBV^2$ function can be approximated by a neural network with bounded weights.
We then prove the existence of a neural network with bounded weights approximating a classification function.
arXiv Detail & Related papers (2024-09-26T16:02:13Z) - Minimum number of neurons in fully connected layers of a given neural network (the first approximation) [0.0]
The paper presents an algorithm for searching for the minimum number of neurons in the fully connected layers of an arbitrary network solving a given problem.
The proposed algorithm gives only a first approximation of this minimum, since it does not guarantee that a neural network with the found number of neurons can be trained to the required quality.
arXiv Detail & Related papers (2024-05-23T03:46:07Z) - Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that stochastic gradient descent (SGD) can efficiently solve the $k$-sparse parity problem on the $d$-dimensional hypercube.
We then demonstrate how a neural network trained with SGD solves the $k$-sparse parity problem with small statistical error.
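The summary leaves the target function implicit; for concreteness, a minimal sketch of $k$-sparse parity labels on the $d$-dimensional hypercube (the support $S$, dimensions, and sample size below are hypothetical):

```python
import numpy as np

def k_sparse_parity(X, S):
    """Label each row of X in {-1, +1}^d by the parity (product) of the
    coordinates indexed by the hidden support S, with |S| = k."""
    return np.prod(X[:, S], axis=1)

rng = np.random.default_rng(0)
d, k, n = 20, 3, 1000                      # hypothetical sizes
S = rng.choice(d, size=k, replace=False)   # hidden support of the parity
X = rng.choice([-1.0, 1.0], size=(n, d))   # uniform points on the hypercube
y = k_sparse_parity(X, S)                  # targets in {-1, +1}
print(S, y[:10])
```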
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - First Steps Towards a Runtime Analysis of Neuroevolution [2.07180164747172]
We consider a simple setting in neuroevolution where an evolutionary algorithm optimizes the weights and activation functions of a simple artificial neural network.
We then define simple example functions to be learned by the network and conduct rigorous runtime analyses for networks with a single neuron and for a more advanced structure with several neurons and two layers.
Our results show that the proposed algorithm is generally efficient on two example problems designed for one neuron and efficient with at least constant probability on the example problem for a two-layer network.
arXiv Detail & Related papers (2023-07-03T07:30:58Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can render two classes linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
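A minimal sketch of the random first layer described here, with standard Gaussian weights and uniformly distributed biases left untrained; the width, bias range, and data are assumptions, and only the linear output layer would be fit:

```python
import numpy as np

def random_relu_features(X, width, bias_range=1.0, seed=0):
    """Random first layer of a two-layer ReLU network: standard Gaussian
    weights and uniform biases, both left untrained."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, width))               # standard Gaussian weights
    b = rng.uniform(-bias_range, bias_range, width)   # uniformly distributed biases
    return np.maximum(X @ W + b, 0.0)

# Usage: lift the data, then fit only the linear output layer (e.g. by least squares).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = np.sign(X[:, 0])                       # hypothetical binary labels
Phi = random_relu_features(X, width=200)
w_out, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.mean(np.sign(Phi @ w_out) == y))  # separation accuracy on the lifted data
```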
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - The Efficacy of $L_1$ Regularization in Two-Layer Neural Networks [36.753907384994704]
A crucial problem in neural networks is to select the most appropriate number of hidden neurons and obtain tight statistical risk bounds.
We show that $L_1$ regularization can control the generalization error and sparsify the input dimension.
An excessively large number of neurons does not necessarily inflate the generalization error under suitable regularization.
arXiv Detail & Related papers (2020-10-02T15:23:22Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.