Injectivity capacity of ReLU gates
- URL: http://arxiv.org/abs/2410.20646v1
- Date: Mon, 28 Oct 2024 00:57:10 GMT
- Title: Injectivity capacity of ReLU gates
- Authors: Mihailo Stojnic
- Abstract summary: We consider the injectivity property of ReLU network layers.
We develop a powerful program to handle the $\ell_0$ spherical perceptron and, implicitly, the injectivity of ReLU layers.
The obtained results are also shown to fairly closely match the replica predictions from [40].
- Score: 0.0
- Abstract: We consider the injectivity property of ReLU network layers. Determining the ReLU injectivity capacity (the ratio of the number of a layer's inputs to its outputs) is established as isomorphic to determining the capacity of the so-called $\ell_0$ spherical perceptron. Employing \emph{fully lifted random duality theory} (fl RDT), a powerful program is developed and utilized to handle the $\ell_0$ spherical perceptron and, implicitly, the injectivity of ReLU layers. To put the entire fl RDT machinery to practical use, a sizeable set of numerical evaluations is conducted as well. The lifting mechanism is observed to converge remarkably fast, with relative corrections in the estimated quantities not exceeding $\sim 0.1\%$ already on the third level of lifting. Closed-form explicit analytical relations among key lifting parameters are uncovered as well. In addition to being of great importance in handling all the required numerical work, these relations also shed new light on beautiful parametric interconnections within the lifting structure. Finally, the obtained results are also shown to fairly closely match the replica predictions from [40].
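To make the object of study concrete: a ReLU layer maps $x \in \mathbb{R}^n$ to $\mathrm{ReLU}(Wx) \in \mathbb{R}^m$, and the injectivity capacity asks how the ratio of inputs to outputs governs whether the map stays injective. The following minimal Python sketch (ours, for illustration; not from the paper, and the helper name `noninjectivity_certificate` is hypothetical) probes a standard necessary condition: if the rows of $W$ active at some input $x$ do not have full column rank $n$, then perturbing $x$ slightly along the kernel of that active submatrix leaves $\mathrm{ReLU}(Wx)$ unchanged, which certifies non-injectivity.

```python
# Illustrative sketch (not from the paper): a Monte-Carlo probe of a
# *necessary* condition for injectivity of x -> ReLU(W @ x).
import numpy as np

def noninjectivity_certificate(W, n_probes=2000, seed=0):
    """Return True if some sampled x has a rank-deficient active
    submatrix, i.e. a direction v with W[active] @ v = 0 along which
    small perturbations of x leave ReLU(W @ x) unchanged."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    for _ in range(n_probes):
        x = rng.standard_normal(n)
        active = W @ x > 0                    # units firing at x
        if np.linalg.matrix_rank(W[active]) < n:
            return True                       # provably non-injective
    return False

n = 20
rng = np.random.default_rng(1)
for ratio in (1.5, 2.0, 2.5, 3.0, 4.0):       # outputs per input, m/n
    m = int(ratio * n)
    W = rng.standard_normal((m, n))           # generic Gaussian layer
    print(f"m/n = {ratio:3.1f}: certificate found ->",
          noninjectivity_certificate(W))
```

Passing the probe certifies nothing: injectivity requires the rank condition to hold at every $x$ simultaneously, a worst-case question that sampling cannot settle and that the paper's fl RDT program is built to answer precisely.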
Related papers
- Hybrid Two-Stage Reconstruction of Multiscale Subsurface Flow with Physics-informed Residual Connected Neural Operator [4.303037819686676]
We propose a hybrid two-stage framework that uses multiscale basis functions and physics-guided deep learning to solve the Darcy flow problem.
The framework achieves $R^2$ values above 0.9 for both basis-function fitting and pressure reconstruction, and the residual indicator is on the order of $1\times 10^{-4}$.
arXiv Detail & Related papers (2025-01-22T23:28:03Z) - Deep ReLU networks -- injectivity capacity upper bounds [0.0]
We study deep ReLU feed-forward neural networks (NNs) and their injectivity abilities.
For any given architecture of hidden layers, the injectivity capacity is defined as the minimal ratio between the number of the network's outputs and inputs.
Strong recent progress in precisely studying the injectivity properties of a single ReLU layer is here extended to the deep-network level.
arXiv Detail & Related papers (2024-12-27T14:57:40Z) - $\alpha$-TCVAE: On the relationship between Disentanglement and Diversity [21.811889512977924]
In this work, we introduce $\alpha$-TCVAE, a variational autoencoder optimized using a novel total correlation (TC) lower bound.
We present quantitative analyses that support the idea that disentangled representations lead to better generative capabilities and diversity.
Our results demonstrate that $alpha$-TCVAE consistently learns more disentangled representations than baselines and generates more diverse observations.
arXiv Detail & Related papers (2024-11-01T13:50:06Z) - Enabling Uncertainty Estimation in Iterative Neural Networks [49.56171792062104]
We develop an approach to uncertainty estimation that provides state-of-the-art estimates at a much lower computational cost than techniques like Ensembles.
We demonstrate its practical value by embedding it in two application domains: road detection in aerial images and the estimation of aerodynamic properties of 2D and 3D shapes.
arXiv Detail & Related papers (2024-03-25T13:06:31Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Multi-Grid Tensorized Fourier Neural Operator for High-Resolution PDEs [93.82811501035569]
We introduce a new data-efficient and highly parallelizable operator learning approach with reduced memory requirements and better generalization.
MG-TFNO scales to large resolutions by leveraging local and global structures of full-scale, real-world phenomena.
We demonstrate superior performance on the turbulent Navier-Stokes equations where we achieve less than half the error with over 150x compression.
arXiv Detail & Related papers (2023-09-29T20:18:52Z) - Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Gate-based spin readout of hole quantum dots with site-dependent $g$-factors [101.23523361398418]
We experimentally investigate a hole double quantum dot in silicon by carrying out spin readout with gate-based reflectometry.
We show that characteristic features in the reflected phase signal arising from magneto-spectroscopy convey information on site-dependent $g$-factors in the two dots.
arXiv Detail & Related papers (2022-06-27T09:07:20Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significantly reduced memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Physics-aware deep neural networks for surrogate modeling of turbulent natural convection [0.0]
We investigate the use of PINN surrogate modeling for turbulent Rayleigh-Bénard convection flows.
We show how it comes into play as a regularization close to the training boundaries, which are zones of poor accuracy for standard PINNs.
The predictive accuracy of the surrogate over the entire half-billion DNS coordinates yields errors for all flow variables ranging between 0.3% and 4% in the relative $L_2$ norm.
arXiv Detail & Related papers (2021-03-05T09:48:57Z) - Sparse Representations of Positive Functions via First and Second-Order Pseudo-Mirror Descent [15.340540198612823]
We consider expected risk problems when the range of the estimator is required to be nonnegative.
We develop first- and second-order variants of stochastic approximation mirror descent employing \emph{pseudo-gradients}.
Experiments demonstrate favorable performance on inhomogeneous Poisson process intensity estimation in practice.
arXiv Detail & Related papers (2020-11-13T21:54:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.