U(1) Symmetry-breaking Observed in Generic CNN Bottleneck Layers
- URL: http://arxiv.org/abs/2206.02220v1
- Date: Sun, 5 Jun 2022 16:54:04 GMT
- Title: U(1) Symmetry-breaking Observed in Generic CNN Bottleneck Layers
- Authors: Louis-François Bouchard, Mohsen Ben Lazreg and Matthew Toews
- Abstract summary: We report on a significant discovery linking deep convolutional neural networks (CNN) to biological vision and fundamental particle physics.
A model of information propagation in a CNN is proposed via an analogy to an optical system.
- Score: 2.1829116024916844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We report on a significant discovery linking deep convolutional neural
networks (CNN) to biological vision and fundamental particle physics. A model
of information propagation in a CNN is proposed via an analogy to an optical
system, where bosonic particles (i.e. photons) are concentrated as the 2D
spatial resolution of the image collapses to a focal point $1\times 1=1$. A 3D
space $(x,y,t)$ is defined by $(x,y)$ coordinates in the image plane and CNN
layer $t$, where a principal ray $(0,0,t)$ runs in the direction of information
propagation through both the optical axis and the image center pixel located at
$(x,y)=(0,0)$, about which the sharpest possible spatial focus is limited to a
circle of confusion in the image plane. Our novel insight is to model the
principal optical ray $(0,0,t)$ as geometrically equivalent to the medial
vector in the positive orthant $I(x,y) \in R^{N+}$ of a $N$-channel activation
space, e.g. along the greyscale (or luminance) vector $(t,t,t)$ in $RGB$ colour
space. Information is thus concentrated into an energy potential
$E(x,y,t)=\|I(x,y,t)\|^2$, which, particularly for bottleneck layers $t$ of
generic CNNs, is highly concentrated and symmetric about the spatial origin
$(0,0,t)$ and exhibits the well-known "Sombrero" potential of the boson
particle. This symmetry is broken in classification, where bottleneck layers of
generic pre-trained CNN models exhibit a consistent class-specific bias towards
an angle $\theta \in U(1)$ defined simultaneously in the image plane and in
activation feature space. Initial observations validate our hypothesis from
generic pre-trained CNN activation maps and a bare-bones memory-based
classification scheme, with no training or tuning. Training from scratch using
a random $U(1)$ class label leads to improved classification in all cases.
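For concreteness, below is a minimal sketch (not the authors' code) of the two quantities the abstract describes, computed from a generic pre-trained CNN: the per-pixel energy potential $E(x,y,t)=\|I(x,y,t)\|^2$ over the $N$ channels of a bottleneck activation map, and an image-plane angle $\theta$ about the spatial centre. The use of torchvision ResNet-18, its last residual stage as the bottleneck, the input size, and the peak-based angle estimate are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch (not the authors' code) of the two quantities described above,
# assuming a recent torchvision and treating ResNet-18's last residual stage as
# the "bottleneck" layer t. The layer choice, input size, and the peak-based
# angle estimate are illustrative assumptions.
import math
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

activations = {}
def hook(_module, _inputs, output):
    activations["feat"] = output.detach()

model.layer4.register_forward_hook(hook)

x = torch.randn(1, 3, 224, 224)            # stand-in for a preprocessed image
with torch.no_grad():
    model(x)

feat = activations["feat"][0]               # (N, H, W): N-channel activation map I(x, y)
energy = (feat ** 2).sum(dim=0)             # E(x, y) = ||I(x, y)||^2, shape (H, W)

# Image-plane angle of the energy peak about the centre pixel (x, y) = (0, 0).
H, W = energy.shape
cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
iy, ix = divmod(int(energy.argmax()), W)
theta = math.atan2(iy - cy, ix - cx)

# Feature-space angle between the peak activation vector and the medial
# (all-ones) direction of the positive orthant, analogous to the greyscale axis.
v = feat[:, iy, ix]
medial = torch.ones_like(v)
cos_medial = torch.dot(v, medial) / (v.norm() * medial.norm() + 1e-12)

print(f"energy map {tuple(energy.shape)}, peak angle {theta:.3f} rad, "
      f"cos(angle to medial vector) {cos_medial.item():.3f}")
```

In this reading, the class-specific symmetry-breaking the abstract reports would appear as a consistent bias of $\theta$ (and of the activation vector away from the medial direction) across images of the same class.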
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Self-Directed Linear Classification [50.659479930171585]
In online classification, a learner aims to predict the labels of a sequence of examples in an online fashion so as to minimize the total number of mistakes.
Here we study the power of choosing the prediction order and establish the first strong separation between worst-order and random-order learning.
arXiv Detail & Related papers (2023-08-06T15:38:44Z) - Hierarchical Inference of the Lensing Convergence from Photometric
Catalogs with Bayesian Graph Neural Networks [0.0]
We introduce fluctuations on galaxy-galaxy lensing scales of $\sim 1''$ and extract random sightlines to train our BGNN.
For each test set of 1,000 sightlines, the BGNN infers the individual $\kappa$ posteriors, which we combine in a hierarchical Bayesian model.
For a test field well sampled by the training set, the BGNN recovers the population mean of $kappa$ precisely and without bias.
arXiv Detail & Related papers (2022-11-15T00:29:20Z) - Overparametrized linear dimensionality reductions: From projection
pursuit to two-layer neural networks [10.368585938419619]
Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$.
What does this collection of probability distributions look like when $n,d$ grow large?
Denoting by $\mathscr{F}_{m,\alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m,\alpha}$.
arXiv Detail & Related papers (2022-06-14T00:07:33Z) - High-dimensional Asymptotics of Feature Learning: How One Gradient Step
Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z) - A singular Riemannian geometry approach to Deep Neural Networks II.
Reconstruction of 1-D equivalence classes [78.120734120667]
We build the preimage of a point in the output manifold in the input space.
We focus for simplicity on the case of neural network maps from n-dimensional real spaces to (n - 1)-dimensional real spaces.
arXiv Detail & Related papers (2021-12-17T11:47:45Z) - Exploring the Common Principal Subspace of Deep Features in Neural
Networks [50.37178960258464]
We find that different Deep Neural Networks (DNNs) trained with the same dataset share a common principal subspace in latent spaces.
Specifically, we design a new metric $\mathcal{P}$-vector to represent the principal subspace of deep features learned in a DNN.
Small angles (with cosine close to $1.0$) have been found in comparisons between any two DNNs trained with different algorithms/architectures; an illustrative sketch of this kind of comparison appears after this list.
arXiv Detail & Related papers (2021-10-06T15:48:32Z) - OSLO: On-the-Sphere Learning for Omnidirectional images and its
application to 360-degree image compression [59.58879331876508]
We study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images.
Our proposed on-the-sphere solution leads to a better compression gain that can save 13.7% of the bit rate compared to similar learned models applied to equirectangular images.
arXiv Detail & Related papers (2021-07-19T22:14:30Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - On the emergence of tetrahedral symmetry in the final and penultimate
layers of neural network classifiers [9.975163460952045]
We show that even the final output of the classifier $h$ is not uniform over data samples from a class $C_i$ if $h$ is a shallow network.
We explain this observation analytically in toy models for highly expressive deep neural networks.
arXiv Detail & Related papers (2020-12-10T02:32:52Z) - DeepMerge: Classifying High-redshift Merging Galaxies with Deep Neural
Networks [0.0]
We show the use of convolutional neural networks (CNNs) for the task of distinguishing between merging and non-merging galaxies in simulated images.
We extract images of merging and non-merging galaxies from the Illustris-1 cosmological simulation and apply observational and experimental noise.
The test set classification accuracy of the CNN is $79\%$ for pristine images and $76\%$ for noisy images.
arXiv Detail & Related papers (2020-04-24T20:36:06Z)
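As a worked illustration of the principal-subspace comparison mentioned in the "Exploring the Common Principal Subspace of Deep Features in Neural Networks" entry above, the sketch below extracts pooled deep features from two pre-trained models over the same inputs, takes the top principal direction of each feature matrix, and reports the cosine of the angle between them. The models, the pooled-feature choice, and the use of the first singular vector as a stand-in for that paper's $\mathcal{P}$-vector are assumptions for illustration, not the paper's exact construction.

```python
# A minimal sketch (assumptions, not the paper's exact P-vector construction) of
# comparing the principal direction of deep features from two pre-trained models
# over the same inputs.
import torch
import torchvision.models as models

def principal_direction(model, images):
    """Top principal direction of the model's pooled deep features."""
    feats = []
    with torch.no_grad():
        for img in images:
            f = model(img.unsqueeze(0))   # (1, feature_dim) pooled features
            feats.append(f.flatten())
    X = torch.stack(feats)                # (num_images, feature_dim)
    X = X - X.mean(dim=0, keepdim=True)   # centre before the SVD
    _, _, Vt = torch.linalg.svd(X, full_matrices=False)
    return Vt[0]                          # first right singular vector (unit norm)

images = [torch.randn(3, 224, 224) for _ in range(16)]   # stand-in for a real batch

# Strip the classification heads so both models emit 512-dim pooled features.
m1 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
m2 = models.resnet34(weights=models.ResNet34_Weights.DEFAULT).eval()
m1.fc = torch.nn.Identity()
m2.fc = torch.nn.Identity()

p1 = principal_direction(m1, images)
p2 = principal_direction(m2, images)
cosine = torch.abs(torch.dot(p1, p2))     # sign of singular vectors is arbitrary
print(f"cosine between principal feature directions: {cosine.item():.3f}")
```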
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.