Learners' languages
- URL: http://arxiv.org/abs/2103.01189v1
- Date: Mon, 1 Mar 2021 18:34:00 GMT
- Title: Learners' languages
- Authors: David I. Spivak
- Abstract summary: The authors show that the fundamental elements of deep learning -- gradient descent and backpropagation -- can be conceptualized as a strong monoidal functor.
We show that a map $A\to B$ in $\mathbf{Para}(\mathbf{SLens})$ has a natural interpretation in terms of dynamical systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In "Backprop as functor", the authors show that the fundamental elements of
deep learning -- gradient descent and backpropagation -- can be conceptualized
as a strong monoidal functor $\mathbf{Para}(\mathbf{Euc})\to\mathbf{Learn}$
from the category of parameterized Euclidean spaces to that of learners, a
category developed explicitly to capture parameter update and backpropagation.
It was soon realized that there is an isomorphism
$\mathbf{Learn}\cong\mathbf{Para}(\mathbf{SLens})$, where $\mathbf{SLens}$ is
the symmetric monoidal category of simple lenses as used in functional
programming.
In this note, we observe that $\mathbf{SLens}$ is a full subcategory of
$\mathbf{Poly}$, the category of polynomial functors in one variable, via the
functor $A\mapsto Ay^A$. Using the fact that $(\mathbf{Poly},\otimes)$ is
monoidal closed, we show that a map $A\to B$ in $\mathbf{Para}(\mathbf{SLens})$
has a natural interpretation in terms of dynamical systems (more precisely,
generalized Moore machines) whose interface is the internal-hom type
$[Ay^A,By^B]$.
Finally, we review the fact that the category $p\text{-}\mathbf{Coalg}$ of
dynamical systems on any $p\in\mathbf{Poly}$ forms a topos, and consider the
logical propositions that can be stated in its internal language. We give
gradient descent as an example, and we conclude by discussing some directions
for future work.
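To make the $\mathbf{Para}(\mathbf{SLens})$ reading of a learner concrete, here is a minimal Python sketch. It is not taken from the paper: the affine model, squared loss, the step size `lr`, and the names `AffineLearner`, `get`, `put`, and `run` are illustrative assumptions. It spells out a map $A\to B$ with $A=B=\mathbb{R}$ as a parameter set $P=\mathbb{R}^2$ together with a simple lens: `get` is the forward pass, and `put` returns both the gradient-descent parameter update and the backpropagated request on the input, in the style of "Backprop as functor".

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Param = Tuple[float, float]  # the parameter set P = R^2: (weight, bias)


@dataclass
class AffineLearner:
    """A sketch of a map A -> B in Para(SLens) with A = B = R: a parameter
    set P together with a simple lens P x A -> B, given by `get` and `put`."""
    lr: float = 0.1  # step size; a hypothetical choice, not from the paper

    def get(self, p: Param, a: float) -> float:
        """Forward pass (the 'implement' function P x A -> B)."""
        w, c = p
        return w * a + c

    def put(self, p: Param, a: float, b: float) -> Tuple[Param, float]:
        """Backward pass: given a training pair (a, b), return the updated
        parameter (one gradient step on the squared loss) and a backpropagated
        'request' on the input -- the P- and A-components of the lens's put."""
        w, c = p
        err = self.get(p, a) - b           # d/dy of (1/2)(y - b)^2 at y = get(p, a)
        new_p = (w - self.lr * err * a,    # dL/dw = err * a
                 c - self.lr * err)        # dL/dc = err
        request = a - err * w              # one convention: a minus dL/da
        return new_p, request


def run(learner: AffineLearner, p0: Param,
        pairs: Iterable[Tuple[float, float]]) -> Param:
    """Read the learner as a dynamical system (a generalized Moore machine):
    the state is the current parameter; each pair (a, b) arriving on the
    interface triggers one transition via the lens's put map."""
    p = p0
    for a, b in pairs:
        p, _ = learner.put(p, a, b)
    return p


if __name__ == "__main__":
    data: List[Tuple[float, float]] = [(x, 3.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
    w, c = run(AffineLearner(lr=0.1), (0.0, 0.0), data * 200)
    print(f"learned parameters: w={w:.3f}, c={c:.3f}")  # should approach w=3, c=1
```

The `run` loop is, roughly, the dynamical-systems reading sketched in the abstract: the parameter is the state, each state exposes a simple lens $A\to B$ (the partially applied `get`/`put`), and each pair $(a,b)$ fed back on the interface $[Ay^A,By^B]$ drives one state transition.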
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Families of costs with zero and nonnegative MTW tensor in optimal
transport [0.0]
We compute explicitly the MTW tensor for the optimal transport problem on $\mathbb{R}^n$ with a cost function of the form $\mathsf{c}$.
We analyze the $\sinh$-type hyperbolic cost, providing examples of $\mathsf{c}$-type functions and divergence.
arXiv Detail & Related papers (2024-01-01T20:33:27Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Uncovering hidden geometry in Transformers via disentangling position
and context [0.6118897979046375]
We present a simple yet informative decomposition of hidden states (or embeddings) of trained transformers into interpretable components.
For popular transformer architectures and diverse text datasets, empirically we find pervasive mathematical structure.
arXiv Detail & Related papers (2023-10-07T15:50:26Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ whose loss $F(\mathbf{w})$ is within a constant factor $C$ of optimal, up to an additive $\epsilon$, with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Beyond the Berry Phase: Extrinsic Geometry of Quantum States [77.34726150561087]
We show how all properties of a quantum manifold of states are fully described by a gauge-invariant Bargmann invariant.
We show how our results have immediate applications to the modern theory of polarization.
arXiv Detail & Related papers (2022-05-30T18:01:34Z) - Metric Hypertransformers are Universal Adapted Maps [4.83420384410068]
Metric hypertransformers (MHTs) are capable of approximating any adapted map $F:\mathscr{X}^{\mathbb{Z}}\rightarrow\mathscr{Y}^{\mathbb{Z}}$ with approximable complexity.
Our results provide the first (quantitative) universal approximation theorem compatible with any such $\mathscr{X}$ and $\mathscr{Y}$.
arXiv Detail & Related papers (2022-01-31T10:03:46Z) - Uncertainties in Quantum Measurements: A Quantum Tomography [52.77024349608834]
The observables associated with a quantum system $S$ form a non-commutative algebra $\mathcal{A}_S$.
It is assumed that a density matrix $\rho$ can be determined from the expectation values of observables.
Abelian algebras do not have inner automorphisms, so the measurement apparatus can determine mean values of observables.
arXiv Detail & Related papers (2021-12-14T16:29:53Z) - Fast Graph Sampling for Short Video Summarization using Gershgorin Disc
Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z) - Learning a Lie Algebra from Unlabeled Data Pairs [7.329382191592538]
Deep convolutional networks (convnets) show a remarkable ability to learn disentangled representations.
This article proposes a machine learning method to discover a nonlinear transformation of the space $\mathbb{R}^n$.
The key idea is to approximate every target $\boldsymbol{y}_i$ by a matrix--vector product of the form $\boldsymbol{\widetilde{y}}_i = \boldsymbol{\phi}(t_i)\,\boldsymbol{x}_i$.
arXiv Detail & Related papers (2020-09-19T23:23:52Z) - A Canonical Transform for Strengthening the Local $L^p$-Type Universal
Approximation Property [4.18804572788063]
$L^p$-type universal approximation theorems guarantee that a given machine learning model class $\mathscr{F}\subseteq C(\mathbb{R}^d,\mathbb{R}^D)$ is dense in $L^p_\mu(\mathbb{R}^d,\mathbb{R}^D)$.
This paper proposes a generic solution to this approximation-theoretic problem by introducing a canonical transformation which "upgrades $\mathscr{F}$'s approximation property".
arXiv Detail & Related papers (2020-06-24T17:46:35Z)