Categorical Representation Learning: Morphism is All You Need
- URL: http://arxiv.org/abs/2103.14770v2
- Date: Tue, 30 Mar 2021 17:34:05 GMT
- Title: Categorical Representation Learning: Morphism is All You Need
- Authors: Artan Sheshmani and Yizhuang You
- Abstract summary: We provide a construction for categorical representation learning and introduce the foundations of "$textitcategorifier$"
Every object in a dataset $mathcalS$ can be represented as a vector in $mathbbRn$ by an $textitencoding map$ $E: mathcalObj(mathcalS)tomathbbRn$.
As a proof of concept, we provide an example of a text translator equipped with our technology, showing that our categorical learning model outperforms the
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a construction for categorical representation learning and
introduce the foundations of "$\textit{categorifier}$". The central theme in
representation learning is the idea of $\textbf{everything to vector}$. Every
object in a dataset $\mathcal{S}$ can be represented as a vector in
$\mathbb{R}^n$ by an $\textit{encoding map}$ $E:
\mathcal{O}bj(\mathcal{S})\to\mathbb{R}^n$. More importantly, every morphism
can be represented as a matrix $E:
\mathcal{H}om(\mathcal{S})\to\mathbb{R}^{n}_{n}$. The encoding map $E$ is
generally modeled by a $\textit{deep neural network}$. The goal of
representation learning is to design appropriate tasks on the dataset to train
the encoding map (assuming that an encoding is optimal if it universally
optimizes the performance on various tasks). However, the latter is still a
$\textit{set-theoretic}$ approach. The goal of the current article is to
promote the representation learning to a new level via a
$\textit{category-theoretic}$ approach. As a proof of concept, we provide an
example of a text translator equipped with our technology, showing that our
categorical learning model outperforms the current deep learning models by 17
times. The content of the current article is part of the recent US patent
proposal (patent application number: 63110906).
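To make the "everything to vector, everything to matrix" recipe concrete, here is a minimal sketch (an illustrative assumption, not the patented categorifier): objects are embedded in $\mathbb{R}^n$, each morphism gets an $n\times n$ matrix, and training pushes $E(f)\,E(a)$ toward $E(b)$ for every morphism $f\colon a\to b$.

```python
# Minimal sketch (not the authors' implementation) of "objects -> vectors,
# morphisms -> matrices": E: Obj(S) -> R^n and E: Hom(S) -> R^{n x n}.
# The consistency loss below (match E(f) @ E(a) to E(b) for f: a -> b) is an
# illustrative assumption.
import torch
import torch.nn as nn

class Categorifier(nn.Module):
    def __init__(self, num_objects: int, num_morphisms: int, n: int = 16):
        super().__init__()
        self.obj = nn.Embedding(num_objects, n)        # E on objects: Obj(S) -> R^n
        self.mor = nn.Embedding(num_morphisms, n * n)  # E on morphisms: Hom(S) -> R^{n x n}
        self.n = n

    def forward(self, f_idx, src_idx):
        M = self.mor(f_idx).view(-1, self.n, self.n)   # morphism as a matrix
        x = self.obj(src_idx).unsqueeze(-1)            # source object as a column vector
        return (M @ x).squeeze(-1)                     # predicted encoding of the target object

# Toy usage: one morphism f: a -> b, trained so that E(f) E(a) is close to E(b).
model = Categorifier(num_objects=2, num_morphisms=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
f, a, b = torch.tensor([0]), torch.tensor([0]), torch.tensor([1])
for _ in range(200):
    opt.zero_grad()
    loss = ((model(f, a) - model.obj(b)) ** 2).mean()  # functoriality-style consistency loss
    loss.backward()
    opt.step()
```

In a real dataset, the embedding tables would be replaced by the deep-neural-network encoder mentioned in the abstract, and the loss would range over all morphisms in $\mathcal{H}om(\mathcal{S})$.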
Related papers
- Transformer In-Context Learning for Categorical Data [51.23121284812406]
We extend research on understanding Transformers through the lens of in-context learning with functional data by considering categorical outcomes, nonlinear underlying models, and nonlinear attention.
We present what is believed to be the first real-world demonstration of this few-shot-learning methodology, using the ImageNet dataset.
arXiv Detail & Related papers (2024-05-27T15:03:21Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that set it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
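For reference, a multi-head attention layer in its standard transformer form can be written in a few lines of NumPy; the parameterization below follows the usual convention rather than the exact one analysed in the paper, and all shapes are illustrative.

```python
# A plain multi-head attention layer in NumPy, using the standard transformer
# convention (not the exact parameterization analysed in the paper).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """X: (seq, d). Wq/Wk/Wv: lists of m per-head projections (d, d_head). Wo: (m*d_head, d)."""
    heads = []
    for Q, K, V in zip(Wq, Wk, Wv):
        q, k, v = X @ Q, X @ K, X @ V                  # per-head projections
        att = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (seq, seq) attention weights
        heads.append(att @ v)                          # per-head output
    return np.concatenate(heads, axis=-1) @ Wo         # combine the m heads

# Toy usage with m = 2 heads.
rng = np.random.default_rng(0)
d, d_head, m, seq = 8, 4, 2, 5
X = rng.normal(size=(seq, d))
Wq, Wk, Wv = ([rng.normal(size=(d, d_head)) for _ in range(m)] for _ in range(3))
Wo = rng.normal(size=(m * d_head, d))
print(multi_head_attention(X, Wq, Wk, Wv, Wo).shape)   # (5, 8)
```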
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
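For orientation, the hierarchical targets in question are compositions of an inner degree-$k$ polynomial with an outer function; the concrete $p$ and $g$ below are only illustrative choices, not instances taken from the paper.

```latex
% Hierarchical target: an outer function g composed with a degree-k polynomial p.
% The quadratic p and cubic g are illustrative examples.
h(x) = g\bigl(p(x)\bigr), \qquad x \sim \mathcal{N}(0, I_d), \qquad
\text{e.g. } p(x) = \langle x, A x \rangle \ (k = 2), \quad g(z) = z^3 .
```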
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Uncovering hidden geometry in Transformers via disentangling position and context [0.6118897979046375]
We present a simple yet informative decomposition of hidden states (or embeddings) of trained transformers into interpretable components.
For popular transformer architectures and diverse text datasets, empirically we find pervasive mathematical structure.
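One simple way to picture such a decomposition (a mean-based, ANOVA-style split; an assumption for illustration, not necessarily the paper's exact construction): average hidden states over contexts to get a positional component and over positions to get a context component, with a residual absorbing the rest.

```python
# ANOVA-style split of hidden states H[c, t, :] (context c, position t) into a
# global mean, a positional component, a context component, and a residual.
# The mean-based decomposition and the random stand-in data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(32, 16, 64))             # (contexts, positions, hidden dim)

mu = H.mean(axis=(0, 1), keepdims=True)       # global mean vector
pos = H.mean(axis=0, keepdims=True) - mu      # positional component, shared across contexts
ctx = H.mean(axis=1, keepdims=True) - mu      # context component, shared across positions
resid = H - (mu + pos + ctx)                  # what the first three terms do not explain

# The four components sum back to the original hidden states.
assert np.allclose(H, mu + pos + ctx + resid)
```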
arXiv Detail & Related papers (2023-10-07T15:50:26Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents analyses of the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a $\textit{Least-Activated-Feature-Abundance}$ (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-squares algorithm learns a policy which is $\tilde{O}\big(H^2\sqrt{\frac{\kappa\,\mathcal{C}(\Phi)^2\,\kappa d}{NT}+\frac{\kappa d}{n}}\big)$ suboptimal.
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^{d}$, given access to $A\in\mathbb{R}^{d\times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - Learners' languages [0.0]
The authors show that the fundamental elements of deep learning, gradient descent and backpropagation, can be conceptualized as a strong monoidal functor.
We show that a map $A\to B$ in $\mathbf{Para}(\mathbf{SLens})$ has a natural interpretation in terms of dynamical systems.
arXiv Detail & Related papers (2021-03-01T18:34:00Z) - Learning a Lie Algebra from Unlabeled Data Pairs [7.329382191592538]
Deep convolutional networks (convnets) show a remarkable ability to learn disentangled representations.
This article proposes a machine learning method to discover a nonlinear transformation of the space $\mathbb{R}^n$.
The key idea is to approximate every target $\boldsymbol{y}_i$ by a matrix-vector product of the form $\widetilde{\boldsymbol{y}}_i = \boldsymbol{\phi}(t_i)\,\boldsymbol{x}_i$.
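A concrete way to read that approximation (the matrix-exponential parametrization is an assumption for illustration): take $\boldsymbol{\phi}(t) = \exp(tA)$ for a generator $A$ in the Lie algebra, so each target is obtained by flowing the input for time $t_i$.

```python
# Illustration of the matrix-vector model y_i ~ phi(t_i) x_i with
# phi(t) = expm(t * A); the rotation generator A is a toy assumption
# standing in for the learned Lie algebra element.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])                     # generator of 2-D rotations (so(2))

rng = np.random.default_rng(0)
t = rng.uniform(0.0, np.pi, size=4)             # per-pair transformation parameters t_i
X = rng.normal(size=(4, 2))                     # source vectors x_i

# Each target is produced by a matrix-vector product, exactly the form above.
Y = np.stack([expm(ti * A) @ xi for ti, xi in zip(t, X)])
print(Y.shape)                                  # (4, 2)
```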
arXiv Detail & Related papers (2020-09-19T23:23:52Z) - Few-Shot Learning via Learning the Representation, Provably [115.7367053639605]
This paper studies few-shot learning via representation learning.
One uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task.
arXiv Detail & Related papers (2020-02-21T17:30:00Z)
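A minimal linear instance of this protocol (the two-stage least-squares procedure and the SVD pooling below are illustrative assumptions, not the paper's algorithm): estimate a shared $k$-dimensional representation from the $T$ source tasks, then fit the target task with only $k$ coefficients.

```python
# Toy two-stage illustration of representation-based few-shot learning with
# linear tasks: (1) estimate a shared k-dimensional representation from T
# source tasks, (2) fit the target task in that k-dimensional space.
import numpy as np

rng = np.random.default_rng(0)
d, k, T, n1, n_target = 20, 3, 10, 50, 8
B_true = np.linalg.qr(rng.normal(size=(d, k)))[0]          # ground-truth shared subspace

# Stage 1: per-task least squares on the T source tasks, then pool via SVD.
W_hat = []
for _ in range(T):
    w = B_true @ rng.normal(size=k)                         # task lives in the shared subspace
    X = rng.normal(size=(n1, d))
    y = X @ w + 0.1 * rng.normal(size=n1)
    W_hat.append(np.linalg.lstsq(X, y, rcond=None)[0])
B_hat = np.linalg.svd(np.stack(W_hat, axis=1))[0][:, :k]    # estimated representation

# Stage 2: few-shot target task, regressing only k coefficients instead of d.
w_t = B_true @ rng.normal(size=k)
Xt = rng.normal(size=(n_target, d))
yt = Xt @ w_t + 0.1 * rng.normal(size=n_target)
coef = np.linalg.lstsq(Xt @ B_hat, yt, rcond=None)[0]       # only k unknowns
print(np.linalg.norm(B_hat @ coef - w_t))                   # small if B_hat is accurate
```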
This list is automatically generated from the titles and abstracts of the papers in this site.