Categorical Representation Learning: Morphism is All You Need
- URL: http://arxiv.org/abs/2103.14770v2
- Date: Tue, 30 Mar 2021 17:34:05 GMT
- Title: Categorical Representation Learning: Morphism is All You Need
- Authors: Artan Sheshmani and Yizhuang You
- Abstract summary: We provide a construction for categorical representation learning and introduce the foundations of "$textitcategorifier$"
Every object in a dataset $mathcalS$ can be represented as a vector in $mathbbRn$ by an $textitencoding map$ $E: mathcalObj(mathcalS)tomathbbRn$.
As a proof of concept, we provide an example of a text translator equipped with our technology, showing that our categorical learning model outperforms the
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a construction for categorical representation learning and
introduce the foundations of "$\textit{categorifier}$". The central theme in
representation learning is the idea of $\textbf{everything to vector}$. Every
object in a dataset $\mathcal{S}$ can be represented as a vector in
$\mathbb{R}^n$ by an $\textit{encoding map}$ $E:
\mathcal{O}bj(\mathcal{S})\to\mathbb{R}^n$. More importantly, every morphism
can be represented as a matrix $E:
\mathcal{H}om(\mathcal{S})\to\mathbb{R}^{n}_{n}$. The encoding map $E$ is
generally modeled by a $\textit{deep neural network}$. The goal of
representation learning is to design appropriate tasks on the dataset to train
the encoding map (assuming that an encoding is optimal if it universally
optimizes the performance on various tasks). However, the latter is still a
$\textit{set-theoretic}$ approach. The goal of the current article is to
promote the representation learning to a new level via a
$\textit{category-theoretic}$ approach. As a proof of concept, we provide an
example of a text translator equipped with our technology, showing that our
categorical learning model outperforms the current deep learning models by 17
times. The content of the current article is part of the recent US patent
proposal (patent application number: 63110906).
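To make the "everything to vector, everything to matrix" recipe concrete, here is a minimal sketch (an illustrative assumption, not the patented categorifier): objects are embedded in $\mathbb{R}^n$, each morphism gets an $n\times n$ matrix, and training pushes $E(f)\,E(a)$ toward $E(b)$ for every morphism $f\colon a\to b$.

```python
# Minimal sketch (not the authors' implementation) of "objects -> vectors,
# morphisms -> matrices": E: Obj(S) -> R^n and E: Hom(S) -> R^{n x n}.
# The consistency loss below (match E(f) @ E(a) to E(b) for f: a -> b) is an
# illustrative assumption.
import torch
import torch.nn as nn

class Categorifier(nn.Module):
    def __init__(self, num_objects: int, num_morphisms: int, n: int = 16):
        super().__init__()
        self.obj = nn.Embedding(num_objects, n)        # E on objects: Obj(S) -> R^n
        self.mor = nn.Embedding(num_morphisms, n * n)  # E on morphisms: Hom(S) -> R^{n x n}
        self.n = n

    def forward(self, f_idx, src_idx):
        M = self.mor(f_idx).view(-1, self.n, self.n)   # morphism as a matrix
        x = self.obj(src_idx).unsqueeze(-1)            # source object as a column vector
        return (M @ x).squeeze(-1)                     # predicted encoding of the target object

# Toy usage: one morphism f: a -> b, trained so that E(f) E(a) is close to E(b).
model = Categorifier(num_objects=2, num_morphisms=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
f, a, b = torch.tensor([0]), torch.tensor([0]), torch.tensor([1])
for _ in range(200):
    opt.zero_grad()
    loss = ((model(f, a) - model.obj(b)) ** 2).mean()  # functoriality-style consistency loss
    loss.backward()
    opt.step()
```

In a real dataset, the embedding tables would be replaced by the deep-neural-network encoder mentioned in the abstract, and the loss would range over all morphisms in $\mathcal{H}om(\mathcal{S})$.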
Related papers
- Transformer In-Context Learning for Categorical Data [51.23121284812406]
We extend research on understanding Transformers through the lens of in-context learning with functional data by considering categorical outcomes, nonlinear underlying models, and nonlinear attention.
We present what is believed to be the first real-world demonstration of this few-shot-learning methodology, using the ImageNet dataset.
arXiv Detail & Related papers (2024-05-27T15:03:21Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that set it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
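For reference, a multi-head attention layer in its standard transformer form can be written in a few lines of NumPy; the parameterization below follows the usual convention rather than the exact one analysed in the paper, and all shapes are illustrative.

```python
# A plain multi-head attention layer in NumPy, using the standard transformer
# convention (not the exact parameterization analysed in the paper).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """X: (seq, d). Wq/Wk/Wv: lists of m per-head projections (d, d_head). Wo: (m*d_head, d)."""
    heads = []
    for Q, K, V in zip(Wq, Wk, Wv):
        q, k, v = X @ Q, X @ K, X @ V                  # per-head projections
        att = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (seq, seq) attention weights
        heads.append(att @ v)                          # per-head output
    return np.concatenate(heads, axis=-1) @ Wo         # combine the m heads

# Toy usage with m = 2 heads.
rng = np.random.default_rng(0)
d, d_head, m, seq = 8, 4, 2, 5
X = rng.normal(size=(seq, d))
Wq, Wk, Wv = ([rng.normal(size=(d, d_head)) for _ in range(m)] for _ in range(3))
Wo = rng.normal(size=(m * d_head, d))
print(multi_head_attention(X, Wq, Wk, Wv, Wo).shape)   # (5, 8)
```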
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
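For orientation, the hierarchical targets in question are compositions of an inner degree-$k$ polynomial with an outer function; the concrete $p$ and $g$ below are only illustrative choices, not instances taken from the paper.

```latex
% Hierarchical target: an outer function g composed with a degree-k polynomial p.
% The quadratic p and cubic g are illustrative examples.
h(x) = g\bigl(p(x)\bigr), \qquad x \sim \mathcal{N}(0, I_d), \qquad
\text{e.g. } p(x) = \langle x, A x \rangle \ (k = 2), \quad g(z) = z^3 .
```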
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Uncovering hidden geometry in Transformers via disentangling position and context [0.6118897979046375]
We present a simple yet informative decomposition of hidden states (or embeddings) of trained transformers into interpretable components.
For popular transformer architectures and diverse text datasets, empirically we find pervasive mathematical structure.
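One simple way to picture such a decomposition (a mean-based, ANOVA-style split; an assumption for illustration, not necessarily the paper's exact construction): average hidden states over contexts to get a positional component and over positions to get a context component, with a residual absorbing the rest.

```python
# ANOVA-style split of hidden states H[c, t, :] (context c, position t) into a
# global mean, a positional component, a context component, and a residual.
# The mean-based decomposition and the random stand-in data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(32, 16, 64))             # (contexts, positions, hidden dim)

mu = H.mean(axis=(0, 1), keepdims=True)       # global mean vector
pos = H.mean(axis=0, keepdims=True) - mu      # positional component, shared across contexts
ctx = H.mean(axis=1, keepdims=True) - mu      # context component, shared across positions
resid = H - (mu + pos + ctx)                  # what the first three terms do not explain

# The four components sum back to the original hidden states.
assert np.allclose(H, mu + pos + ctx + resid)
```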
arXiv Detail & Related papers (2023-10-07T15:50:26Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents analyses of the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a $\textit{Least-Activated-Feature-Abundance}$ (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-squares algorithm learns a policy which is $\tilde{O}\big(H^2\sqrt{\frac{\kappa\,\mathcal{C}(\Phi)^2\,\kappa d}{NT}+\frac{\kappa d}{n}}\big)$ suboptimal.
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^{d}$, given access to $A\in\mathbb{R}^{d\times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - Learners' languages [0.0]
The authors show that the fundamental elements of deep learning, gradient descent and backpropagation, can be conceptualized as a strong monoidal functor.
We show that a map $A\to B$ in $\mathbf{Para}(\mathbf{SLens})$ has a natural interpretation in terms of dynamical systems.
arXiv Detail & Related papers (2021-03-01T18:34:00Z) - Learning a Lie Algebra from Unlabeled Data Pairs [7.329382191592538]
Deep convolutional networks (convnets) show a remarkable ability to learn disentangled representations.
This article proposes a machine learning method to discover a nonlinear transformation of the space $\mathbb{R}^n$.
The key idea is to approximate every target $\boldsymbol{y}_i$ by a matrix-vector product of the form $\widetilde{\boldsymbol{y}}_i = \boldsymbol{\phi}(t_i)\,\boldsymbol{x}_i$.
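A concrete way to read that approximation (the matrix-exponential parametrization is an assumption for illustration): take $\boldsymbol{\phi}(t) = \exp(tA)$ for a generator $A$ in the Lie algebra, so each target is obtained by flowing the input for time $t_i$.

```python
# Illustration of the matrix-vector model y_i ~ phi(t_i) x_i with
# phi(t) = expm(t * A); the rotation generator A is a toy assumption
# standing in for the learned Lie algebra element.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])                     # generator of 2-D rotations (so(2))

rng = np.random.default_rng(0)
t = rng.uniform(0.0, np.pi, size=4)             # per-pair transformation parameters t_i
X = rng.normal(size=(4, 2))                     # source vectors x_i

# Each target is produced by a matrix-vector product, exactly the form above.
Y = np.stack([expm(ti * A) @ xi for ti, xi in zip(t, X)])
print(Y.shape)                                  # (4, 2)
```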
arXiv Detail & Related papers (2020-09-19T23:23:52Z) - Few-Shot Learning via Learning the Representation, Provably [115.7367053639605]
This paper studies few-shot learning via representation learning.
One uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task.
arXiv Detail & Related papers (2020-02-21T17:30:00Z)
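A minimal linear instance of this protocol (the two-stage least-squares procedure and the SVD pooling below are illustrative assumptions, not the paper's algorithm): estimate a shared $k$-dimensional representation from the $T$ source tasks, then fit the target task with only $k$ coefficients.

```python
# Toy two-stage illustration of representation-based few-shot learning with
# linear tasks: (1) estimate a shared k-dimensional representation from T
# source tasks, (2) fit the target task in that k-dimensional space.
import numpy as np

rng = np.random.default_rng(0)
d, k, T, n1, n_target = 20, 3, 10, 50, 8
B_true = np.linalg.qr(rng.normal(size=(d, k)))[0]          # ground-truth shared subspace

# Stage 1: per-task least squares on the T source tasks, then pool via SVD.
W_hat = []
for _ in range(T):
    w = B_true @ rng.normal(size=k)                         # task lives in the shared subspace
    X = rng.normal(size=(n1, d))
    y = X @ w + 0.1 * rng.normal(size=n1)
    W_hat.append(np.linalg.lstsq(X, y, rcond=None)[0])
B_hat = np.linalg.svd(np.stack(W_hat, axis=1))[0][:, :k]    # estimated representation

# Stage 2: few-shot target task, regressing only k coefficients instead of d.
w_t = B_true @ rng.normal(size=k)
Xt = rng.normal(size=(n_target, d))
yt = Xt @ w_t + 0.1 * rng.normal(size=n_target)
coef = np.linalg.lstsq(Xt @ B_hat, yt, rcond=None)[0]       # only k unknowns
print(np.linalg.norm(B_hat @ coef - w_t))                   # small if B_hat is accurate
```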
This list is automatically generated from the titles and abstracts of the papers in this site.