Learning a Lie Algebra from Unlabeled Data Pairs
- URL: http://arxiv.org/abs/2009.09321v3
- Date: Thu, 12 Nov 2020 09:29:36 GMT
- Title: Learning a Lie Algebra from Unlabeled Data Pairs
- Authors: Christopher Ick and Vincent Lostanlen
- Abstract summary: Deep convolutional networks (convnets) show a remarkable ability to learn disentangled representations.
This article proposes a machine learning method to discover a nonlinear transformation of the space $\mathbb{R}^n$.
The key idea is to approximate every target $\boldsymbol{y}_i$ by a matrix--vector product of the form $\boldsymbol{\widetilde{y}}_i = \boldsymbol{\phi}(t_i) \boldsymbol{x}_i$.
- Score: 7.329382191592538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep convolutional networks (convnets) show a remarkable ability to learn
disentangled representations. In recent years, the generalization of deep
learning to Lie groups beyond rigid motion in $\mathbb{R}^n$ has made it
possible to build convnets over datasets with non-trivial symmetries, such as
patterns over
the surface of a sphere. However, one limitation of this approach is the need
to explicitly define the Lie group underlying the desired invariance property
before training the convnet. Whereas rotations on the sphere have a well-known
symmetry group ($\mathrm{SO}(3)$), the same cannot be said of many real-world
factors of variability. For example, the disentanglement of pitch, intensity
dynamics, and playing technique remains a challenging task in music information
retrieval.
This article proposes a machine learning method to discover a nonlinear
transformation of the space $\mathbb{R}^n$ which maps a collection of
$n$-dimensional vectors $(\boldsymbol{x}_i)_i$ onto a collection of target
vectors $(\boldsymbol{y}_i)_i$. The key idea is to approximate every target
$\boldsymbol{y}_i$ by a matrix--vector product of the form
$\boldsymbol{\widetilde{y}}_i = \boldsymbol{\phi}(t_i) \boldsymbol{x}_i$, where
the matrix $\boldsymbol{\phi}(t_i)$ belongs to a one-parameter subgroup of
$\mathrm{GL}_n (\mathbb{R})$. Crucially, the value of the parameter $t_i \in
\mathbb{R}$ may change between data pairs $(\boldsymbol{x}_i,
\boldsymbol{y}_i)$ and does not need to be known in advance.
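The forward model from the abstract can be sketched in a few lines of NumPy/SciPy. This is an illustrative toy under stated assumptions, not the paper's algorithm: the generator $A$ is fixed here to the known generator of 2-D rotations rather than learned from data, and the unknown per-pair parameter $t_i$ is recovered by a simple grid search instead of joint optimization.

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

# Toy sketch of the forward model y_i ~ phi(t_i) x_i, where
# phi(t) = expm(t * A) belongs to a one-parameter subgroup of GL_n(R).
# A is the known 2-D rotation generator, a hypothetical stand-in for
# the Lie-algebra generator the paper learns from unlabeled pairs.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 2))            # inputs x_i
ts = rng.uniform(0.0, np.pi, size=5)    # per-pair parameters t_i
ys = np.stack([expm(t * A) @ x for t, x in zip(ts, xs)])  # targets y_i

def recover_t(x, y, grid=np.linspace(0.0, np.pi, 2001)):
    """Grid-search the scalar t minimizing ||expm(t * A) @ x - y||_2."""
    errors = [np.linalg.norm(expm(t * A) @ x - y) for t in grid]
    return grid[int(np.argmin(errors))]

t_hat = recover_t(xs[0], ys[0])  # close to ts[0], up to grid spacing
```

In the paper's setting, neither $A$ nor the $t_i$ are known in advance; both would be fit jointly, e.g. by gradient-based minimization of the reconstruction error over all pairs.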
Related papers
- Conditional regression for the Nonlinear Single-Variable Model [4.565636963872865]
We consider a model $F(X):=f(\Pi_\gamma(X))$, where $\Pi_\gamma:\mathbb{R}^d\to[0,\mathrm{len}_\gamma]$, $\gamma: [0,\mathrm{len}_\gamma]\to\mathbb{R}^d$, and $f:[0,\mathrm{len}_\gamma]\to\mathbb{R}$.
We propose a nonparametric estimator, based on conditional regression, and show that it can achieve the one-dimensional optimal min-max rate.
arXiv Detail & Related papers (2024-11-14T18:53:51Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d$.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Families of costs with zero and nonnegative MTW tensor in optimal
transport [0.0]
We compute explicitly the MTW tensor for the optimal transport problem on $\mathbb{R}^n$ with a cost function of the form $\mathsf{c}$.
We analyze the $\sinh$-type hyperbolic cost, providing examples of $\mathsf{c}$-type functions and divergence.
arXiv Detail & Related papers (2024-01-01T20:33:27Z) - Uncovering hidden geometry in Transformers via disentangling position
and context [0.6118897979046375]
We present a simple yet informative decomposition of hidden states (or embeddings) of trained transformers into interpretable components.
For popular transformer architectures and diverse text datasets, we empirically find pervasive mathematical structure.
arXiv Detail & Related papers (2023-10-07T15:50:26Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) \le C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\mathrm{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A\in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^{d}$, given access to $A\in\mathbb{R}^{d\times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - A Canonical Transform for Strengthening the Local $L^p$-Type Universal
Approximation Property [4.18804572788063]
$L^p$-type universal approximation theorems guarantee that a given machine learning model class $\mathscr{F}\subseteq C(\mathbb{R}^d,\mathbb{R}^D)$ is dense in $L^p_\mu(\mathbb{R}^d,\mathbb{R}^D)$.
This paper proposes a generic solution to this approximation-theoretic problem by introducing a canonical transformation which "upgrades $\mathscr{F}$'s approximation property".
arXiv Detail & Related papers (2020-06-24T17:46:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.