Learners' languages
- URL: http://arxiv.org/abs/2103.01189v1
- Date: Mon, 1 Mar 2021 18:34:00 GMT
- Title: Learners' languages
- Authors: David I. Spivak
- Abstract summary: The authors show that the fundamental elements of deep learning -- gradient descent and backpropagation -- can be conceptualized as a strong monoidal functor.
We show that a map $A\to B$ in $\mathbf{Para}(\mathbf{SLens})$ has a natural interpretation in terms of dynamical systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In "Backprop as functor", the authors show that the fundamental elements of
deep learning -- gradient descent and backpropagation -- can be conceptualized
as a strong monoidal functor $\mathbf{Para}(\mathbf{Euc})\to\mathbf{Learn}$
from the category of parameterized Euclidean spaces to that of learners, a
category developed explicitly to capture parameter update and backpropagation.
It was soon realized that there is an isomorphism
$\mathbf{Learn}\cong\mathbf{Para}(\mathbf{SLens})$, where $\mathbf{SLens}$ is
the symmetric monoidal category of simple lenses as used in functional
programming.
In this note, we observe that $\mathbf{SLens}$ is a full subcategory of
$\mathbf{Poly}$, the category of polynomial functors in one variable, via the
functor $A\mapsto Ay^A$. Using the fact that $(\mathbf{Poly},\otimes)$ is
monoidal closed, we show that a map $A\to B$ in $\mathbf{Para}(\mathbf{SLens})$
has a natural interpretation in terms of dynamical systems (more precisely,
generalized Moore machines) whose interface is the internal-hom type
$[Ay^A,By^B]$.
Finally, we review the fact that the category $p\text{-}\mathbf{Coalg}$ of
dynamical systems on any $p\in\mathbf{Poly}$ forms a topos, and consider the
logical propositions that can be stated in its internal language. We give
gradient descent as an example, and we conclude by discussing some directions
for future work.
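To make the $\mathbf{Para}(\mathbf{SLens})$ reading of a learner concrete, here is a minimal Python sketch. It is not taken from the paper: the affine model, squared loss, the step size `lr`, and the names `AffineLearner`, `get`, `put`, and `run` are illustrative assumptions. It spells out a map $A\to B$ with $A=B=\mathbb{R}$ as a parameter set $P=\mathbb{R}^2$ together with a simple lens: `get` is the forward pass, and `put` returns both the gradient-descent parameter update and the backpropagated request on the input, in the style of "Backprop as functor".

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Param = Tuple[float, float]  # the parameter set P = R^2: (weight, bias)


@dataclass
class AffineLearner:
    """A sketch of a map A -> B in Para(SLens) with A = B = R: a parameter
    set P together with a simple lens P x A -> B, given by `get` and `put`."""
    lr: float = 0.1  # step size; a hypothetical choice, not from the paper

    def get(self, p: Param, a: float) -> float:
        """Forward pass (the 'implement' function P x A -> B)."""
        w, c = p
        return w * a + c

    def put(self, p: Param, a: float, b: float) -> Tuple[Param, float]:
        """Backward pass: given a training pair (a, b), return the updated
        parameter (one gradient step on the squared loss) and a backpropagated
        'request' on the input -- the P- and A-components of the lens's put."""
        w, c = p
        err = self.get(p, a) - b           # d/dy of (1/2)(y - b)^2 at y = get(p, a)
        new_p = (w - self.lr * err * a,    # dL/dw = err * a
                 c - self.lr * err)        # dL/dc = err
        request = a - err * w              # one convention: a minus dL/da
        return new_p, request


def run(learner: AffineLearner, p0: Param,
        pairs: Iterable[Tuple[float, float]]) -> Param:
    """Read the learner as a dynamical system (a generalized Moore machine):
    the state is the current parameter; each pair (a, b) arriving on the
    interface triggers one transition via the lens's put map."""
    p = p0
    for a, b in pairs:
        p, _ = learner.put(p, a, b)
    return p


if __name__ == "__main__":
    data: List[Tuple[float, float]] = [(x, 3.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
    w, c = run(AffineLearner(lr=0.1), (0.0, 0.0), data * 200)
    print(f"learned parameters: w={w:.3f}, c={c:.3f}")  # should approach w=3, c=1
```

The `run` loop is, roughly, the dynamical-systems reading sketched in the abstract: the parameter is the state, each state exposes a simple lens $A\to B$ (the partially applied `get`/`put`), and each pair $(a,b)$ fed back on the interface $[Ay^A,By^B]$ drives one state transition.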
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Families of costs with zero and nonnegative MTW tensor in optimal
transport [0.0]
We compute explicitly the MTW tensor for the optimal transport problem on $\mathbb{R}^n$ with a cost function of the form $\mathsf{c}$.
We analyze the $\sinh$-type hyperbolic cost, providing examples of $\mathsf{c}$-type functions and divergence.
arXiv Detail & Related papers (2024-01-01T20:33:27Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Uncovering hidden geometry in Transformers via disentangling position
and context [0.6118897979046375]
We present a simple yet informative decomposition of hidden states (or embeddings) of trained transformers into interpretable components.
For popular transformer architectures and diverse text datasets, empirically we find pervasive mathematical structure.
arXiv Detail & Related papers (2023-10-07T15:50:26Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ whose loss $F(\mathbf{w})$ is within a constant factor $C$ of optimal, up to an additive $\epsilon$, with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Beyond the Berry Phase: Extrinsic Geometry of Quantum States [77.34726150561087]
We show how all properties of a quantum manifold of states are fully described by a gauge-invariant Bargmann invariant.
We show how our results have immediate applications to the modern theory of polarization.
arXiv Detail & Related papers (2022-05-30T18:01:34Z) - Metric Hypertransformers are Universal Adapted Maps [4.83420384410068]
Metric hypertransformers (MHTs) are capable of approximating any adapted map $F:\mathscr{X}^{\mathbb{Z}}\rightarrow\mathscr{Y}^{\mathbb{Z}}$ with approximable complexity.
Our results provide the first (quantitative) universal approximation theorem compatible with any such $\mathscr{X}$ and $\mathscr{Y}$.
arXiv Detail & Related papers (2022-01-31T10:03:46Z) - Uncertainties in Quantum Measurements: A Quantum Tomography [52.77024349608834]
The observables associated with a quantum system $S$ form a non-commutative algebra $\mathcal{A}_S$.
It is assumed that a density matrix $\rho$ can be determined from the expectation values of observables.
Abelian algebras do not have inner automorphisms, so the measurement apparatus can determine mean values of observables.
arXiv Detail & Related papers (2021-12-14T16:29:53Z) - Fast Graph Sampling for Short Video Summarization using Gershgorin Disc
Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z) - Learning a Lie Algebra from Unlabeled Data Pairs [7.329382191592538]
Deep convolutional networks (convnets) show a remarkable ability to learn disentangled representations.
This article proposes a machine learning method to discover a nonlinear transformation of the space $\mathbb{R}^n$.
The key idea is to approximate every target $\boldsymbol{y}_i$ by a matrix--vector product of the form $\boldsymbol{\widetilde{y}}_i = \boldsymbol{\phi}(t_i)\,\boldsymbol{x}_i$.
arXiv Detail & Related papers (2020-09-19T23:23:52Z) - A Canonical Transform for Strengthening the Local $L^p$-Type Universal
Approximation Property [4.18804572788063]
$L^p$-type universal approximation theorems guarantee that a given machine learning model class $\mathscr{F}\subseteq C(\mathbb{R}^d,\mathbb{R}^D)$ is dense in $L^p_\mu(\mathbb{R}^d,\mathbb{R}^D)$.
This paper proposes a generic solution to this approximation-theoretic problem by introducing a canonical transformation which "upgrades $\mathscr{F}$'s approximation property".
arXiv Detail & Related papers (2020-06-24T17:46:35Z)