Measure-to-measure interpolation using Transformers
- URL: http://arxiv.org/abs/2411.04551v1
- Date: Thu, 07 Nov 2024 09:18:39 GMT
- Title: Measure-to-measure interpolation using Transformers
- Authors: Borjan Geshkovski, Philippe Rigollet, Domènec Ruiz-Balet
- Abstract summary: Transformers are deep neural network architectures that underpin the recent successes of large language models.
A Transformer acts as a measure-to-measure map implemented as a specific interacting particle system on the unit sphere.
We provide an explicit choice of parameters that allows a single Transformer to match $N$ arbitrary input measures to $N$ arbitrary target measures.
- Score: 6.13239149235581
- License:
- Abstract: Transformers are deep neural network architectures that underpin the recent successes of large language models. Unlike more classical architectures that can be viewed as point-to-point maps, a Transformer acts as a measure-to-measure map implemented as a specific interacting particle system on the unit sphere: the input is the empirical measure of tokens in a prompt and its evolution is governed by the continuity equation. In fact, Transformers are not limited to empirical measures and can in principle process any input measure. As the nature of data processed by Transformers is expanding rapidly, it is important to investigate their expressive power as maps from an arbitrary measure to another arbitrary measure. To that end, we provide an explicit choice of parameters that allows a single Transformer to match $N$ arbitrary input measures to $N$ arbitrary target measures, under the minimal assumption that every pair of input-target measures can be matched by some transport map.
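The interacting particle system described in the abstract can be made concrete with a small numerical sketch. The snippet below is an illustrative simulation, assuming self-attention dynamics of the form $\dot x_i = P_{x_i}\big(\sum_j \mathrm{softmax}_j(\beta\langle Qx_i, Kx_j\rangle)\, Vx_j\big)$ with an explicit Euler step and renormalization back onto the sphere; the matrices Q, K, V, the temperature beta, and the step size are placeholder choices, not the explicit parameter construction of the paper.

```python
import numpy as np

def attention_step(X, Q, K, V, beta=1.0, dt=0.01):
    """One Euler step of a self-attention particle system on the unit sphere.

    X : (n, d) array of token positions, each row on S^{d-1}.
    Each particle moves along the softmax-weighted average of the value
    vectors V x_j, projected onto the tangent space of the sphere at x_i.
    """
    logits = beta * (X @ Q.T) @ (X @ K.T).T                  # (n, n) scores <Q x_i, K x_j>
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                        # row-wise softmax
    drift = w @ (X @ V.T)                                    # softmax average of V x_j
    drift -= np.sum(drift * X, axis=1, keepdims=True) * X    # tangent projection
    X_new = X + dt * drift
    return X_new / np.linalg.norm(X_new, axis=1, keepdims=True)

# Evolve the empirical measure of n tokens in d dimensions.
rng = np.random.default_rng(0)
n, d = 16, 8
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Q = K = V = np.eye(d)   # illustrative parameters only
for _ in range(200):
    X = attention_step(X, Q, K, V)
```

Under these placeholder identity parameters the particles tend to cluster; the paper's contribution is to show how a specific choice of weights can instead steer any of $N$ prescribed input measures to its prescribed target measure.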
Related papers
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
- MoEUT: Mixture-of-Experts Universal Transformers [75.96744719516813]
Universal Transformers (UTs) have advantages over standard Transformers in learning compositional generalizations.
Layer-sharing drastically reduces the parameter count compared to the non-shared model with the same dimensionality.
No previous work has succeeded in proposing a shared-layer Transformer design that is competitive in parameter count-dominated tasks such as language modeling.
arXiv Detail & Related papers (2024-05-25T03:24:32Z)
- Transformer-Based Neural Surrogate for Link-Level Path Loss Prediction from Variable-Sized Maps [11.327456466796681]
Estimating the path loss for a transmitter-receiver location pair is key to many use cases, including network planning and handover.
We present a transformer-based neural network architecture that enables predicting link-level properties from maps of various dimensions and from sparse measurements.
arXiv Detail & Related papers (2023-10-06T20:17:40Z)
- Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input [50.83356836818667]
We study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite dimensional inputs.
Our theoretical results support the practical success of Transformers for high dimensional data.
arXiv Detail & Related papers (2023-05-30T02:44:49Z)
- Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation [58.4650849317274]
Volumetric Aggregation with Transformers (VAT) is a cost aggregation network for few-shot segmentation.
VAT attains state-of-the-art performance for semantic correspondence as well, where cost aggregation also plays a central role.
arXiv Detail & Related papers (2022-07-22T04:10:30Z)
- Your Transformer May Not be as Powerful as You Expect [88.11364619182773]
We mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximating any continuous sequence-to-sequence functions.
We present a negative result by showing there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate no matter how deep and wide the neural network is.
We develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies the conditions for universal approximation.
arXiv Detail & Related papers (2022-05-26T14:51:30Z)
- Explainable Graph Theory-Based Identification of Meter-Transformer Mapping [6.18054021053899]
Distributed energy resources are better for the environment but may cause transformer overload in distribution grids.
The challenge lies in recovering the meter-transformer (M.T.) mapping in two common scenarios.
arXiv Detail & Related papers (2022-05-19T21:47:07Z)
- Scalable Transformers for Neural Machine Translation [86.4530299266897]
Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation.
We propose a novel Scalable Transformer, which naturally contains sub-Transformers of different scales with shared parameters.
A three-stage training scheme is proposed to tackle the difficulty of training the scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z)
- Multimodality Biomedical Image Registration using Free Point Transformer Networks [0.37501702548174964]
We describe a point-set registration algorithm based on a novel free point transformer (FPT) network.
FPT is constructed with a global feature extractor which accepts unordered source and target point-sets of variable size.
In a multimodal registration task using prostate MR and sparsely acquired ultrasound images, FPT yields comparable or improved results.
arXiv Detail & Related papers (2020-08-05T00:13:04Z)