Partial Answer of How Transformers Learn Automata
- URL: http://arxiv.org/abs/2504.20395v1
- Date: Tue, 29 Apr 2025 03:35:40 GMT
- Title: Partial Answer of How Transformers Learn Automata
- Authors: Tiantian Zhang
- Abstract summary: We introduce a novel framework for simulating finite automata using representation-theoretic semidirect products and Fourier modules, achieving more efficient Transformer-based implementations.
- Score: 58.02360042538258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel framework for simulating finite automata using representation-theoretic semidirect products and Fourier modules, achieving more efficient Transformer-based implementations.
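To make the Fourier-module idea concrete, here is a minimal sketch for the simplest case: an automaton whose transition structure is the cyclic group Z_n (a mod-n counter). The example, the restriction to an abelian group, and the function names are illustrative assumptions rather than the paper's construction. The reached state is the sum of the inputs modulo n, so its character values exp(2*pi*i*k*state/n) can be produced from one parallel reduction and decoded with a DFT.

```python
# Minimal sketch (illustrative assumption, not the paper's construction):
# simulating a mod-n counter automaton with Fourier features over the cyclic
# group Z_n. The reached state is sum(inputs) mod n, so the whole run reduces
# to one parallel sum followed by a Fourier readout.
import numpy as np

def simulate_mod_counter(inputs, n):
    """Sequential reference: step the automaton one symbol at a time."""
    state = 0
    for x in inputs:
        state = (state + x) % n
    return state

def fourier_simulate_mod_counter(inputs, n):
    """Parallel 'Fourier module' version: form the character values
    chi_k(s) = exp(2*pi*i*k*s/n) of the reached group element s from a
    single sum, then recover s with a DFT over Z_n."""
    k = np.arange(n)
    total = int(np.sum(inputs))                  # one parallel reduction
    phases = np.exp(2j * np.pi * k * total / n)  # chi_k(total), periodic in n
    scores = np.real(np.fft.fft(phases)) / n     # ~1 at the reached state, ~0 elsewhere
    return int(np.argmax(scores))

if __name__ == "__main__":
    w = np.random.randint(0, 7, size=32)
    assert simulate_mod_counter(w, 7) == fourier_simulate_mod_counter(w, 7)
```

A non-abelian transition structure would need higher-dimensional representations in place of the one-dimensional characters used here, which is where semidirect-product decompositions become relevant.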
Related papers
- Quantization-Free Autoregressive Action Transformer [18.499864366974613]
Current transformer-based imitation learning approaches introduce discrete action representations and train an autoregressive transformer decoder on the resulting latent code. We propose a quantization-free method that leverages Generative Infinite-Vocabulary Transformers (GIVT) as a direct, continuous policy parametrization for autoregressive transformers.
arXiv Detail & Related papers (2025-03-18T13:50:35Z) - Converting Transformers into DGNNs Form [3.7468283401703797]
We introduce a synthetic unitary digraph convolution based on the digraph Fourier transform. The resulting model, which we term Converter, effectively converts a Transformer into a Directed Graph Neural Network form. We have tested Converter on the Long-Range Arena benchmark, long document classification, and DNA sequence-based taxonomy classification.
arXiv Detail & Related papers (2025-02-01T22:44:46Z) - MoEUT: Mixture-of-Experts Universal Transformers [75.96744719516813]
Universal Transformers (UTs) have advantages over standard Transformers in learning compositional generalizations.
Layer-sharing drastically reduces the parameter count compared to the non-shared model with the same dimensionality.
No previous work has succeeded in proposing a shared-layer Transformer design that is competitive in parameter-count-dominated tasks such as language modeling.
arXiv Detail & Related papers (2024-05-25T03:24:32Z) - Simulating Weighted Automata over Sequences and Trees with Transformers [5.078561931628571]
We show that transformers can simulate weighted finite automata (WFAs), a class of models which subsumes DFAs, as well as weighted tree automata (WTAs).
We prove these claims formally and provide upper bounds on the sizes of the transformer models needed as a function of the number of states of the target automata (a minimal sketch of the WFA computation follows this list).
arXiv Detail & Related papers (2024-03-12T21:54:34Z) - Investigating Recurrent Transformers with Dynamic Halt [64.862738244735]
We study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism. We propose and investigate novel ways to extend and combine the methods.
arXiv Detail & Related papers (2024-02-01T19:47:31Z) - Beyond Self Attention: A Subquadratic Fourier Wavelet Transformer with Multi Modal Fusion [0.0]
We revisit the use of spectral techniques to replace the attention mechanism in Transformers. We present a comprehensive and novel reformulation of this technique in next-generation Transformer models (an FNet-style Fourier-mixing sketch follows this list).
arXiv Detail & Related papers (2021-11-25T18:03:41Z) - FNetAR: Mixing Tokens with Autoregressive Fourier Transforms [0.0]
We show that FNetAR retains state-of-the-art performance (25.8 ppl) on the task of causal language modeling.
The autoregressive Fourier transform could likely be used for parameter reduction in most Transformer-based time-series prediction models.
arXiv Detail & Related papers (2021-07-22T21:24:02Z) - Scalable Transformers for Neural Machine Translation [86.4530299266897]
Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation.
We propose novel scalable Transformers, which naturally contain sub-Transformers of different scales and have shared parameters.
A three-stage training scheme is proposed to tackle the difficulty of training the scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z) - Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z) - Addressing Some Limitations of Transformers with Feedback Memory [51.94640029417114]
Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks.
We propose the Feedback Transformer architecture that exposes all previous representations to all future representations.
We demonstrate on a variety of benchmarks in language modeling, machine translation, and reinforcement learning that the increased representation capacity can create small, shallow models with much stronger performance than comparable Transformers.
arXiv Detail & Related papers (2020-02-21T16:37:57Z)
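Relating to the "Simulating Weighted Automata over Sequences and Trees with Transformers" entry above, the following is a minimal sketch of the object being simulated: a WFA scores a string by sandwiching a product of per-symbol transition matrices between an initial and a final vector, and a DFA is the special case with 0/1 matrices and vectors. The 2-state automaton (counting occurrences of 'a') and the helper name wfa_score are illustrative assumptions, not taken from that paper.

```python
# Minimal WFA sketch (illustrative example, not from the cited paper):
# f(w) = alpha^T A_{w_1} A_{w_2} ... A_{w_T} beta
import numpy as np

def wfa_score(word, alpha, transitions, beta):
    """Score a string with a weighted finite automaton."""
    state = alpha
    for symbol in word:
        state = state @ transitions[symbol]   # left-to-right matrix product
    return float(state @ beta)

# A 2-state WFA that computes the number of 'a's in the input string.
alpha = np.array([1.0, 0.0])
beta = np.array([0.0, 1.0])
transitions = {
    "a": np.array([[1.0, 1.0],
                   [0.0, 1.0]]),   # increments the count
    "b": np.eye(2),                # leaves the count unchanged
}

assert wfa_score("abab", alpha, transitions, beta) == 2.0
```

The cited upper bounds are stated as a function of the number of states, i.e., the dimension of these transition matrices.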
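For the "Beyond Self Attention" entry, here is a minimal FNet-style sketch of what replacing attention with spectral mixing can look like: a parameter-free Fourier transform over the token and feature dimensions, keeping only the real part, followed by a position-wise feed-forward layer. The shapes, the feed-forward block, and the omission of layer norm are simplifying assumptions rather than that paper's exact subquadratic Fourier-wavelet architecture.

```python
# Minimal FNet-style spectral mixing sketch (simplified illustration, not the
# cited paper's exact architecture): token mixing via a 2D DFT in place of
# self-attention, followed by a position-wise feed-forward layer.
import numpy as np

def fourier_mixing(x):
    """x: (seq_len, d_model). Mix tokens with a 2D DFT and keep the real part."""
    return np.real(np.fft.fft2(x))

def spectral_block(x, w1, b1, w2, b2):
    """One encoder block: Fourier mixing + residual, then an MLP + residual
    (layer norm omitted for brevity)."""
    h = x + fourier_mixing(x)
    ff = np.maximum(h @ w1 + b1, 0.0) @ w2 + b2   # ReLU feed-forward
    return h + ff

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 16, 8, 32
x = rng.normal(size=(seq_len, d_model))
w1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
y = spectral_block(x, w1, b1, w2, b2)   # token mixing costs O(L*d*log(L*d))
```

Because the DFT carries no learned parameters, all trainable capacity sits in the feed-forward layers, and the token mixing stays subquadratic in sequence length.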