Related papers: RewriteNets: End-to-End Trainable String-Rewriting for Generative Sequence Modeling

RewriteNets: End-to-End Trainable String-Rewriting for Generative Sequence Modeling

URL: http://arxiv.org/abs/2601.07868v1
Date: Sat, 10 Jan 2026 19:59:37 GMT
Title: RewriteNets: End-to-End Trainable String-Rewriting for Generative Sequence Modeling
Authors: Harshil Vejendla,
Abstract summary: We propose RewriteNets, a novel neural architecture built on explicit, parallel string rewriting.<n>We evaluate RewriteNets on algorithmic, compositional, and string manipulation tasks, comparing them against strong LSTM and Transformer baselines.<n>Results show that RewriteNets excel at tasks requiring systematic generalization and are more efficient than Transformers.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dominant sequence models like the Transformer represent structure implicitly through dense attention weights, incurring quadratic complexity. We propose RewriteNets, a novel neural architecture built on an alternative paradigm: explicit, parallel string rewriting. Each layer in a RewriteNet contains a set of learnable rules. For each position in an input sequence, the layer performs four operations: (1) fuzzy matching of rule patterns, (2) conflict resolution via a differentiable assignment operator to select non-overlapping rewrites, (3) application of the chosen rules to replace input segments with output segments of potentially different lengths, and (4) propagation of untouched tokens. While the discrete assignment of rules is non-differentiable, we employ a straight-through Gumbel-Sinkhorn estimator, enabling stable end-to-end training. We evaluate RewriteNets on algorithmic, compositional, and string manipulation tasks, comparing them against strong LSTM and Transformer baselines. Results show that RewriteNets excel at tasks requiring systematic generalization (achieving 98.7% accuracy on the SCAN benchmark's length split) and are computationally more efficient than Transformers. We also provide an analysis of learned rules and an extensive ablation study, demonstrating that this architecture presents a promising direction for sequence modeling with explicit structural inductive biases.

Related papers

GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models. We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network. We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
Do Transformers use variable binding? [14.222494511474103]
Increasing the explainability of deep neural networks (DNNs) requires evaluating whether they implement symbolic computation. One central symbolic capacity is variable binding: linking an input value to an abstract variable held in system-internal memory. We provide the first systematic evaluation of the variable binding capacities of the state-of-the-art Transformer networks BERT and RoBERTa.
arXiv Detail & Related papers (2022-02-19T09:56:38Z)
Discovering Non-monotonic Autoregressive Orderings with Variational Inference [67.27561153666211]
We develop an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data. We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass. Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
arXiv Detail & Related papers (2021-10-27T16:08:09Z)
Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions. Existing neural models have been shown to lack this basic ability in learning symbolic structures. We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules [9.1570563482476]
Two programs are equivalent if there exists a sequence of application of rewrite rules that leads to rewriting one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs.
arXiv Detail & Related papers (2021-09-22T01:37:08Z)
Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation [109.46348908829697]
We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence. We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
arXiv Detail & Related papers (2021-06-29T03:59:21Z)
Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations. The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
Logic Constrained Pointer Networks for Interpretable Textual Similarity [11.142649867439406]
We introduce a novel pointer network based model with a sentinel gating function to align constituent chunks. We improve this base model with a loss function to equally penalize misalignments in both sentences, ensuring the alignments are bidirectional. The model achieves an F1 score of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task.
arXiv Detail & Related papers (2020-07-15T13:01:44Z)
Multi-level Head-wise Match and Aggregation in Transformer for Textual Sequence Matching [87.97265483696613]
We propose a new approach to sequence pair matching with Transformer, by learning head-wise matching representations on multiple levels. Experiments show that our proposed approach can achieve new state-of-the-art performance on multiple tasks.
arXiv Detail & Related papers (2020-01-20T20:02:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.