Names Don't Matter: Symbol-Invariant Transformer for Open-Vocabulary Learning
- URL: http://arxiv.org/abs/2601.23169v1
- Date: Fri, 30 Jan 2026 16:53:01 GMT
- Title: Names Don't Matter: Symbol-Invariant Transformer for Open-Vocabulary Learning
- Authors: İlker Işık, Wenchao Li
- Abstract summary: Current neural architectures lack a principled way to handle interchangeable tokens. We propose a novel Transformer-based mechanism that is provably invariant to the renaming of interchangeable tokens.
- Score: 4.288959596387606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current neural architectures lack a principled way to handle interchangeable tokens, i.e., symbols that are semantically equivalent yet distinguishable, such as bound variables. As a result, models trained on fixed vocabularies often struggle to generalize to unseen symbols, even when the underlying semantics remain unchanged. We propose a novel Transformer-based mechanism that is provably invariant to the renaming of interchangeable tokens. Our approach employs parallel embedding streams to isolate the contribution of each interchangeable token in the input, combined with an aggregated attention mechanism that enables structured information sharing across streams. Experimental results confirm the theoretical guarantees of our method and demonstrate substantial performance gains on open-vocabulary tasks that require generalization to novel symbols.
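The abstract describes the mechanism only at a high level. As a minimal sketch of the invariance property alone (not the paper's architecture: the proposed streams stay distinguishable through aggregated attention, whereas this toy collapses them with a symmetric sum), one binary occurrence stream per interchangeable symbol, embedded with shared weights and pooled symmetrically, is already invariant to renaming; all names and shapes below are illustrative assumptions:

```python
import torch

torch.manual_seed(0)

def symbol_streams(tokens, symbols):
    """One binary occurrence mask ('stream') per interchangeable symbol.
    A stream records *where* a symbol occurs, never *which name* it has."""
    return torch.stack([
        torch.tensor([1.0 if t == s else 0.0 for t in tokens])
        for s in symbols
    ])                                              # (num_symbols, seq_len)

def invariant_encoding(tokens, symbols, proj):
    streams = symbol_streams(tokens, symbols)       # (S, T)
    per_stream = streams @ proj                     # shared weights for every stream
    return per_stream.sum(dim=0)                    # symmetric pooling across streams

seq_len, d = 6, 8
proj = torch.randn(seq_len, d)
expr_a = ["x", "+", "y", "*", "x", "EOS"]
expr_b = ["y", "+", "x", "*", "y", "EOS"]           # same expression, x and y renamed
enc_a = invariant_encoding(expr_a, ["x", "y"], proj)
enc_b = invariant_encoding(expr_b, ["x", "y"], proj)
print(torch.allclose(enc_a, enc_b))                 # True: renaming leaves encoding fixed
```

Renaming permutes the streams, and the sum is permutation-invariant, which yields the invariance; the paper's aggregated attention presumably shares information across streams without collapsing them this way.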
Related papers
- $\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning [48.264642951728085]
Retrieval systems rely on representations learned by increasingly powerful models. Due to the high training cost and inconsistencies in learned representations, there is significant interest in facilitating communication between representations.
arXiv Detail & Related papers (2025-09-20T12:35:07Z) - Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence [6.991281327290525]
Language models lack the notion of interchangeable tokens. We formalize this machine learning problem and introduce alpha-covariance. Our findings establish a foundation for designing language models that can learn interchangeable token representations.
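A minimal sketch of the alpha-equivalence property these embeddings target, using first-occurrence (de Bruijn-style) renaming; the `is_var` predicate and `?` variable marker are illustrative assumptions, not the paper's API:

```python
def canonicalize(tokens, is_var):
    """Rename variables by order of first occurrence (de Bruijn-style)."""
    names, out = {}, []
    for t in tokens:
        if is_var(t):
            names.setdefault(t, f"v{len(names)}")
            out.append(names[t])
        else:
            out.append(t)
    return out

def alpha_equivalent(a, b, is_var):
    return canonicalize(a, is_var) == canonicalize(b, is_var)

is_var = lambda t: t.startswith("?")                # illustrative variable marker
print(alpha_equivalent(["?x", "&", "?y"], ["?y", "&", "?x"], is_var))  # True
print(alpha_equivalent(["?x", "&", "?x"], ["?x", "&", "?y"], is_var))  # False
```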
arXiv Detail & Related papers (2024-10-22T16:34:36Z) - Latent Space Translation via Semantic Alignment [29.2401314068038]
We show how representations learned from different neural modules can be translated between different pre-trained networks.
Our method directly estimates a transformation between two given latent spaces, thereby enabling effective stitching of encoders and decoders without additional training.
Notably, we show how it is possible to zero-shot stitch text encoders and vision decoders, or vice-versa, yielding surprisingly good classification performance in this multimodal setting.
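Orthogonal Procrustes is one standard estimator for such a transformation between paired latent spaces; whether it matches the paper's exact estimator is not guaranteed here, and the synthetic data below is purely illustrative:

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal R minimizing ||X @ R - Y||_F over paired anchors."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 32))                  # shared "semantic" coordinates
Q = np.linalg.qr(rng.normal(size=(32, 32)))[0]  # hidden rotation between spaces
X, Y = Z, Z @ Q                                 # two encoders' latent spaces
R = procrustes_align(X[:50], Y[:50])            # estimate from 50 anchor pairs
print(np.allclose(X[50:] @ R, Y[50:]))          # True: held-out points translate
```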
arXiv Detail & Related papers (2023-11-01T17:12:00Z) - Self-Supervised Learning for Group Equivariant Neural Networks [75.62232699377877]
Group equivariant neural networks are models whose structure is constrained to commute with transformations of the input.
We propose two concepts for self-supervised tasks: equivariant pretext labels and invariant contrastive loss.
Experiments on standard image recognition benchmarks demonstrate that the equivariant neural networks exploit the proposed self-supervised tasks.
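A minimal check of the defining equivariance property f(g·x) = g·f(x) for the 90° rotation group; this illustrates equivariance itself, not the paper's pretext labels or contrastive loss, and the isotropic kernel is an illustrative choice:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)
k = torch.ones(1, 1, 3, 3) / 9.0               # kernel invariant under 90° rotation
f = lambda t: F.conv2d(t, k, padding=1)        # a layer that commutes with rot90

g = lambda t: torch.rot90(t, 1, dims=(2, 3))   # the group action on inputs
print(torch.allclose(f(g(x)), g(f(x)), atol=1e-6))  # True: f(g.x) == g.f(x)
```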
arXiv Detail & Related papers (2023-03-08T08:11:26Z) - Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined heuristics.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
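A toy example of why canonicalization yields invariance: any function of the canonical form is invariant to the group action by construction. The lexicographic rule below is a *predefined* canonicalizer of the kind the paper replaces with a small learned network:

```python
def canonicalize_cyclic(seq):
    """Predefined canonicalizer: the lexicographically smallest rotation."""
    return min(tuple(seq[i:] + seq[:i]) for i in range(len(seq)))

# Any downstream function becomes invariant to cyclic shifts.
f = lambda s: sum(v * i for i, v in enumerate(canonicalize_cyclic(s)))
print(f([3, 1, 2]) == f([2, 3, 1]))            # True: shift-invariant by construction
```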
arXiv Detail & Related papers (2022-11-11T21:58:15Z) - Token-Label Alignment for Vision Transformers [93.58540411138164]
Data mixing strategies (e.g., CutMix) have shown the ability to greatly improve the performance of convolutional neural networks (CNNs).
We identify a token fluctuation phenomenon that has suppressed the potential of data mixing strategies.
We propose a token-label alignment (TL-Align) method to trace the correspondence between transformed tokens and the original tokens to maintain a label for each token.
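The core idea, sketched under the assumption of a single attention layer (the actual method traces labels through the full stack of heads and residuals): reuse the attention weights that mix the tokens to mix the per-token labels, keeping labels aligned with the transformed tokens:

```python
import torch

T, C = 5, 3
attn = torch.softmax(torch.randn(T, T), dim=-1)          # one layer's attention map
labels = torch.eye(C)[torch.tensor([0, 0, 1, 1, 2])]     # (T, C) per-token labels
aligned = attn @ labels                                  # labels mixed like the tokens
print(aligned.sum(dim=-1))                               # rows still sum to 1
```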
arXiv Detail & Related papers (2022-10-12T17:54:32Z) - Do Transformers use variable binding? [14.222494511474103]
Increasing the explainability of deep neural networks (DNNs) requires evaluating whether they implement symbolic computation.
One central symbolic capacity is variable binding: linking an input value to an abstract variable held in system-internal memory.
We provide the first systematic evaluation of the variable binding capacities of the state-of-the-art Transformer networks BERT and RoBERTa.
arXiv Detail & Related papers (2022-02-19T09:56:38Z) - AAformer: Auto-Aligned Transformer for Person Re-Identification [82.45385078624301]
We introduce, for the first time, an alignment scheme into the transformer architecture.
We propose the auto-aligned transformer (AAformer) to automatically locate both human parts and non-human ones at the patch level.
AAformer integrates part alignment into the self-attention, and the output [PART] tokens can be directly used as part features for retrieval.
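A sketch of pooling patches into learnable part queries with generic cross-attention; AAformer's actual auto-alignment assigns patches to parts with an online clustering step that this sketch omits, and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

d, n_patches, n_parts = 32, 49, 4
patches = torch.randn(1, n_patches, d)                    # patch embeddings
part_tokens = nn.Parameter(torch.randn(1, n_parts, d))    # learnable [PART] queries
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
parts, weights = attn(part_tokens, patches, patches)      # pool patches per part
print(parts.shape)                                        # torch.Size([1, 4, 32])
```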
arXiv Detail & Related papers (2021-04-02T08:00:25Z) - Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z) - Meta-Learning Symmetries by Reparameterization [63.85144439337671]
We present a method for learning and encoding equivariances into networks by learning corresponding parameter sharing patterns from data.
Our experiments suggest that it can automatically learn to encode equivariances to common transformations used in image processing tasks.
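A sketch of the reparameterization W = Uv: a matrix U, learned across tasks, encodes which weights are tied (the sharing pattern that induces an equivariance), while v holds the few free parameters. Dimensions are illustrative:

```python
import torch

out_dim, in_dim, n_free = 4, 4, 3
U = torch.randn(out_dim * in_dim, n_free, requires_grad=True)  # sharing pattern (meta-learned)
v = torch.randn(n_free, requires_grad=True)                    # task-specific free parameters
W = (U @ v).reshape(out_dim, in_dim)                           # effective layer weights
x = torch.randn(in_dim)
print((W @ x).shape)                                           # torch.Size([4])
```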
arXiv Detail & Related papers (2020-07-06T17:59:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.