Finding Clustering Algorithms in the Transformer Architecture
- URL: http://arxiv.org/abs/2506.19125v1
- Date: Mon, 23 Jun 2025 20:52:01 GMT
- Title: Finding Clustering Algorithms in the Transformer Architecture
- Authors: Kenneth L. Clarkson, Lior Horesh, Takuya Ito, Charlotte Park, Parikshit Ram
- Abstract summary: We show that transformers can implement a fundamental and widely used algorithm for $k$-means clustering: Lloyd's algorithm. We numerically implement this transformer and demonstrate in experiments the exact correspondence between our architecture and Lloyd's algorithm. Our findings offer a clear and interpretable perspective on implementing precise algorithms in transformers.
- Score: 16.336124248778496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The invention of the transformer architecture has revolutionized Artificial Intelligence (AI), yielding unprecedented success in areas such as natural language processing, computer vision, and multimodal reasoning. Despite these advances, it is unclear whether transformers are able to learn and implement precise algorithms. Here, we demonstrate that transformers can exactly implement a fundamental and widely used algorithm for $k$-means clustering: Lloyd's algorithm. First, we theoretically prove the existence of such a transformer architecture, which we term the $k$-means transformer, that exactly implements Lloyd's algorithm for $k$-means clustering using the standard ingredients of modern transformers: attention and residual connections. Next, we numerically implement this transformer and demonstrate in experiments the exact correspondence between our architecture and Lloyd's algorithm, providing a fully neural implementation of $k$-means clustering. Finally, we demonstrate that interpretable alterations (e.g., incorporating layer normalizations or multilayer perceptrons) to this architecture yield diverse and novel variants of clustering algorithms, such as soft $k$-means, spherical $k$-means, trimmed $k$-means, and more. Collectively, our findings demonstrate how transformer mechanisms can precisely map onto algorithmic procedures, offering a clear and interpretable perspective on implementing precise algorithms in transformers.
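As a concrete illustration of the correspondence the abstract describes, here is a minimal NumPy sketch, not the authors' exact construction: attention over negative squared distances between points and centroids plays the role of the Lloyd assignment step, and the attention-weighted mean of the points plays the role of the centroid-update step. The function name and the inverse-temperature parameter `beta` are our own; in the large-`beta` limit the assignments become hard and each "layer" performs one exact Lloyd iteration, while a finite `beta` gives the soft $k$-means variant mentioned above.

```python
import numpy as np

def kmeans_attention_layer(X, C, beta=1e6):
    """One hypothetical 'k-means transformer' layer: attention over negative
    squared distances implements the assignment step, and the attention-
    weighted mean of the points implements the centroid-update step.

    X    : (n, d) data points (the tokens)
    C    : (k, d) current centroids
    beta : inverse temperature; beta -> infinity recovers hard (Lloyd)
           assignments, while a finite beta gives soft k-means.
    """
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)    # (n, k) squared distances
    logits = -beta * (d2 - d2.min(axis=1, keepdims=True))  # stabilized attention logits
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)                      # (n, k) assignment weights
    # Update step: each new centroid is the attention-weighted mean of the points.
    C_new = (A.T @ X) / (A.sum(axis=0)[:, None] + 1e-12)   # eps guards empty clusters
    return C_new, A

# Stacking layers iterates Lloyd's algorithm.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 0.3, size=(50, 2)) for m in ([0, 0], [3, 3], [0, 3])])
C = X[rng.choice(len(X), size=3, replace=False)]           # random initial centroids
for _ in range(10):                                        # 10 layers = 10 Lloyd iterations
    C, A = kmeans_attention_layer(X, C)
labels = A.argmax(axis=1)                                  # final cluster assignments
```

The paper's architecture carries state across layers with residual connections; the sketch compresses that into an explicit loop purely for readability.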
Related papers
- Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer [65.38883376379812]
We propose the Discrete Transformer, an architecture engineered to bridge the gap between continuous representations and discrete symbolic logic. Empirically, the Discrete Transformer not only achieves performance comparable to RNN-based baselines but crucially extends interpretability to continuous variable domains.
arXiv Detail & Related papers (2026-01-09T12:49:41Z) - Clebsch-Gordan Transformer: Fast and Global Equivariant Attention [19.720202550140325]
We propose the Clebsch-Gordan Transformer, achieving efficient global attention via a novel Clebsch-Gordan Convolution on $SO(3)$ irreducible representations. Our method enables equivariant modeling of features at all orders while achieving $O(N \log N)$ input token complexity. We benchmark our method on a diverse set of benchmarks including $n$-body simulation, QM9, ModelNet point cloud classification, and a robotic grasping dataset.
arXiv Detail & Related papers (2025-09-28T22:09:36Z) - On the Existence of Universal Simulators of Attention [17.01811978811789]
We present constructions that identically replicate attention outputs and the underlying elementary matrix and activation operations via RASP. Our proofs show, for the first time, the existence of an algorithmically achievable, data-agnostic solution, previously known to be approximated only by learning.
arXiv Detail & Related papers (2025-06-23T15:15:25Z) - Can Transformers Learn $n$-gram Language Models? [77.35809823602307]
We study transformers' ability to learn random $n$-gram LMs of two kinds.
We find that classic estimation techniques for $n$-gram LMs such as add-$\lambda$ smoothing outperform transformers.
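For reference, add-$\lambda$ (Lidstone) smoothing is a standard textbook baseline, not specific to this paper: it estimates
$$P(w \mid h) = \frac{C(h\,w) + \lambda}{C(h) + \lambda\,|V|},$$
where $C(\cdot)$ counts occurrences in the training corpus, $h$ is the $(n-1)$-token history, and $|V|$ is the vocabulary size.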
arXiv Detail & Related papers (2024-10-03T21:21:02Z) - Transformers meet Neural Algorithmic Reasoners [16.5785372289558]
We propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs).
We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning.
arXiv Detail & Related papers (2024-06-13T16:42:06Z) - Mechanistic Interpretability of Binary and Ternary Transformers [1.3715396507106912]
We investigate whether binary and ternary transformer networks learn distinctly different or similar algorithms when compared to full-precision transformer networks.
This provides evidence against the possibility of using binary and ternary networks as a more interpretable alternative in the Large Language Models setting.
arXiv Detail & Related papers (2024-05-27T23:22:23Z) - AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures [80.28359222380733]
We design a novel transformer framework, dubbed AlgoFormer, to empower transformers with algorithmic capabilities. In particular, inspired by the structure of human-designed learning algorithms, our transformer framework consists of a pre-transformer that is responsible for task preprocessing. Some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning.
arXiv Detail & Related papers (2024-02-21T07:07:54Z) - Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform in-context learning (ICL).
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton.
We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$.
We further investigate the brittleness of these solutions and propose potential mitigations.
arXiv Detail & Related papers (2022-10-19T17:45:48Z) - A Logic for Expressing Log-Precision Transformers [35.25166532364007]
We show that any log-precision transformer can be equivalently expressed as a first-order logic sentence.
This is the tightest known upper bound and first logical characterization of log-precision transformers.
arXiv Detail & Related papers (2022-10-06T04:18:09Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer encoder in the form of a programming language, RASP.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
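To give a flavor of what such a program looks like, here is a hypothetical Python rendering of the histogram task, re-expressing RASP's `select` and `selector_width` primitives with NumPy rather than the paper's actual syntax:

```python
import numpy as np

def rasp_style_histogram(tokens):
    """Histogram task in the spirit of a RASP program: each position attends
    to every position holding the same token, and the number of positions
    the selector matches (its 'width') is that token's count."""
    tokens = np.array(tokens)
    # select(tokens, tokens, ==): boolean attention pattern over equal tokens
    select_eq = tokens[:, None] == tokens[None, :]
    # selector_width: how many positions each query position attends to
    return select_eq.sum(axis=1)

print(rasp_style_histogram(list("hello")))  # [1 1 2 2 1]
```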
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.