Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer
- URL: http://arxiv.org/abs/2601.05770v1
- Date: Fri, 09 Jan 2026 12:49:41 GMT
- Title: Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer
- Authors: Yifan Zhang, Wei Bi, Kechi Zhang, Dongming Jin, Jie Fu, Zhi Jin,
- Abstract summary: We propose the Discrete Transformer, an architecture engineered to bridge the gap between continuous representations and discrete symbolic logic. Empirically, the Discrete Transformer not only achieves performance comparable to RNN-based baselines but crucially extends interpretability to continuous variable domains.
- Score: 65.38883376379812
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithm extraction aims to synthesize executable programs directly from models trained on specific algorithmic tasks, enabling de novo algorithm discovery without relying on human-written code. However, extending this paradigm to Transformers is hindered by superposition, where entangled features encoded in overlapping directions obstruct the extraction of symbolic expressions. In this work, we propose the Discrete Transformer, an architecture explicitly engineered to bridge the gap between continuous representations and discrete symbolic logic. By enforcing a strict functional disentanglement, which constrains Numerical Attention to information routing and the Numerical MLP to element-wise arithmetic, and employing temperature-annealed sampling, our method effectively facilitates the extraction of human-readable programs. Empirically, the Discrete Transformer not only achieves performance comparable to RNN-based baselines but crucially extends interpretability to continuous variable domains. Moreover, our analysis of the annealing process shows that the efficient discrete search undergoes a clear phase transition from exploration to exploitation. We further demonstrate that our method enables fine-grained control over synthesized programs by imposing inductive biases. Collectively, these findings establish the Discrete Transformer as a robust framework for demonstration-free algorithm discovery, offering a rigorous pathway toward Transformer interpretability.
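To make the two constraints concrete, here is a minimal sketch, assuming a PyTorch-style implementation: an attention module restricted to routing values between positions (no value or output projection) and an MLP restricted to choosing an element-wise arithmetic primitive per channel through a temperature-annealed Gumbel-softmax relaxation. The module names, the primitive set, and the annealing schedule are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch (illustrative assumptions, not the paper's actual implementation):
# - RoutingAttention only moves existing values between positions.
# - ElementwiseArithmeticMLP only selects an element-wise arithmetic primitive,
#   with a temperature-annealed Gumbel-softmax relaxation that hardens into a
#   discrete, symbolically readable choice as the temperature drops.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RoutingAttention(nn.Module):
    """Attention constrained to information routing: no value/output projection."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); values are passed through unchanged.
        scores = self.q(x) @ self.k(x).transpose(-2, -1) / x.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ x


class ElementwiseArithmeticMLP(nn.Module):
    """MLP constrained to element-wise arithmetic: one primitive chosen per channel."""

    PRIMITIVES = (
        lambda a, b: a + b,
        lambda a, b: a - b,
        lambda a, b: a * b,
        lambda a, b: a,  # identity (copy the routed operand)
    )

    def __init__(self, d_model: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(d_model, len(self.PRIMITIVES)))

    def forward(self, a: torch.Tensor, b: torch.Tensor, tau: float) -> torch.Tensor:
        # High tau: soft mixture over primitives (exploration).
        # Low tau: near one-hot choice per channel (exploitation, extractable as a symbol).
        weights = F.gumbel_softmax(self.logits, tau=tau, hard=False)            # (d_model, n_ops)
        candidates = torch.stack([op(a, b) for op in self.PRIMITIVES], dim=-1)  # (..., d_model, n_ops)
        return (candidates * weights).sum(dim=-1)


# Illustrative annealing schedule from exploration to exploitation:
# for step in range(num_steps):
#     tau = max(0.1, 5.0 * (0.999 ** step))
#     y = mlp(attn(x), x, tau)
```

In a sketch like this, program extraction would amount to reading off, once the temperature is low, the near one-hot routing pattern and the argmax primitive per channel; the actual extraction procedure in the paper may differ.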
Related papers
- From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers [67.02076505996284]
We study how the choice of pretraining data distribution steers a shallow transformer toward one behavior or the other. Our results shed light on the algorithmic biases of pretrained transformers and offer conceptual guidelines for data-driven control of their learned behaviors.
arXiv Detail & Related papers (2025-12-21T08:10:26Z) - Unitary Synthesis with AlphaZero via Dynamic Circuits [0.0]
Unitary synthesis is the process of decomposing a target unitary transformation into a sequence of quantum gates. We propose an approach using an AlphaZero-inspired reinforcement-learning agent for the exact compilation of unitaries. The approach achieves low inference time and proves versatile across different gate sets and qubit connectivities.
arXiv Detail & Related papers (2025-08-28T21:15:28Z) - On the Existence of Universal Simulators of Attention [17.01811978811789]
We present solutions to identically replicate attention outputs and the underlying elementary matrix and activation operations via RASP. Our proofs, for the first time, show the existence of an algorithmically achievable data-agnostic solution, previously known to be approximated only by learning.
arXiv Detail & Related papers (2025-06-23T15:15:25Z) - A Smooth Transition Between Induction and Deduction: Fast Abductive Learning Based on Probabilistic Symbol Perception [81.30687085692576]
We introduce an optimization algorithm named Probabilistic Symbol Perception (PSP), which makes a smooth transition between induction and deduction. Experiments demonstrate promising results.
arXiv Detail & Related papers (2025-02-18T14:59:54Z) - Learning signals defined on graphs with optimal transport and Gaussian process regression [1.1062090350704616]
In computational physics, machine learning has emerged as a powerful complementary tool to efficiently explore candidate designs in engineering studies. We propose an innovative strategy for Gaussian process regression where inputs are large and sparse graphs with continuous node attributes and outputs are signals defined on the nodes of the associated inputs. In addition to enabling signal prediction, the main point of our proposal is to provide confidence intervals on node values, which is crucial for uncertainty quantification and active learning.
arXiv Detail & Related papers (2024-10-21T07:39:44Z) - Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models [9.56229382432426]
This research aims to reverse engineer transformer models into human-readable representations that implement algorithmic functions.
By applying circuit interpretability analysis, we identify a key sub-circuit in both GPT-2 Small and Llama-2-7B.
We show that this sub-circuit has effects on various math-related prompts, such as intervaled circuits, continuation of Spanish number words and months, and natural language word problems.
arXiv Detail & Related papers (2023-11-07T16:58:51Z) - Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z) - Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL [154.13105285663656]
A cooperative Multi-Agent Reinforcement Learning (MARL) framework with permutation-invariant agents has achieved tremendous empirical success in real-world applications.
Unfortunately, the theoretical understanding of this MARL problem is lacking, due to the curse of many agents and the limited exploration of relational reasoning in existing works.
We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents respectively, which mitigates the curse of many agents.
arXiv Detail & Related papers (2022-09-20T16:42:59Z) - Rethinking Reinforcement Learning based Logic Synthesis [13.18408482571087]
We develop a new RL-based method that can automatically recognize critical operators and generate common operator sequences generalizable to unseen circuits.
Our algorithm is verified on the EPFL benchmark, a private dataset, and an industrial-scale circuit.
arXiv Detail & Related papers (2022-05-16T12:15:32Z) - Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery [16.740247586153085]
We show how to leverage gradient-based learning to solve discrete optimization problems.
Our approach is formalized as GLODISMO (Gradient-based Learning of DIscrete Structured Measurement Operators).
We empirically demonstrate the performance and flexibility of GLODISMO in several signal recovery applications.
arXiv Detail & Related papers (2022-02-07T18:27:08Z) - Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
arXiv Detail & Related papers (2020-03-19T15:36:31Z)