Explaining the Explainer: Understanding the Inner Workings of Transformer-based Symbolic Regression Models
- URL: http://arxiv.org/abs/2602.03506v1
- Date: Tue, 03 Feb 2026 13:27:10 GMT
- Title: Explaining the Explainer: Understanding the Inner Workings of Transformer-based Symbolic Regression Models
- Authors: Arco van Breda, Erman Acar
- Abstract summary: We introduce PATCHES, an evolutionary circuit discovery algorithm that identifies compact and correct circuits for symbolic regression. Using PATCHES, we isolate 28 circuits, providing the first circuit-level characterisation of an SR transformer.
- Score: 3.7957452405531265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Following their success across many domains, transformers have also proven effective for symbolic regression (SR); however, the internal mechanisms underlying their generation of mathematical operators remain largely unexplored. Although mechanistic interpretability has successfully identified circuits in language and vision models, it has not yet been applied to SR. In this article, we introduce PATCHES, an evolutionary circuit discovery algorithm that identifies compact and correct circuits for SR. Using PATCHES, we isolate 28 circuits, providing the first circuit-level characterisation of an SR transformer. We validate these findings through a robust causal evaluation framework based on key notions such as faithfulness, completeness, and minimality. Our analysis shows that mean patching with performance-based evaluation most reliably isolates functionally correct circuits. In contrast, we demonstrate that direct logit attribution and probing classifiers primarily capture correlational features rather than causal ones, limiting their utility for circuit discovery. Overall, these results establish SR as a high-potential application domain for mechanistic interpretability and propose a principled methodology for circuit discovery.
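The abstract's key intervention, mean (activation) patching, can be sketched in a few lines: activations of components outside the candidate circuit are replaced with their dataset-mean values, and the model's performance on the patched forward pass is compared to the original. The component/mask representation below is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def mean_patch_components(activations, circuit_mask, mean_activations):
    """Keep the activations of components inside the circuit; replace
    every other component's activation with its dataset mean
    (mean ablation). Shapes: (n_components, d) for activations and
    means, (n_components,) boolean for the mask."""
    return np.where(circuit_mask[:, None], activations, mean_activations)

# Toy setup: 4 components with activation dimension 3.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 3))                   # activations on one input
means = np.zeros((4, 3))                         # dataset means (zeros here for simplicity)
circuit = np.array([True, False, True, False])   # hypothesised circuit

patched = mean_patch_components(acts, circuit, means)
```

An evolutionary search such as PATCHES would then score many candidate `circuit` masks by running the patched model and keeping masks that are small yet preserve task performance.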
Related papers
- Certified Circuits: Stability Guarantees for Mechanistic Circuits [80.30622018787835]
Certified Circuits provides provable stability guarantees for circuit discovery. On ImageNet and OOD datasets, certified circuits achieve up to 91% higher accuracy.
arXiv Detail & Related papers (2026-02-26T13:07:31Z)
- Measuring Uncertainty in Transformer Circuits with Effective Information Consistency [0.0]
We develop a sheaf/cohomology and causal emergence perspective on Transformer circuits. EICS combines (i) a normalized sheaf inconsistency computed from local Jacobians and activations with (ii) a Gaussian EI proxy for circuit-level causal emergence. We provide practical guidance on score interpretation, computational overhead (with fast and exact modes), and a toy sanity-check analysis.
arXiv Detail & Related papers (2025-09-08T18:54:56Z)
- Adaptive Root Cause Localization for Microservice Systems with Multi-Agent Recursion-of-Thought [11.307072056343662]
We introduce RCLAgent, an adaptive root cause localization method for microservice systems. We show that RCLAgent achieves superior performance, localizing the root cause using only a single request and outperforming state-of-the-art methods.
arXiv Detail & Related papers (2025-08-28T02:34:19Z)
- Provable In-Context Learning of Nonlinear Regression with Transformers [66.99048542127768]
In-context learning (ICL) is the ability to perform unseen tasks using task-specific prompts without updating parameters. Recent research has actively explored the training dynamics behind ICL, with much of the focus on relatively simple tasks. This paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities.
arXiv Detail & Related papers (2025-07-28T00:09:28Z)
- Transformers Don't In-Context Learn Least Squares Regression [5.648229654902264]
In-context learning (ICL) has emerged as a powerful capability of large pretrained transformers. We study how transformers implement learning at inference time. We highlight the role of the pretraining corpus in shaping ICL behaviour.
arXiv Detail & Related papers (2025-07-13T01:09:26Z)
- Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability [53.21677928601684]
Layer-wise relevance propagation is one of the most promising approaches to explainability in deep learning. We propose specialized theoretically-grounded LRP rules designed to propagate attributions across various positional encoding methods. Our method significantly outperforms the state-of-the-art in both vision and NLP explainability tasks.
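For background, the basic LRP-ε rule that such positional rules extend can be sketched for a single linear layer: output relevance is redistributed to the inputs in proportion to each input's contribution to the pre-activation. This is a generic background sketch, not the paper's positional propagation rules.

```python
import numpy as np

def lrp_epsilon_linear(a, W, R_out, eps=1e-6):
    """LRP-epsilon rule for a linear layer z = a @ W: redistribute
    output relevance R_out to the inputs in proportion to each
    contribution z_jk = a_j * W_jk, with an eps stabiliser."""
    z = a @ W                               # pre-activations, shape (K,)
    s = R_out / (z + eps * np.sign(z))      # stabilised relevance ratio
    return a * (W @ s)                      # input relevance, shape (J,)

a = np.array([1.0, 2.0])
W = np.array([[0.5, -1.0], [1.0, 1.0]])
R_out = a @ W                               # initialise relevance at the output
R_in = lrp_epsilon_linear(a, W, R_out)
```

Up to the ε stabiliser, the rule conserves relevance: `R_in.sum()` matches `R_out.sum()`, which is the property LRP variants are designed to preserve layer by layer.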
arXiv Detail & Related papers (2025-06-02T18:07:55Z)
- Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study [52.91899050612153]
We study transformers within pre-trained language models (PLMs) when repurposed as encoders for Automatic Speech Recognition (ASR).
Our findings reveal a notable improvement in Character Error Rate (CER) and Word Error Rate (WER) across diverse ASR tasks when transformers from pre-trained LMs are incorporated.
This underscores the potential of leveraging the semantic prowess embedded within pre-trained transformers to advance ASR systems' capabilities.
arXiv Detail & Related papers (2024-09-26T11:31:18Z)
- Transformer Circuit Faithfulness Metrics are not Robust [0.04260910081285213]
We measure circuit 'faithfulness' by ablating portions of the model's computation.
We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers as well as the actual components of the circuit.
The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits.
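One common way to turn such ablation results into a single score is to normalise the circuit's performance between the full model and the fully ablated model. The formula below is an illustrative convention, not this paper's specific metric; indeed, the paper's point is that scores like this are sensitive to exactly such methodological choices.

```python
def faithfulness(full_score, circuit_score, ablated_score):
    """Normalised faithfulness: 1.0 means the circuit alone recovers
    the full model's performance; 0.0 means it does no better than
    ablating everything. One common convention, not the only one."""
    denom = full_score - ablated_score
    if denom == 0:
        raise ValueError("full and fully-ablated scores coincide")
    return (circuit_score - ablated_score) / denom

# e.g. full model scores 0.9, circuit-only 0.85, everything ablated 0.1
score = faithfulness(0.9, 0.85, 0.1)
```

Note that the score depends on the ablation method (zero, mean, or resample patching) and on the chosen performance metric, which is precisely the non-robustness the paper documents.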
arXiv Detail & Related papers (2024-07-11T17:59:00Z)
- Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity [18.71252449465396]
We introduce DiscoGP, a framework for extracting self-contained modular units from neural language models (LMs). Our framework identifies sheaves through a gradient-based pruning algorithm that reduces the original LM to a sparse skeleton preserving certain core capabilities.
arXiv Detail & Related papers (2024-07-04T09:42:25Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism.
In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z)
- Transformer-based Planning for Symbolic Regression [18.90700817248397]
We propose TPSR, a Transformer-based Planning strategy for Symbolic Regression.
Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable feedback, such as fitting accuracy and complexity.
Our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, symbolic abilities, and robustness to noise.
arXiv Detail & Related papers (2023-03-13T03:29:58Z)
- XAI for Transformers: Better Explanations through Conservative Propagation [60.67748036747221]
We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction.
Our proposal can be seen as a proper extension of the well-established LRP method to Transformers.
arXiv Detail & Related papers (2022-02-15T10:47:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.