Interpretable Next-token Prediction via the Generalized Induction Head
- URL: http://arxiv.org/abs/2411.00066v2
- Date: Fri, 24 Oct 2025 05:50:14 GMT
- Title: Interpretable Next-token Prediction via the Generalized Induction Head
- Authors: Eunji Kim, Sriya Mantena, Weiwei Yang, Chandan Singh, Sungroh Yoon, Jianfeng Gao
- Abstract summary: The Generalized Induction-Head Model (GIM) is an interpretable model for next-token prediction. In language modeling, GIM improves next-token prediction by up to 25 percentage points over interpretable baselines. In an fMRI setting, GIM improves neural response prediction by 20%.
- Score: 59.500195503897764
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While large transformer models excel in predictive performance, their lack of interpretability restricts their usefulness in high-stakes domains. To remedy this, we propose the Generalized Induction-Head Model (GIM), an interpretable model for next-token prediction inspired by the observation of "induction heads" in LLMs. GIM is a retrieval-based module that identifies similar sequences in the input context by combining exact n-gram matching and fuzzy matching based on a neural similarity metric. We evaluate GIM in two settings: language modeling and fMRI response prediction. In language modeling, GIM improves next-token prediction by up to 25 percentage points over interpretable baselines, significantly narrowing the gap with black-box LLMs. In an fMRI setting, GIM improves neural response prediction by 20% and offers insights into the language selectivity of the brain. GIM represents a significant step toward uniting interpretability and performance across domains. The code is available at https://github.com/ejkim47/generalized-induction-head.
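To make the mechanism concrete, here is a minimal Python sketch of the retrieval idea described in the abstract: each earlier context position votes for the token that followed it, weighted by exact n-gram matching plus embedding-based fuzzy matching. The function name, the mixing weight `alpha`, and the cosine similarity are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gim_next_token_scores(tokens, embeddings, n=2, alpha=0.5):
    """Toy generalized-induction scoring: each earlier position votes for
    the token that followed it, weighted by (a) exact n-gram match with
    the current context and (b) fuzzy cosine similarity of embeddings."""
    T = len(tokens)
    scores = {tok: 0.0 for tok in set(tokens)}
    query_ngram = tuple(tokens[T - n:])            # n-gram ending at the last token
    q = embeddings[T - 1]
    for i in range(n - 1, T - 1):                  # candidate match positions
        exact = 1.0 if tuple(tokens[i - n + 1:i + 1]) == query_ngram else 0.0
        k = embeddings[i]
        fuzzy = float(k @ q / (np.linalg.norm(k) * np.linalg.norm(q) + 1e-8))
        scores[tokens[i + 1]] += alpha * exact + (1 - alpha) * fuzzy
    return scores

tokens = "the cat sat on the mat the cat".split()
emb = np.random.default_rng(0).normal(size=(len(tokens), 16))
print(max(gim_next_token_scores(tokens, emb).items(), key=lambda kv: kv[1]))
```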
Related papers
- Context-level Language Modeling by Learning Predictive Context Embeddings [79.00607069677393]
We introduce ContextLM, a framework that augments standard pretraining with an inherent next-context prediction objective. This mechanism trains the model to learn predictive representations of multi-token contexts, leveraging error signals derived from future token chunks. Experiments on the GPT2 and Pythia model families, scaled up to 1.5B parameters, show that ContextLM delivers consistent improvements in both perplexity and downstream task performance.
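As a hedged illustration of what a next-context auxiliary objective could look like (the chunk size, mean-pooling, and MSE loss here are assumptions, not ContextLM's actual design):

```python
import torch
import torch.nn.functional as F

def next_context_loss(hidden, token_emb, chunk=4):
    """Auxiliary objective (sketch): from the hidden state at position t,
    predict a pooled embedding of the next `chunk` tokens.
    hidden, token_emb: (T, d) tensors from the language model."""
    T = hidden.size(0)
    assert T > chunk, "sequence must be longer than the chunk size"
    losses = []
    for t in range(T - chunk):
        target = token_emb[t + 1 : t + 1 + chunk].mean(dim=0)  # pooled future chunk
        losses.append(F.mse_loss(hidden[t], target))
    return torch.stack(losses).mean()  # added to the usual next-token loss
```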
arXiv Detail & Related papers (2025-10-23T07:09:45Z) - How Do LLMs Use Their Depth? [17.148445769990907]
Evidence suggests that large language models do not use their depth uniformly, yet we still lack a fine-grained understanding of their layer-wise prediction dynamics. We propose a "Guess-then-Refine" framework that explains how LLMs internally structure their computations to make predictions.
arXiv Detail & Related papers (2025-10-21T17:59:05Z) - Interpretable Hybrid Machine Learning Models Using FOLD-R++ and Answer Set Programming [5.911540700785975]
Symbolic methods like Answer Set Programming (ASP) offer the possibility of interpretable logical rules. This paper proposes a hybrid approach that integrates ASP-derived rules from the FOLD-R++ algorithm with black-box ML classifiers. Experiments on five medical datasets reveal statistically significant performance gains in accuracy and F1 score.
arXiv Detail & Related papers (2025-06-24T12:37:17Z) - Probing Neural Topology of Large Language Models [15.34202977968525]
We introduce graph probing, a method for uncovering the functional connectivity topology of LLM neurons. We find a universal predictability of next-token prediction performance using only neural topology. This predictability is robust even when retaining just 1% of neuron connections or probing models after only 8 pretraining steps.
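A rough sketch of the graph-probing idea, assuming the connectivity graph is built from activation correlations; the threshold and the topology features below are illustrative, not the paper's exact procedure.

```python
import numpy as np

def connectivity_graph(activations, threshold=0.5):
    """Correlate neuron activations across inputs and keep strong edges.
    activations: (n_samples, n_neurons). Returns a binary adjacency matrix."""
    corr = np.corrcoef(activations.T)          # (n_neurons, n_neurons)
    np.fill_diagonal(corr, 0.0)
    return (np.abs(corr) > threshold).astype(int)

def topology_features(adj):
    """A few crude graph statistics a linear probe could regress
    onto next-token prediction performance."""
    deg = adj.sum(axis=1)
    return np.array([deg.mean(), deg.std(), adj.mean()])
```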
arXiv Detail & Related papers (2025-06-01T14:57:03Z) - Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
We investigate how model size, training data scale, and inference-time compute jointly influence generative retrieval performance.
Our experiments show that n-gram-based methods demonstrate strong alignment with both training and inference scaling laws.
We find that LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval.
arXiv Detail & Related papers (2025-03-24T17:59:03Z) - Latent Thought Models with Variational Bayes Inference-Time Computation [52.63299874322121]
Latent Thought Models (LTMs) incorporate explicit latent thought vectors that follow an explicit prior model in latent space. LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models.
arXiv Detail & Related papers (2025-02-03T17:50:34Z) - SAFR: Neuron Redistribution for Interpretability [7.756342860929851]
Superposition refers to encoding representations of multiple features within a single neuron. While superposition can deliver promising performance, it diminishes the model's interpretability. This paper presents a novel approach to enhance model interpretability by regularizing feature superposition.
arXiv Detail & Related papers (2025-01-23T06:20:33Z) - Improving Neuron-level Interpretability with White-box Language Models [11.898535906016907]
We introduce a white-box transformer-like architecture named Coding RAte TransformEr (CRATE).
Our comprehensive experiments showcase significant improvements (up to 103% relative improvement) in neuron-level interpretability.
CRATE's increased interpretability comes from its enhanced ability to consistently and distinctively activate on relevant tokens.
arXiv Detail & Related papers (2024-10-21T19:12:33Z) - Revisiting N-Gram Models: Their Impact in Modern Neural Networks for Handwritten Text Recognition [4.059708117119894]
This study addresses whether explicit language models, specifically n-gram models, still contribute to the performance of state-of-the-art deep learning architectures in the field of handwriting recognition.
We evaluate two prominent neural network architectures, PyLaia and DAN, with and without the integration of explicit n-gram language models.
The results show that incorporating character or subword n-gram models significantly improves the performance of automatic text recognition (ATR) models on all datasets.
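For intuition, a minimal sketch of shallow fusion, one common way to combine a recognizer with an n-gram LM during decoding; the weight `lam` and function name are assumptions, not necessarily how PyLaia or DAN integrate the LM.

```python
import math

def fused_score(recognizer_logprob, ngram_prob, lam=0.3, eps=1e-12):
    """Shallow fusion: add a weighted n-gram log-probability to the
    optical model's score for a candidate character or subword."""
    return recognizer_logprob + lam * math.log(ngram_prob + eps)

# e.g. rescoring beam-search candidates:
# best = max(beam, key=lambda c: fused_score(c.logprob, ngram.prob(c.text)))
```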
arXiv Detail & Related papers (2024-04-30T07:37:48Z) - In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z) - N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models [0.0]
N2G is a tool which takes a neuron and its dataset examples, and automatically distills the neuron's behaviour on those examples to an interpretable graph.
We use truncation and saliency methods to only present the important tokens, and augment the dataset examples with more diverse samples to better capture the extent of neuron behaviour.
These graphs can be visualised to aid manual interpretation by researchers, and can also be used to output token activations on text, which can be compared to the neuron's ground-truth activations for automatic validation.
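A toy sketch of the distillation step, covering only truncation to the salient context around the peak activation (N2G's saliency and augmentation machinery is omitted, and the function name is illustrative):

```python
def neuron_patterns(examples, activations, keep=3):
    """For each example, keep only the tokens immediately preceding the
    neuron's peak activation, treating them as the trigger pattern.
    examples: list of token lists; activations: matching lists of floats."""
    patterns = []
    for toks, acts in zip(examples, activations):
        peak = max(range(len(acts)), key=acts.__getitem__)
        patterns.append(tuple(toks[max(0, peak - keep): peak + 1]))
    return patterns
```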
arXiv Detail & Related papers (2023-04-22T19:06:13Z) - Neural Additive Models for Location Scale and Shape: A Framework for Interpretable Neural Regression Beyond the Mean [1.0923877073891446]
Deep neural networks (DNNs) have proven to be highly effective in a variety of tasks.
Despite this success, the inner workings of DNNs are often not transparent.
This lack of interpretability has led to increased research on inherently interpretable neural networks.
arXiv Detail & Related papers (2023-01-27T17:06:13Z) - Residual Learning of Neural Text Generation with n-gram Language Model [41.26228768053928]
We learn a neural LM that fits the residual between an n-gram LM and the real-data distribution.
Our approach consistently attains additional performance gains over popular standalone neural models.
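A minimal sketch of residual fusion in log space, assuming the final distribution is the n-gram base multiplied by the exponentiated neural residual (the paper's exact parameterization may differ):

```python
import torch
import torch.nn.functional as F

def residual_lm_logprobs(neural_logits, ngram_probs, eps=1e-8):
    """Residual fusion in log space: p(x) is proportional to
    p_ngram(x) * exp(f_neural(x)), so the neural LM only has to model
    what the n-gram LM gets wrong.
    neural_logits, ngram_probs: (V,) tensors over the vocabulary."""
    return F.log_softmax(torch.log(ngram_probs + eps) + neural_logits, dim=-1)
```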
arXiv Detail & Related papers (2022-10-26T02:42:53Z) - Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been used to address context sparsity in n-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z) - Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
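For reference, a sketch of the underlying kNN-LM interpolation that such methods accelerate; `k`, `lam`, and the distance-to-weight mapping here are illustrative choices.

```python
import numpy as np

def knn_lm_probs(lm_probs, query, keys, values, k=8, lam=0.25, temp=1.0):
    """Retrieve the k nearest datastore keys, convert distances to a
    distribution over their stored next tokens, and interpolate with
    the base LM distribution.
    keys: (N, d) context vectors; values: (N,) next-token ids."""
    dists = np.linalg.norm(keys - query, axis=1)
    idx = np.argsort(dists)[:k]
    w = np.exp(-dists[idx] / temp)
    w /= w.sum()
    knn = np.zeros_like(lm_probs)
    for i, wi in zip(idx, w):
        knn[values[i]] += wi
    return lam * knn + (1 - lam) * lm_probs
```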
arXiv Detail & Related papers (2021-09-09T12:32:28Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - An Investigation of Potential Function Designs for Neural CRF [75.79555356970344]
In this paper, we investigate a series of increasingly expressive potential functions for neural CRF models.
Our experiments show that the decomposed quadrilinear potential function based on the vector representations of two neighboring labels and two neighboring words consistently achieves the best performance.
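A sketch of one plausible low-rank decomposition of such a quadrilinear potential; the shared rank-space elementwise product below is an assumption about the factorization, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class QuadrilinearPotential(nn.Module):
    """Score a (word, word, label, label) neighborhood by projecting all
    four vectors into a shared rank-r space and summing their
    elementwise product, a standard low-rank tensor decomposition."""

    def __init__(self, d_word, d_label, rank=32):
        super().__init__()
        self.pw1 = nn.Linear(d_word, rank, bias=False)
        self.pw2 = nn.Linear(d_word, rank, bias=False)
        self.pl1 = nn.Linear(d_label, rank, bias=False)
        self.pl2 = nn.Linear(d_label, rank, bias=False)

    def forward(self, w1, w2, l1, l2):
        return (self.pw1(w1) * self.pw2(w2) * self.pl1(l1) * self.pl2(l2)).sum(-1)
```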
arXiv Detail & Related papers (2020-11-11T07:32:18Z) - Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
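A minimal sketch of the retrieval step, assuming plain Euclidean distance over the model's final-layer features (the paper's choice of representation and metric may differ):

```python
import numpy as np

def responsible_examples(test_feat, train_feats, k=5):
    """Return indices of the k training examples closest to a test point
    in the model's representation space."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    return np.argsort(dists)[:k]
```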
arXiv Detail & Related papers (2020-10-18T16:55:25Z) - Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking [63.49779304362376]
Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models.
We introduce a post-hoc method for interpreting the predictions of GNNs that identifies unnecessary edges.
We show that we can drop a large proportion of edges without deteriorating the performance of the model.
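A hedged sketch of differentiable edge masking: one learnable logit per edge, squashed by a sigmoid and regularized toward sparsity, while the masked GNN is trained to match the original predictions (the exact relaxation and objective in the paper may differ).

```python
import torch
import torch.nn as nn

class EdgeMask(nn.Module):
    """One learnable logit per edge; a sigmoid gives a soft keep/drop
    gate, and a sparsity penalty pushes unnecessary edges toward zero."""

    def __init__(self, num_edges):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_edges))

    def forward(self, edge_weights):
        return edge_weights * torch.sigmoid(self.logits)  # soft-masked edges

    def sparsity_penalty(self):
        return torch.sigmoid(self.logits).sum()  # encourages dropping edges
```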
arXiv Detail & Related papers (2020-10-01T17:51:19Z) - Surrogate Locally-Interpretable Models with Supervised Machine Learning
Algorithms [8.949704905866888]
Supervised Machine Learning algorithms have become popular in recent years due to their superior predictive performance over traditional statistical methods.
While the main focus is on interpretability, the resulting surrogate model also has reasonably good predictive performance.
arXiv Detail & Related papers (2020-07-28T23:46:16Z) - Parameter Space Factorization for Zero-Shot Learning across Tasks and
Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)