The Dual-Route Model of Induction
- URL: http://arxiv.org/abs/2504.03022v1
- Date: Thu, 03 Apr 2025 20:40:31 GMT
- Title: The Dual-Route Model of Induction
- Authors: Sheridan Feucht, Eric Todd, Byron Wallace, David Bau
- Abstract summary: We introduce concept-level induction heads, which copy entire lexical units instead of individual tokens. We show that concept induction heads are responsible for semantic tasks like word-level translation, whereas token induction heads are vital for tasks that can only be done verbatim.
- Score: 19.752542337008773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior work on in-context copying has shown the existence of induction heads, which attend to and promote individual tokens during copying. In this work we introduce a new type of induction head: concept-level induction heads, which copy entire lexical units instead of individual tokens. Concept induction heads learn to attend to the ends of multi-token words throughout training, working in parallel with token-level induction heads to copy meaningful text. We show that these heads are responsible for semantic tasks like word-level translation, whereas token induction heads are vital for tasks that can only be done verbatim, like copying nonsense tokens. These two "routes" operate independently: in fact, we show that ablation of token induction heads causes models to paraphrase where they would otherwise copy verbatim. In light of these findings, we argue that although token induction heads are vital for specific tasks, concept induction heads may be more broadly relevant for in-context learning.
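The token-level copying rule that induction heads are described as implementing can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: a pure token-induction rule predicts whatever followed the most recent earlier occurrence of the current token, i.e. [A][B] ... [A] → [B].

```python
# Toy sketch of the token-level induction rule (not the paper's code):
# predict the token that followed the previous occurrence of the
# current token, scanning earlier positions right to left.
def token_induction_predict(tokens):
    """Return the pure token-induction prediction, or None if no match."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # [A][B] ... [A] -> [B]
    return None
```

A concept-level variant would apply the same matching logic over multi-token lexical units rather than single tokens; the paper's heads attend to the ends of multi-token words.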
Related papers
- In-Context Learning Without Copying [31.718993147344353]
We study whether transformers can still acquire in-context learning capabilities when inductive copying is suppressed. We propose Hapax, a setting where we omit the loss contribution of any token that can be correctly predicted by induction heads. Mechanistic analysis shows that models trained with Hapax develop fewer and weaker induction heads but still preserve ICL capabilities.
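The Hapax-style masking idea can be illustrated with a toy criterion. This is a hedged sketch: the simple induction rule below (predict what followed the previous occurrence of the preceding token) stands in for the paper's actual predictability test, which uses real induction heads.

```python
# Hypothetical sketch of loss masking: a token contributes no loss if a
# trivial induction rule already predicts it correctly. The rule here
# is a stand-in for the paper's actual criterion.
def induction_predictable_mask(tokens):
    """Return, per target position t >= 1, whether to exclude it from the loss."""
    mask = []
    for t in range(1, len(tokens)):
        prev = tokens[t - 1]
        predicted = None
        for i in range(t - 2, -1, -1):  # find earlier occurrence of prev
            if tokens[i] == prev:
                predicted = tokens[i + 1]
                break
        mask.append(predicted == tokens[t])  # True => omit from loss
    return mask
```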
arXiv Detail & Related papers (2025-11-07T22:11:11Z) - On the Emergence of Induction Heads for In-Context Learning [121.64612469118464]
We study the emergence of induction heads, a previously identified mechanism in two-layer transformers. We explain the origin of this structure using a minimal ICL task formulation and a modified transformer architecture.
arXiv Detail & Related papers (2025-11-02T18:12:06Z) - A circuit for predicting hierarchical structure in-context in Large Language Models [19.35678318316516]
Large Language Models (LLMs) excel at in-context learning, the ability to use information provided as context to improve prediction of future tokens. In this study, we design a synthetic in-context learning task, where tokens are repeated with hierarchical dependencies. We find adaptive induction heads that support prediction by learning what to attend to in-context.
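A synthetic task of this flavor is easy to sketch. The generator below is hypothetical (the paper's actual task design may differ): a random inner phrase is nested inside bracket-like wrapper tokens, and the whole structure repeats, so predicting the second copy requires tracking the hierarchical unit rather than single tokens.

```python
import random

# Hypothetical generator for a copying task with hierarchical
# dependencies: a nested phrase repeats, and the model must copy
# the whole bracketed unit in-context.
def make_hierarchical_sequence(vocab, inner_len, seed=0):
    rng = random.Random(seed)
    inner = [rng.choice(vocab) for _ in range(inner_len)]
    phrase = ["<open>"] + inner + ["<close>"]
    return phrase + phrase  # second copy is predictable from the first
```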
arXiv Detail & Related papers (2025-09-25T20:20:23Z) - Which Attention Heads Matter for In-Context Learning? [41.048579134842285]
Large language models (LLMs) exhibit impressive in-context learning (ICL) capability.
Two different mechanisms have been proposed to explain ICL: induction heads that find and copy relevant tokens, and function vector (FV) heads whose activations compute a latent encoding of the ICL task.
We study and compare induction heads and FV heads in 12 language models.
arXiv Detail & Related papers (2025-02-19T12:25:02Z) - Do Attention Heads Compete or Cooperate during Counting? [0.12116854758481393]
We present an in-depth mechanistic interpretability analysis of training small transformers on an elementary task, counting. We ask whether the attention heads behave as a pseudo-ensemble, all solving the same subtask, or whether they perform different subtasks, meaning that they can only solve the original task in conjunction.
arXiv Detail & Related papers (2025-02-10T17:21:39Z) - To Word Senses and Beyond: Inducing Concepts with Contextualized Language Models [0.9176056742068812]
Polysemy and synonymy are two crucial interrelated facets of lexical ambiguity. In this paper, we introduce Concept Induction, the unsupervised task of learning a soft clustering among words. We propose a bi-level approach to Concept Induction that leverages both a local lemma-centric view and a global cross-lexicon view to induce concepts.
arXiv Detail & Related papers (2024-06-28T17:07:06Z) - SEP: Self-Enhanced Prompt Tuning for Visual-Language Model [93.94454894142413]
We introduce a novel approach named Self-Enhanced Prompt Tuning (SEP).
SEP explicitly incorporates discriminative prior knowledge to enhance both textual-level and visual-level embeddings.
Comprehensive evaluations across various benchmarks and tasks confirm SEP's efficacy in prompt tuning.
arXiv Detail & Related papers (2024-05-24T13:35:56Z) - Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning [52.70210390424605]
In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature.
In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits.
We propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives.
The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks.
arXiv Detail & Related papers (2024-04-16T04:52:41Z) - An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models [99.31449616860291]
Modern language models (LMs) can learn to perform new tasks in different ways.
In instruction following, the target task is described explicitly in natural language; in few-shot prompting, the task is specified implicitly.
In instruction inference, LMs are presented with in-context examples and are then prompted to generate a natural language task description.
arXiv Detail & Related papers (2024-04-03T19:31:56Z) - Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics [50.982315553104975]
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
arXiv Detail & Related papers (2024-03-03T13:14:47Z) - Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages.
We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
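The described effect, recalling a tail token when attending to its head token and boosting that token's logit, can be mimicked in a toy setting. This is an assumed illustration, not the paper's model: the head simply adds a boost to the logit of the tail token of a known (head → tail) relation.

```python
# Toy illustration of a "semantic induction head": when the query
# attends to a head token of a known relation, the head promotes the
# corresponding tail token's logit. Assumed setup, not the paper's.
def apply_semantic_head(logits, attended_token, relations, boost=1.0):
    """Return a copy of `logits` with the related tail token boosted."""
    out = dict(logits)
    if attended_token in relations:
        tail = relations[attended_token]
        out[tail] = out.get(tail, 0.0) + boost
    return out
```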
arXiv Detail & Related papers (2024-02-20T14:43:39Z) - Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement [92.61557711360652]
Language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks.
We conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement.
We reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.
arXiv Detail & Related papers (2023-10-12T17:51:10Z) - Language Models As Semantic Indexers [78.83425357657026]
We introduce LMIndexer, a self-supervised framework to learn semantic IDs with a generative language model.
We show the high quality of the learned IDs and demonstrate their effectiveness on three tasks including recommendation, product search, and document retrieval.
arXiv Detail & Related papers (2023-10-11T18:56:15Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - In-context Learning and Induction Heads [5.123049926855312]
"Induction heads" are attention heads that implement a simple algorithm to complete token sequences.
We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability.
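The standard diagnostic for this algorithm can be sketched as a prefix-matching score. This is a hedged outline of the common setup (assumed here, not taken from the paper's code): on a sequence with repeated tokens, an ideal induction head at position t attends to the position right after the earlier occurrence of token t, and a head is scored by the attention mass it puts on those targets.

```python
# Sketch of an induction-head diagnostic: score a head by the average
# attention mass it places on the position right after the previous
# occurrence of each repeated token. attention[q][k] is query->key mass.
def induction_targets(tokens):
    """Map each repeated position to the position it should attend to."""
    last_seen, targets = {}, {}
    for t, tok in enumerate(tokens):
        if tok in last_seen:
            targets[t] = last_seen[tok] + 1  # position after prior match
        last_seen[tok] = t
    return targets

def induction_score(attention, tokens):
    """Average attention mass on induction targets; 0.0 if no repeats."""
    targets = induction_targets(tokens)
    if not targets:
        return 0.0
    return sum(attention[q][k] for q, k in targets.items()) / len(targets)
```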
arXiv Detail & Related papers (2022-09-24T00:43:19Z) - Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition [60.36540008537054]
In this work, we excavate an implicit task, character counting, within traditional text recognition, without additional annotation cost.
We design a two-branch reciprocal feature learning framework in order to adequately utilize the features from both the tasks.
Experiments on 7 benchmarks show the advantages of the proposed methods in both text recognition and the new-built character counting tasks.
arXiv Detail & Related papers (2021-05-13T12:27:35Z) - What's in your Head? Emergent Behaviour in Multi-Task Transformer Models [26.557793822750302]
We study the behaviour of non-target heads, that is, the output of heads when given input that belongs to a different task than the one they were trained for.
We find that non-target heads exhibit emergent behaviour, which may either explain the target task, or generalize beyond their original task.
arXiv Detail & Related papers (2021-04-13T12:04:30Z) - A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z) - Towards Coinductive Models for Natural Language Understanding. Bringing together Deep Learning and Deep Semantics [0.0]
Coinduction has been successfully used in the design of operating systems and programming languages.
It has been present in text mining, machine translation, and in some attempts to model intensionality and modalities.
This article shows several examples of the joint appearance of induction and coinduction in natural language processing.
arXiv Detail & Related papers (2020-12-09T03:10:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.