Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers
- URL: http://arxiv.org/abs/2506.20746v1
- Date: Wed, 25 Jun 2025 18:13:34 GMT
- Title: Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers
- Authors: Todd Nief, David Reber, Sean Richardson, Ari Holtzman
- Abstract summary: We show that fine-tuned language models both (1) extract relation information learned during finetuning while processing entities and (2) "recall" this information in later layers while generating predictions. We examine the necessity and sufficiency of these information pathways, examining what layers they occur at, how much redundancy they exhibit, and which model components are involved.
- Score: 9.901842773988946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When an LLM learns a relation during finetuning (e.g., new movie releases, corporate mergers, etc.), where does this information go? Is it extracted when the model processes an entity, recalled just-in-time before a prediction, or are there multiple separate heuristics? Existing localization approaches (e.g., activation patching) are ill-suited for this analysis because they tend to replace parts of the residual stream, potentially deleting information. To fill this gap, we propose dynamic weight-grafting between fine-tuned and pre-trained language models to show that fine-tuned language models both (1) extract relation information learned during finetuning while processing entities and (2) "recall" this information in later layers while generating predictions. In some cases, models need both of these pathways to correctly generate finetuned information while, in other cases, a single "enrichment" or "recall" pathway alone is sufficient. We examine the necessity and sufficiency of these information pathways, examining what layers they occur at, how much redundancy they exhibit, and which model components are involved, finding that the "recall" pathway occurs via both task-specific attention mechanisms and a relation extraction step in the output of the attention and the feedforward networks at the final layers before next token prediction.
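The abstract names dynamic weight-grafting but does not spell out its mechanics. The sketch below is one possible toy reading of the idea, assuming it amounts to choosing, per token position, whether a component runs with pre-trained or fine-tuned weights. It is not the authors' implementation: the single linear map, the function names (`matvec`, `grafted_forward`), and the boolean mask `use_finetuned` are all hypothetical, introduced only for illustration.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x using plain Python lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def grafted_forward(
    hidden: Sequence[Vector],
    pretrained_w: Matrix,
    finetuned_w: Matrix,
    use_finetuned: Sequence[bool],
) -> List[Vector]:
    """Apply one linear map per token position.

    Assumption (not from the source): grafting is modeled as picking the
    fine-tuned weights at positions where use_finetuned is True and the
    pre-trained weights everywhere else.
    """
    if len(hidden) != len(use_finetuned):
        raise ValueError("need exactly one graft flag per token position")
    outputs = []
    for x, graft in zip(hidden, use_finetuned):
        w = finetuned_w if graft else pretrained_w
        outputs.append(matvec(w, x))
    return outputs


# Hypothetical toy weights: identity for "pre-trained", 2x scaling for "fine-tuned".
pretrained = [[1.0, 0.0], [0.0, 1.0]]
finetuned = [[2.0, 0.0], [0.0, 2.0]]
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Graft fine-tuned weights only at position 1 (standing in for an entity token).
result = grafted_forward(hidden_states, pretrained, finetuned, [False, True, False])
print(result)
```

Because the activations at ungrafted positions are computed normally rather than overwritten, a setup like this leaves the rest of the residual stream intact, which is the contrast with activation patching that the abstract draws.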
Related papers
- Diagnosing Representation Dynamics in NER Model Extension [0.0]
We find that fine-tuning a BERT model on standard semantic entities and new pattern-based PII results in minimal degradation for original classes. This work provides a mechanistic diagnosis of NER model adaptation, highlighting feature independence, representation overlap, and 'O' tag plasticity.
arXiv Detail & Related papers (2025-10-20T14:53:42Z) - Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs [54.167494079321465]
Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We propose a novel unlearning method, Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective.
arXiv Detail & Related papers (2025-07-06T03:08:49Z) - Maximally-Informative Retrieval for State Space Model Generation [59.954191072042526]
We introduce Retrieval In-Context Optimization (RICO) to minimize model uncertainty for a particular query at test-time. Unlike traditional retrieval-augmented generation (RAG), which relies on externals for document retrieval, our approach leverages direct feedback from the model. We show that standard top-$k$ retrieval with model gradients can approximate our optimization procedure, and provide connections to the leave-one-out loss.
arXiv Detail & Related papers (2025-06-13T18:08:54Z) - Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z) - Talking Heads: Understanding Inter-layer Communication in Transformer Language Models [32.2976613483151]
We analyze a mechanism used in two LMs to selectively inhibit items in a context in one task. We find that models write into low-rank subspaces of the residual stream to represent features which are then read out by later layers.
arXiv Detail & Related papers (2024-06-13T18:12:01Z) - Heat Death of Generative Models in Closed-Loop Learning [63.83608300361159]
We study the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset.
We show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to degenerate.
arXiv Detail & Related papers (2024-04-02T21:51:39Z) - Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction [36.40833517478628]
Large language models require updates to remain up-to-date or adapt to new domains. One key is memorizing the latest information in a way that the memorized information is extractable with a query prompt. Despite minimizing document perplexity during fine-tuning, LLMs struggle to extract information through a prompt sentence.
arXiv Detail & Related papers (2024-02-16T06:29:16Z) - Target inductive methods for zero-shot regression [0.0]
This research arises from the need to predict the amount of air pollutants in meteorological stations.
Air pollution depends on the location of the stations (weather conditions and activities in the surroundings).
This paper proposes two zero-shot methods for regression.
arXiv Detail & Related papers (2024-02-02T09:19:45Z) - In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z) - Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z) - Dissecting Recall of Factual Associations in Auto-Regressive Language Models [41.71388509750695]
Transformer-based language models (LMs) are known to capture factual knowledge in their parameters.
We study how the model aggregates information about the subject and relation to predict the correct attribute.
Our findings introduce a comprehensive view of how factual associations are stored and extracted internally in LMs.
arXiv Detail & Related papers (2023-04-28T11:26:17Z) - Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers [29.957075459315384]
How language models process complex input that requires multiple steps of inference is not well understood.
Previous research has shown that information about intermediate values of these inputs can be extracted from the activations of the models.
We introduce a method for analyzing how a Transformer model processes these inputs by focusing on simple arithmetic problems and their intermediate values.
arXiv Detail & Related papers (2023-01-17T08:46:50Z) - What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and draw connection between them and sparse retrieval.
arXiv Detail & Related papers (2022-12-20T16:03:25Z) - Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z) - Temporal Relation Extraction with a Graph-Based Deep Biaffine Attention Model [0.0]
We propose a novel temporal information extraction model based on deep biaffine attention.
We experimentally demonstrate that our model achieves state-of-the-art performance in temporal relation extraction.
arXiv Detail & Related papers (2022-01-16T19:40:08Z) - MapRE: An Effective Semantic Mapping Approach for Low-resource Relation Extraction [11.821464352959454]
We propose a framework considering both label-agnostic and label-aware semantic mapping information for low resource relation extraction.
We show that incorporating the above two types of mapping information in both pretraining and fine-tuning can significantly improve the model performance.
arXiv Detail & Related papers (2021-09-09T09:02:23Z) - Effective Distant Supervision for Temporal Relation Extraction [49.20329405920023]
A principal barrier to training temporal relation extraction models in new domains is the lack of varied, high quality examples.
We present a method of automatically collecting distantly-supervised examples of temporal relations.
arXiv Detail & Related papers (2020-10-24T03:17:31Z) - Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z) - Autoregressive Entity Retrieval [55.38027440347138]
Entities are at the center of how we represent and aggregate knowledge.
The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering.
We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion.
arXiv Detail & Related papers (2020-10-02T10:13:31Z) - A Simple Approach to Case-Based Reasoning in Knowledge Bases [56.661396189466664]
We present a surprisingly simple yet accurate approach to reasoning in knowledge graphs (KGs) that requires no training, and is reminiscent of case-based reasoning in classical artificial intelligence (AI).
Consider the task of finding a target entity given a source entity and a binary relation.
Our non-parametric approach derives crisp logical rules for each query by finding multiple graph path patterns that connect similar source entities through the given relation.
arXiv Detail & Related papers (2020-06-25T06:28:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.