Backward Lens: Projecting Language Model Gradients into the Vocabulary
Space
- URL: http://arxiv.org/abs/2402.12865v1
- Date: Tue, 20 Feb 2024 09:57:08 GMT
- Title: Backward Lens: Projecting Language Model Gradients into the Vocabulary
Space
- Authors: Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf
- Abstract summary: We show that a gradient matrix can be cast as a low-rank linear combination of its forward and backward passes' inputs.
We then develop methods to project these gradients into vocabulary items and explore the mechanics of how new information is stored in the LMs' neurons.
- Score: 94.85922991881242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how Transformer-based Language Models (LMs) learn and recall
information is a key goal of the deep learning community. Recent
interpretability methods project weights and hidden states obtained from the
forward pass to the models' vocabularies, helping to uncover how information
flows within LMs. In this work, we extend this methodology to LMs' backward
pass and gradients. We first prove that a gradient matrix can be cast as a
low-rank linear combination of its forward and backward passes' inputs. We then
develop methods to project these gradients into vocabulary items and explore
the mechanics of how new information is stored in the LMs' neurons.
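As a rough, hedged illustration of the abstract's main claim (a minimal PyTorch sketch, not the authors' code), consider a single linear layer y = W x: the gradient of the loss with respect to W is the outer product of the backward-pass input dL/dy and the forward-pass input x, so each token contributes a rank-1 term, and either factor can be read in vocabulary space by multiplying it with an unembedding matrix. All sizes and the unembedding matrix below are toy placeholders.

```python
# Minimal PyTorch sketch (not the authors' code): for a linear layer y = W x,
# the gradient of the loss w.r.t. W equals the outer product of the
# backward-pass input dL/dy and the forward-pass input x, i.e. rank 1 per token.
import torch

torch.manual_seed(0)
d_in, d_out, vocab = 16, 16, 50            # toy sizes (assumptions)

x = torch.randn(d_in)                      # forward-pass input to the layer
W = torch.randn(d_out, d_in, requires_grad=True)
E = torch.randn(vocab, d_out)              # stand-in unembedding ("logit lens") matrix

y = W @ x
loss = y.pow(2).sum()                      # any scalar loss works for the identity
loss.backward()

delta = (2 * y).detach()                   # dL/dy for this particular loss
low_rank_grad = torch.outer(delta, x)      # rank-1 reconstruction of dL/dW
assert torch.allclose(W.grad, low_rank_grad, atol=1e-5)

# Reading the backward-pass factor in vocabulary space (a toy stand-in for
# projecting gradients onto vocabulary items):
top_tokens = (E @ delta).topk(5).indices
print(top_tokens)
```

The top-scoring tokens for the backward-pass factor suggest what the weight update is writing toward; this is the kind of reading the paper's projection methods make precise.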
Related papers
- Jogging the Memory of Unlearned LLMs Through Targeted Relearning Attacks [37.061187080745654]
We show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks.
With access to only a small and potentially loosely related set of data, we find that we can "jog" the memory of unlearned models to reverse the effects of unlearning.
arXiv Detail & Related papers (2024-06-19T09:03:21Z)
- Understanding Information Storage and Transfer in Multi-modal Large Language Models [51.20840103605018]
We study how Multi-modal Large Language Models process information in a factual visual question answering task.
Key findings show that these MLLMs rely on self-attention blocks in much earlier layers for information storage.
We introduce MultEdit, a model-editing algorithm that can correct errors and insert new long-tailed information into MLLMs.
arXiv Detail & Related papers (2024-06-06T16:35:36Z)
- Bridging Vision and Language Spaces with Assignment Prediction [47.04855334955006]
VLAP is a novel approach that bridges pretrained vision models and large language models (LLMs).
We harness well-established word embeddings to bridge two modality embedding spaces.
VLAP achieves substantial improvements over the previous linear transformation-based approaches.
arXiv Detail & Related papers (2024-04-15T10:04:15Z)
- Large Language Model with Graph Convolution for Recommendation [21.145230388035277]
Text information can sometimes be of low quality, hindering its effectiveness for real-world applications.
With knowledge and reasoning capabilities encapsulated in Large Language Models, utilizing LLMs emerges as a promising way to improve item descriptions.
We propose a Graph-aware Convolutional LLM method to elicit LLMs to capture high-order relations in the user-item graph.
arXiv Detail & Related papers (2024-02-14T00:04:33Z)
- Do LLMs Dream of Ontologies? [15.049502693786698]
Large language models (LLMs) have recently revolutionized automated text understanding and generation.
This paper investigates whether and to what extent general-purpose pre-trained LLMs have memorized information from known ontologies.
arXiv Detail & Related papers (2024-01-26T15:10:23Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
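As a loose illustration of gradient projection in unlearning (a toy sketch with assumed shapes and a hypothetical protected subspace, not the paper's PGU implementation):

```python
# Illustrative sketch (not the paper's PGU implementation): project an update
# onto the orthogonal complement of a subspace spanned by directions we want
# to preserve, so the unlearning step interferes with them as little as possible.
import torch

torch.manual_seed(0)
dim = 8

# Hypothetical orthonormal basis for the protected subspace (in practice it
# might be built from retained-data gradients); random here purely for illustration.
basis, _ = torch.linalg.qr(torch.randn(dim, 3))   # shape (dim, 3)

grad = torch.randn(dim)                           # raw unlearning gradient

# Subtract the component of the gradient that lies inside the protected subspace.
projected_grad = grad - basis @ (basis.T @ grad)

# The projected gradient is orthogonal to every protected direction.
print(torch.allclose(basis.T @ projected_grad, torch.zeros(3), atol=1e-5))
```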
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or to propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs).
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)
- Detecting Unintended Memorization in Language-Model-Fused ASR [10.079200692649462]
We propose a framework for detecting memorization of random textual sequences (which we call canaries) in the LM training data.
On a production-grade Conformer RNN-T E2E model fused with a Transformer LM, we show that detecting memorization of canaries from the LM training data of 300M examples is possible.
Motivated to protect privacy, we also show that such memorization gets significantly reduced by per-example gradient-clipped LM training.
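A hedged sketch of per-example gradient clipping in the spirit of the mitigation mentioned above; the model, batch, learning rate, and clipping threshold below are toy assumptions rather than the paper's production setup:

```python
# Hedged sketch of per-example gradient clipping (DP-SGD-style clipping, not
# the paper's production training code): bounding each example's gradient norm
# limits how strongly any single sequence, e.g. a canary, can be memorized.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 2)
loss_fn = torch.nn.CrossEntropyLoss()
clip_norm, lr = 1.0, 0.1

batch_x = torch.randn(4, 10)
batch_y = torch.randint(0, 2, (4,))

clipped_grads = [torch.zeros_like(p) for p in model.parameters()]
for x, y in zip(batch_x, batch_y):
    model.zero_grad()
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    # Rescale this example's gradient if its total norm exceeds the bound.
    total_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
    scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
    for acc, p in zip(clipped_grads, model.parameters()):
        acc += scale * p.grad

# Average the clipped per-example gradients and take a plain SGD step.
with torch.no_grad():
    for acc, p in zip(clipped_grads, model.parameters()):
        p -= lr * acc / len(batch_x)
```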
arXiv Detail & Related papers (2022-04-20T16:35:13Z)
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural machine translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
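One plausible form of such a regularizer (an illustrative sketch with made-up tensors and an assumed weight lam, not necessarily the paper's exact objective) adds a KL term that pulls the TM's token distribution toward the LM prior:

```python
# Illustrative sketch of an LM-as-prior regularizer (one plausible form, not
# necessarily the paper's exact formulation): alongside the usual translation
# loss, a KL term nudges the TM's output distribution toward the LM's
# distribution over the same target positions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 100
tm_logits = torch.randn(5, vocab)          # translation model logits for 5 target tokens
lm_logits = torch.randn(5, vocab)          # pretrained LM logits for the same positions
targets = torch.randint(0, vocab, (5,))
lam = 0.5                                  # regularization weight (assumed hyperparameter)

nll = F.cross_entropy(tm_logits, targets)
# KL(TM || LM), computed per position and averaged. In practice the LM would
# typically be frozen / detached so that only the TM is trained.
kl = F.kl_div(F.log_softmax(lm_logits, dim=-1),
              F.softmax(tm_logits, dim=-1),
              reduction="batchmean")
loss = nll + lam * kl
print(loss)
```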
arXiv Detail & Related papers (2020-04-30T16:29:56Z)