Backward Lens: Projecting Language Model Gradients into the Vocabulary
Space
- URL: http://arxiv.org/abs/2402.12865v1
- Date: Tue, 20 Feb 2024 09:57:08 GMT
- Title: Backward Lens: Projecting Language Model Gradients into the Vocabulary
Space
- Authors: Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf
- Abstract summary: We show that a gradient matrix can be cast as a low-rank linear combination of its forward and backward passes' inputs.
We then develop methods to project these gradients into vocabulary items and explore the mechanics of how new information is stored in the LMs' neurons.
- Score: 94.85922991881242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how Transformer-based Language Models (LMs) learn and recall
information is a key goal of the deep learning community. Recent
interpretability methods project weights and hidden states obtained from the
forward pass to the models' vocabularies, helping to uncover how information
flows within LMs. In this work, we extend this methodology to LMs' backward
pass and gradients. We first prove that a gradient matrix can be cast as a
low-rank linear combination of its forward and backward passes' inputs. We then
develop methods to project these gradients into vocabulary items and explore
the mechanics of how new information is stored in the LMs' neurons.
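As a rough, hedged illustration of the abstract's main claim (a minimal PyTorch sketch, not the authors' code), consider a single linear layer y = W x: the gradient of the loss with respect to W is the outer product of the backward-pass input dL/dy and the forward-pass input x, so each token contributes a rank-1 term, and either factor can be read in vocabulary space by multiplying it with an unembedding matrix. All sizes and the unembedding matrix below are toy placeholders.

```python
# Minimal PyTorch sketch (not the authors' code): for a linear layer y = W x,
# the gradient of the loss w.r.t. W equals the outer product of the
# backward-pass input dL/dy and the forward-pass input x, i.e. rank 1 per token.
import torch

torch.manual_seed(0)
d_in, d_out, vocab = 16, 16, 50            # toy sizes (assumptions)

x = torch.randn(d_in)                      # forward-pass input to the layer
W = torch.randn(d_out, d_in, requires_grad=True)
E = torch.randn(vocab, d_out)              # stand-in unembedding ("logit lens") matrix

y = W @ x
loss = y.pow(2).sum()                      # any scalar loss works for the identity
loss.backward()

delta = (2 * y).detach()                   # dL/dy for this particular loss
low_rank_grad = torch.outer(delta, x)      # rank-1 reconstruction of dL/dW
assert torch.allclose(W.grad, low_rank_grad, atol=1e-5)

# Reading the backward-pass factor in vocabulary space (a toy stand-in for
# projecting gradients onto vocabulary items):
top_tokens = (E @ delta).topk(5).indices
print(top_tokens)
```

The top-scoring tokens for the backward-pass factor suggest what the weight update is writing toward; this is the kind of reading the paper's projection methods make precise.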
Related papers
- Jogging the Memory of Unlearned LLMs Through Targeted Relearning Attacks [37.061187080745654]
We show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks.
With access to only a small and potentially loosely related set of data, we find that we can "jog" the memory of unlearned models to reverse the effects of unlearning.
arXiv Detail & Related papers (2024-06-19T09:03:21Z)
- Understanding Information Storage and Transfer in Multi-modal Large Language Models [51.20840103605018]
We study how Multi-modal Large Language Models process information in a factual visual question answering task.
Key findings show that these MLLMs rely on self-attention blocks in much earlier layers for information storage.
We introduce MultEdit, a model-editing algorithm that can correct errors and insert new long-tailed information into MLLMs.
arXiv Detail & Related papers (2024-06-06T16:35:36Z)
- Bridging Vision and Language Spaces with Assignment Prediction [47.04855334955006]
VLAP is a novel approach that bridges pretrained vision models and large language models (LLMs).
We harness well-established word embeddings to bridge two modality embedding spaces.
VLAP achieves substantial improvements over the previous linear transformation-based approaches.
arXiv Detail & Related papers (2024-04-15T10:04:15Z)
- Large Language Model with Graph Convolution for Recommendation [21.145230388035277]
Text information can sometimes be of low quality, hindering its effectiveness for real-world applications.
With knowledge and reasoning capabilities encapsulated in Large Language Models, utilizing LLMs emerges as a promising way to improve item descriptions.
We propose a Graph-aware Convolutional LLM method to elicit LLMs to capture high-order relations in the user-item graph.
arXiv Detail & Related papers (2024-02-14T00:04:33Z)
- Do LLMs Dream of Ontologies? [15.049502693786698]
Large language models (LLMs) have recently revolutionized automated text understanding and generation.
This paper investigates whether and to what extent general-purpose pre-trained LLMs have memorized information from known ontologies.
arXiv Detail & Related papers (2024-01-26T15:10:23Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
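As a loose illustration of gradient projection in unlearning (a toy sketch with assumed shapes and a hypothetical protected subspace, not the paper's PGU implementation):

```python
# Illustrative sketch (not the paper's PGU implementation): project an update
# onto the orthogonal complement of a subspace spanned by directions we want
# to preserve, so the unlearning step interferes with them as little as possible.
import torch

torch.manual_seed(0)
dim = 8

# Hypothetical orthonormal basis for the protected subspace (in practice it
# might be built from retained-data gradients); random here purely for illustration.
basis, _ = torch.linalg.qr(torch.randn(dim, 3))   # shape (dim, 3)

grad = torch.randn(dim)                           # raw unlearning gradient

# Subtract the component of the gradient that lies inside the protected subspace.
projected_grad = grad - basis @ (basis.T @ grad)

# The projected gradient is orthogonal to every protected direction.
print(torch.allclose(basis.T @ projected_grad, torch.zeros(3), atol=1e-5))
```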
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or to propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs).
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)
- Detecting Unintended Memorization in Language-Model-Fused ASR [10.079200692649462]
We propose a framework for detecting memorization of random textual sequences (which we call canaries) in the LM training data.
On a production-grade Conformer RNN-T E2E model fused with a Transformer LM, we show that detecting memorization of canaries from the LM training data of 300M examples is possible.
Motivated to protect privacy, we also show that such memorization gets significantly reduced by per-example gradient-clipped LM training.
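A hedged sketch of per-example gradient clipping in the spirit of the mitigation mentioned above; the model, batch, learning rate, and clipping threshold below are toy assumptions rather than the paper's production setup:

```python
# Hedged sketch of per-example gradient clipping (DP-SGD-style clipping, not
# the paper's production training code): bounding each example's gradient norm
# limits how strongly any single sequence, e.g. a canary, can be memorized.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 2)
loss_fn = torch.nn.CrossEntropyLoss()
clip_norm, lr = 1.0, 0.1

batch_x = torch.randn(4, 10)
batch_y = torch.randint(0, 2, (4,))

clipped_grads = [torch.zeros_like(p) for p in model.parameters()]
for x, y in zip(batch_x, batch_y):
    model.zero_grad()
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    # Rescale this example's gradient if its total norm exceeds the bound.
    total_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
    scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
    for acc, p in zip(clipped_grads, model.parameters()):
        acc += scale * p.grad

# Average the clipped per-example gradients and take a plain SGD step.
with torch.no_grad():
    for acc, p in zip(clipped_grads, model.parameters()):
        p -= lr * acc / len(batch_x)
```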
arXiv Detail & Related papers (2022-04-20T16:35:13Z)
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural machine translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
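One plausible form of such a regularizer (an illustrative sketch with made-up tensors and an assumed weight lam, not necessarily the paper's exact objective) adds a KL term that pulls the TM's token distribution toward the LM prior:

```python
# Illustrative sketch of an LM-as-prior regularizer (one plausible form, not
# necessarily the paper's exact formulation): alongside the usual translation
# loss, a KL term nudges the TM's output distribution toward the LM's
# distribution over the same target positions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 100
tm_logits = torch.randn(5, vocab)          # translation model logits for 5 target tokens
lm_logits = torch.randn(5, vocab)          # pretrained LM logits for the same positions
targets = torch.randint(0, vocab, (5,))
lam = 0.5                                  # regularization weight (assumed hyperparameter)

nll = F.cross_entropy(tm_logits, targets)
# KL(TM || LM), computed per position and averaged. In practice the LM would
# typically be frozen / detached so that only the TM is trained.
kl = F.kl_div(F.log_softmax(lm_logits, dim=-1),
              F.softmax(tm_logits, dim=-1),
              reduction="batchmean")
loss = nll + lam * kl
print(loss)
```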
arXiv Detail & Related papers (2020-04-30T16:29:56Z)