Related papers: Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers

URL: http://arxiv.org/abs/2411.08745v2
Date: Mon, 18 Nov 2024 14:41:38 GMT
Title: Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
Authors: Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West
Abstract summary: We analyze latent representations (latents) during a word translation task in transformer-based language models. We find that the output language is encoded in the latent at an earlier layer than the concept to be translated. Our results provide evidence for the existence of language-agnostic concept representations within the investigated models.
Score: 12.94303673025761
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A central question in multilingual language modeling is whether large language models (LLMs) develop a universal concept representation, disentangled from specific languages. In this paper, we address this question by analyzing latent representations (latents) during a word translation task in transformer-based LLMs. We strategically extract latents from a source translation prompt and insert them into the forward pass on a target translation prompt. By doing so, we find that the output language is encoded in the latent at an earlier layer than the concept to be translated. Building on this insight, we conduct two key experiments. First, we demonstrate that we can change the concept without changing the language and vice versa through activation patching alone. Second, we show that patching with the mean over latents across different languages does not impair and instead improves the models' performance in translating the concept. Our results provide evidence for the existence of language-agnostic concept representations within the investigated models.
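The patching procedure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' code: the "model" is a stack of random residual layers acting on one small vector, and the names make_layer, forward, and mean_latent are hypothetical. A real experiment would patch the latent at a single token position of an actual LLM, so the patched run would not simply reproduce the source run as it does in this toy.

```python
import math
import random


def make_layer(dim, rng):
    """Build a toy residual layer h -> h + tanh(W h), a stand-in for a transformer block."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def layer(h):
        return [h_i + math.tanh(sum(w * x for w, x in zip(row, h)))
                for h_i, row in zip(h, W)]

    return layer


def forward(layers, h, patch_layer=None, patch_value=None, cache=None):
    """Run h through all layers.

    If patch_layer is given, the latent produced by that layer is overwritten
    with patch_value. If cache is a list, the latent after every layer is
    appended to it.
    """
    for j, layer in enumerate(layers):
        h = layer(h)
        if j == patch_layer:
            h = list(patch_value)  # activation patching: overwrite the latent
        if cache is not None:
            cache.append(list(h))
    return h


def mean_latent(latents):
    """Element-wise mean over several latents (e.g. one per source language)."""
    n = len(latents)
    return [sum(column) / n for column in zip(*latents)]


rng = random.Random(0)
DIM, N_LAYERS, PATCH_AT = 4, 6, 2  # illustrative sizes, not taken from the paper
layers = [make_layer(DIM, rng) for _ in range(N_LAYERS)]

# Hypothetical stand-ins for embedded source prompts (one per source language)
# and for the target prompt.
source_inputs = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
target_input = [rng.uniform(-1, 1) for _ in range(DIM)]

# Step 1: run each source prompt and cache its latents at every layer.
source_caches = []
for x in source_inputs:
    cache = []
    forward(layers, x, cache=cache)
    source_caches.append(cache)

# Step 2: patch a single source latent into the target forward pass.
patched_single = forward(layers, target_input,
                         patch_layer=PATCH_AT,
                         patch_value=source_caches[0][PATCH_AT])

# Step 3: patch the mean over source latents instead of a single one.
mean_at_layer = mean_latent([c[PATCH_AT] for c in source_caches])
patched_mean = forward(layers, target_input,
                       patch_layer=PATCH_AT, patch_value=mean_at_layer)

source_out = forward(layers, source_inputs[0])
```

Because this toy state is a single vector, overwriting it at layer PATCH_AT makes the rest of the forward pass identical to the source run. In a transformer, the unpatched positions of the target prompt still contribute through attention, which is what lets the experiments in the abstract distinguish what the patched latent carries from what the target prompt supplies.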

Related papers

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages [15.203789021094982]
In large language models (LLMs), how are multiple languages learned and encoded? We train sparse autoencoders on Llama-3-8B and Aya-23-8B, and demonstrate that abstract grammatical concepts are often encoded in feature directions shared across many languages.
arXiv Detail & Related papers (2025-01-10T21:18:21Z)
Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences of different languages into a common representation space. In this work, we test this hypothesis by zero-shot translating from unseen languages. We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping [60.458273797431836]
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models. We find that this approach does not work well on non-English tasks. Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
arXiv Detail & Related papers (2024-07-15T15:14:01Z)
Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation [16.368747052909214]
Understanding representation transfer in multilingual neural machine translation (MNMT) can reveal the reason for the zero-shot translation deficiency. We show that the encoder transfers the source language to the representational subspace of the target language instead of the language-agnostic state. Based on our findings, we propose two methods: 1) low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of the representation at the decoder.
arXiv Detail & Related papers (2024-06-12T11:16:30Z)
Do Llamas Work in English? On the Latent Language of Multilingual Transformers [13.885884589999492]
We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language. Our study uses carefully constructed non-English prompts with a unique correct single-token continuation. We cast these results into a conceptual model where the three phases operate in "input space", "concept space", and "output space".
arXiv Detail & Related papers (2024-02-16T11:21:28Z)
Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks [12.7259425362286]
We investigate how multilingual models might leverage key-value memories. For autoregressive models trained on two or more languages, do all neurons (across layers) respond equally to all languages? Our findings reveal that the layers closest to the network's input or output tend to exhibit more language-specific behaviour compared to the layers in the middle.
arXiv Detail & Related papers (2023-10-24T06:45:00Z)
Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters. We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
Cross-lingual Transfer of Monolingual Models [2.332247755275824]
We introduce a cross-lingual transfer method for monolingual models based on domain adaptation. We study the effects of such transfer from four different languages to English.
arXiv Detail & Related papers (2021-09-15T15:00:53Z)
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT [2.2931318723689276]
Cross-lingual transfer emerges from fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning. We show that multilingual BERT can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance on the transfer and can be reinitialized during fine-tuning.
arXiv Detail & Related papers (2021-01-26T22:12:38Z)
Improving Zero-Shot Translation by Disentangling Positional Information [24.02434897109097]
We show that a main factor causing the language-specific representations is the positional correspondence to input tokens. We gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions.
arXiv Detail & Related papers (2020-12-30T12:20:41Z)
VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages. It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language. The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
Self-Attention with Cross-Lingual Position Representation [112.05807284056337]
Position encoding (PE) is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences. Due to word order divergences in different languages, modeling the cross-lingual positional relationships might help self-attention networks (SANs) tackle this problem. We augment SANs with cross-lingual position representations to model the bilingually aware latent structure for the input sentence.
arXiv Detail & Related papers (2020-04-28T05:23:43Z)
Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models. In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them. We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.