On the Similarity of Circuits across Languages: a Case Study on the Subject-verb Agreement Task
- URL: http://arxiv.org/abs/2410.06496v1
- Date: Wed, 9 Oct 2024 02:49:56 GMT
- Title: On the Similarity of Circuits across Languages: a Case Study on the Subject-verb Agreement Task
- Authors: Javier Ferrando, Marta R. Costa-jussà
- Abstract summary: We study the circuits implemented by Gemma 2B for solving the subject-verb agreement task across two different languages.
Both circuits are highly consistent, being mainly driven by a particular attention head writing a 'subject number' signal to the last residual stream.
We demonstrate that this direction has a causal effect on the model predictions, effectively flipping the Spanish predicted verb number by intervening with the direction found in English.
- Score: 11.97725093458812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several algorithms implemented by language models have recently been successfully reverse-engineered. However, these findings have been concentrated on specific tasks and models, leaving it unclear how universal circuits are across different settings. In this paper, we study the circuits implemented by Gemma 2B for solving the subject-verb agreement task across two different languages, English and Spanish. We discover that both circuits are highly consistent, being mainly driven by a particular attention head writing a 'subject number' signal to the last residual stream, which is read by a small set of neurons in the final MLPs. Notably, this subject number signal is represented as a direction in the residual stream space, and is language-independent. We demonstrate that this direction has a causal effect on the model predictions, effectively flipping the Spanish predicted verb number by intervening with the direction found in English. Finally, we present evidence of similar behavior in other models within the Gemma 1 and Gemma 2 families.
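The idea of a language-independent 'subject number' direction that can be used to flip a prediction can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the three-dimensional vectors stand in for residual stream activations, the difference-of-means estimate of the direction is a common heuristic and not necessarily the procedure used in the paper, and the helper names `number_direction` and `flip_along` are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def unit(v: Vector) -> Vector:
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def number_direction(singular: List[Vector], plural: List[Vector]) -> Vector:
    """Unit vector pointing from the singular mean to the plural mean
    (difference-of-means heuristic, assumed here for illustration)."""
    mu_s, mu_p = mean(singular), mean(plural)
    return unit([p - s for p, s in zip(mu_p, mu_s)])


def flip_along(resid: Vector, direction: Vector, midpoint: float) -> Vector:
    """Reflect the component of `resid` along `direction` about `midpoint`,
    leaving all orthogonal components unchanged."""
    offset = dot(resid, direction) - midpoint
    return [x - 2.0 * offset * d for x, d in zip(resid, direction)]


# Toy "English" activations (made-up numbers, not real model data).
english_singular = [[1.0, 0.2, -0.1], [0.8, -0.1, 0.3]]
english_plural = [[-1.0, 0.1, 0.2], [-0.9, 0.3, -0.2]]

direction = number_direction(english_singular, english_plural)
midpoint = dot(mean(english_singular + english_plural), direction)

# Toy "Spanish" activation for a singular subject (also made up).
spanish_singular_resid = [0.9, 0.4, 0.0]

# Signed position along the direction: negative = singular side, positive = plural side.
before = dot(spanish_singular_resid, direction) - midpoint
steered = flip_along(spanish_singular_resid, direction, midpoint)
after = dot(steered, direction) - midpoint

print(f"before: {before:.3f}, after: {after:.3f}")
```

In this sketch the direction is estimated only from the "English" examples and then applied to a "Spanish" vector, mirroring the cross-lingual intervention described in the abstract. In a real model the intervention would be applied to activations during the forward pass and evaluated on the predicted verb form.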
Related papers
- Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping [60.458273797431836]
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models.
We find that this approach does not work well on non-English tasks.
Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
arXiv Detail & Related papers (2024-07-15T15:14:01Z)
- Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models [9.56229382432426]
This research aims to reverse engineer transformer models into human-readable representations that implement algorithmic functions.
By applying circuit interpretability analysis, we identify a key sub-circuit in both GPT-2 Small and Llama-2-7B.
We show that this sub-circuit has effects on various math-related prompts, such as on intervaled circuits, Spanish number word and months continuation, and natural language word problems.
arXiv Detail & Related papers (2023-11-07T16:58:51Z)
- Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks [12.7259425362286]
We investigate how multilingual models might leverage key-value memories.
For autoregressive models trained on two or more languages, do all neurons (across layers) respond equally to all languages?
Our findings reveal that the layers closest to the network's input or output tend to exhibit more language-specific behaviour compared to the layers in the middle.
arXiv Detail & Related papers (2023-10-24T06:45:00Z)
- Linguistically Motivated Sign Language Segmentation [51.06873383204105]
We consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases.
Our method is motivated by linguistic cues observed in sign language corpora.
We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing.
arXiv Detail & Related papers (2023-10-21T10:09:34Z)
- Circuit Component Reuse Across Tasks in Transformer Language Models [32.2976613483151]
We present evidence that insights can indeed generalize across tasks.
We show that the process underlying both tasks is functionally very similar, and contains about a 78% overlap in in-circuit attention heads.
Our results provide evidence that it may yet be possible to explain large language models' behavior in terms of a relatively small number of interpretable task-general algorithmic building blocks and computational components.
arXiv Detail & Related papers (2023-10-12T22:12:28Z)
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model VECO2.0 based on contrastive learning with multi-granularity alignments.
Specifically, the sequence-to-sequence alignment is induced to maximize the similarity of the parallel pairs and minimize the non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens excavated via the thesaurus dictionary from the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- A Hierarchical Model for Spoken Language Recognition [29.948719321162883]
Spoken language recognition (SLR) refers to the automatic process used to determine the language present in a speech sample.
We propose a novel hierarchical approach where two PLDA models are trained, one to generate scores for clusters of highly related languages and a second one to generate scores conditional on each cluster.
We show that this hierarchical approach consistently outperforms the non-hierarchical one for detection of highly related languages.
arXiv Detail & Related papers (2022-01-04T22:10:36Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- The Extraordinary Failure of Complement Coercion Crowdsourcing [50.599433903377374]
Crowdsourcing has eased and scaled up the collection of linguistic annotation in recent years.
We aim to collect annotated data for this phenomenon by reducing it to either of two known tasks: Explicit Completion and Natural Language Inference.
In both cases, crowdsourcing resulted in low agreement scores, even though we followed the same methodologies as in previous work.
arXiv Detail & Related papers (2020-10-12T19:04:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.