Related papers: It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning

It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning

URL: http://arxiv.org/abs/2106.12066v1
Date: Tue, 22 Jun 2021 21:25:43 GMT
Title: It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
Authors: Alexey Tikhonov, Max Ryabinin
Abstract summary: We design a simple approach to commonsense reasoning which trains a linear classifier with weights of multi-head attention as features. The method performs competitively with recent supervised and unsupervised approaches for commonsense reasoning. Most of the performance is given by the same small subset of attention heads for all studied languages.
Score: 4.200736775540874
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Commonsense reasoning is one of the key problems in natural language processing, but the relative scarcity of labeled data holds back the progress for languages other than English. Pretrained cross-lingual models are a source of powerful language-agnostic representations, yet their inherent reasoning capabilities are still actively studied. In this work, we design a simple approach to commonsense reasoning which trains a linear classifier with weights of multi-head attention as features. To evaluate this approach, we create a multilingual Winograd Schema corpus by processing several datasets from prior work within a standardized pipeline and measure cross-lingual generalization ability in terms of out-of-sample performance. The method performs competitively with recent supervised and unsupervised approaches for commonsense reasoning, even when applied to other languages in a zero-shot manner. Also, we demonstrate that most of the performance is given by the same small subset of attention heads for all studied languages, which provides evidence of universal reasoning capabilities in multilingual encoders.

Related papers

Language Steering for Multilingual In-Context Learning [10.932074928744568]
Large language models' performance on non-English languages remains substantially inferior to English.<n>We propose language vectors -- a training-free language steering approach.<n>We show consistent improvements on multilingual in-context learning over baselines across all tasks and languages tested.
arXiv Detail & Related papers (2026-02-02T16:52:09Z)
SteerEval: Inference-time Interventions Strengthen Multilingual Generalization in Neural Summarization Metrics [33.30877107523988]
A major empirical bottleneck in this area is the shortage of accurate and robust evaluation metrics for many languages.<n>Recent studies suggest that multilingual language models often use English as an internal pivot language.<n>Motivated by the hypothesis that this mismatch could also apply to multilingual neural metrics, we ask whether steering their activations toward an English pivot can improve correlation with human judgments.
arXiv Detail & Related papers (2026-01-22T09:49:29Z)
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training [58.696660064190475]
We find that the existence of code-switching, alternating between different languages within a context, is key to multilingual capabilities. To better explore the power of code-switching for language alignment during pre-training, we investigate the strategy of synthetic code-switching.
arXiv Detail & Related papers (2025-04-02T15:09:58Z)
AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought [19.692743208974296]
We introduce AdaCoT (Adaptive Chain-of-Thought), a framework that enhances multilingual reasoning. AdaCoT dynamically routing thought processes through intermediary "thinking languages" before generating target-language responses.
arXiv Detail & Related papers (2025-01-27T15:48:57Z)
UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding [31.272603877215733]
Cross-lingual representation learning transfers knowledge from resource-rich data to resource-scarce ones to improve the semantic understanding abilities of different languages. We propose an Unsupervised Pseudo Semantic Data Augmentation (UniPSDA) mechanism for cross-lingual natural language understanding to enrich the training data without human interventions.
arXiv Detail & Related papers (2024-06-24T07:27:01Z)
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance. We observe that the existence of a predominant language during training boosts the performance of less frequent languages. As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet whether language imbalance causes cross-lingual generalisation there is not conclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization [27.368684663279463]
We investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization. Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability. We propose a meta-learning-based method to learn to align conceptual spaces of different languages.
arXiv Detail & Related papers (2023-10-19T14:50:51Z)
Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process. We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm. We provide insights into what makes multilingual sequential learning particularly challenging. The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
arXiv Detail & Related papers (2022-05-23T09:25:43Z)
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
Pre-training Universal Language Representation [46.51685959045527]
This work introduces universal language representation learning, i.e., embeddings of different levels of linguistic units or text with quite diverse lengths in a uniform vector space. We empirically verify that well designed pre-training scheme may effectively yield universal language representation.
arXiv Detail & Related papers (2021-05-30T09:29:01Z)
Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks. Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension. This paper further utilized unlabeled data to improve the performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z)
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT [2.2931318723689276]
Cross-lingual transfer emerges from fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning. We show that multilingual BERT can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance on the transfer and can be red during fine-tuning.
arXiv Detail & Related papers (2021-01-26T22:12:38Z)
On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics. Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings. We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.