Breaking the Language Barrier: Improving Cross-Lingual Reasoning with
Structured Self-Attention
- URL: http://arxiv.org/abs/2310.15258v1
- Date: Mon, 23 Oct 2023 18:06:38 GMT
- Title: Breaking the Language Barrier: Improving Cross-Lingual Reasoning with
Structured Self-Attention
- Authors: Negar Foroutan, Mohammadreza Banaei, Karl Aberer, Antoine Bosselut
- Abstract summary: We study whether multilingual language models (MultiLMs) can transfer logical reasoning abilities to other languages when they are fine-tuned for reasoning in a different language.
We demonstrate that although MultiLMs can transfer reasoning ability across languages in a monolingual setting, they struggle to transfer reasoning abilities in a code-switched setting.
Following this observation, we propose a novel attention mechanism that uses a dedicated set of parameters to encourage cross-lingual attention in code-switched sequences.
- Score: 18.439771003766026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study whether multilingual language models (MultiLMs) can
transfer logical reasoning abilities to other languages when they are
fine-tuned for reasoning in a different language. We evaluate the cross-lingual
reasoning abilities of MultiLMs in two schemes: (1) where the language of the
context and the question remain the same in the new languages that are tested
(i.e., the reasoning is still monolingual, but the model must transfer the
learned reasoning ability across languages), and (2) where the language of the
context and the question is different (which we term code-switched reasoning).
On two logical reasoning datasets, RuleTaker and LeapOfThought, we demonstrate
that although MultiLMs can transfer reasoning ability across languages in a
monolingual setting, they struggle to transfer reasoning abilities in a
code-switched setting. Following this observation, we propose a novel attention
mechanism that uses a dedicated set of parameters to encourage cross-lingual
attention in code-switched sequences, which improves the reasoning performance
by up to 14% and 4% on the RuleTaker and LeapOfThought datasets, respectively.
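As a rough illustration of the idea (not the paper's exact formulation), the sketch below shows one way a dedicated set of cross-lingual attention parameters could be wired into a self-attention layer: a second query/key projection is applied to token pairs whose language tags differ, so attention across the language boundary in a code-switched sequence is computed with its own parameters. The module name, the `lang_ids` input, and the single-head formulation are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn


class CrossLingualAttention(nn.Module):
    """Single-head self-attention with a dedicated parameter set for
    cross-language token pairs (illustrative sketch, not the paper's exact layer)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Standard projections, used when query and key tokens share a language.
        self.q_mono = nn.Linear(hidden_size, hidden_size)
        self.k_mono = nn.Linear(hidden_size, hidden_size)
        # Dedicated projections, used when query and key tokens differ in language.
        self.q_cross = nn.Linear(hidden_size, hidden_size)
        self.k_cross = nn.Linear(hidden_size, hidden_size)
        self.v = nn.Linear(hidden_size, hidden_size)
        self.scale = 1.0 / math.sqrt(hidden_size)

    def forward(self, hidden: torch.Tensor, lang_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size); lang_ids: (batch, seq_len) language tags.
        scores_mono = self.q_mono(hidden) @ self.k_mono(hidden).transpose(-1, -2)
        scores_cross = self.q_cross(hidden) @ self.k_cross(hidden).transpose(-1, -2)
        # True where the query token and the key token come from different languages.
        cross_pair = lang_ids.unsqueeze(-1) != lang_ids.unsqueeze(-2)
        scores = torch.where(cross_pair, scores_cross, scores_mono) * self.scale
        attn = scores.softmax(dim=-1)  # (batch, seq_len, seq_len)
        return attn @ self.v(hidden)   # (batch, seq_len, hidden_size)


# Toy usage: an 8-token code-switched sequence, first 5 tokens in language 0, rest in language 1.
layer = CrossLingualAttention(hidden_size=16)
hidden = torch.randn(1, 8, 16)
lang_ids = torch.tensor([[0, 0, 0, 0, 0, 1, 1, 1]])
out = layer(hidden, lang_ids)  # (1, 8, 16)
```

In a full MultiLM, such a mechanism would be applied per head inside each transformer layer and likely initialized from the pretrained projections; those details are omitted in this sketch.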
Related papers
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- Large Language Models Are Cross-Lingual Knowledge-Free Reasoners [43.99097308487008]
We decompose reasoning tasks into two separate components: knowledge retrieval and knowledge-free reasoning.
We show that the knowledge-free reasoning capability can be nearly perfectly transferred across various source-target language directions.
We hypothesize that knowledge-free reasoning shares similar neurons in different languages for reasoning, while knowledge is stored separately in different languages.
arXiv Detail & Related papers (2024-06-24T14:03:04Z)
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But can they really "reason" over natural language?
This question has received significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, but whether language imbalance causes cross-lingual generalisation in that setting remains inconclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
- Eliciting Better Multilingual Structured Reasoning from LLMs through Code [17.870002864331322]
We introduce a multilingual structured reasoning and explanation dataset, termed xSTREET, that covers four tasks across six languages.
xSTREET exposes a gap in base LLM performance between English and non-English reasoning tasks.
We propose two methods to remedy this gap, building on the insight that LLMs trained on code are better reasoners.
arXiv Detail & Related papers (2024-03-05T00:48:56Z)
- LangBridge: Multilingual Reasoning Without Multilingual Supervision [43.67596732997818]
LangBridge is a zero-shot approach to adapt language models for multilingual reasoning tasks without multilingual supervision.
LangBridge connects two models by introducing minimal trainable parameters between them.
Our analysis suggests that the efficacy of LangBridge stems from the language-agnostic characteristics of multilingual representations.
arXiv Detail & Related papers (2024-01-19T14:00:19Z)
- Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed? [40.13166574854085]
We investigate the minimal amount of multilinguality required to elicit cross-lingual generalisation in English-centric large language models.
We find that multilingual instruction tuning with as few as two to three languages is both necessary and sufficient to elicit effective cross-lingual generalisation.
arXiv Detail & Related papers (2023-12-20T00:49:52Z)
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities remains unclear.
In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while composition is more crucial to the success of cross-lingual transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.