Related papers: Scaling Test-time Compute for Low-resource Languages: Multilingual Reasoning in LLMs

Scaling Test-time Compute for Low-resource Languages: Multilingual Reasoning in LLMs

URL: http://arxiv.org/abs/2504.02890v1
Date: Wed, 02 Apr 2025 16:58:36 GMT
Title: Scaling Test-time Compute for Low-resource Languages: Multilingual Reasoning in LLMs
Authors: Khanh-Tung Tran, Barry O'Sullivan, Hoang D. Nguyen,
Abstract summary: We investigate the multilingual mechanism by which Large Language Models internally operate in a latent space biased toward their inherently dominant language.<n>We train models to generate the chain-of-thought (CoT) in English while outputting the final response in the target language, given input in the low-resource language.<n>Our experiments demonstrate that this approach, named English-Pivoted CoT Training, outperforms other baselines, with up to 28.33% improvement.
Score: 3.9530780161144667
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent advances in test-time compute scaling have enabled Large Language Models (LLMs) to tackle deep reasoning tasks by generating a chain-of-thought (CoT) that includes trial and error, backtracking, and intermediate reasoning steps before producing the final answer. However, these techniques have been applied predominantly to popular languages, such as English, leaving reasoning in low-resource languages underexplored and misaligned. In this work, we investigate the multilingual mechanism by which LLMs internally operate in a latent space biased toward their inherently dominant language. To leverage this phenomenon for low-resource languages, we train models to generate the CoT in English while outputting the final response in the target language, given input in the low-resource language. Our experiments demonstrate that this approach, named English-Pivoted CoT Training, outperforms other baselines, including training to generate both the CoT and the final response solely in the target language, with up to 28.33% improvement. Further analysis provides novel insights into the relationships between reasoning and multilinguality of LLMs, prompting for better approaches in developing multilingual large reasoning models

Related papers

Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings. We train multilingual PRMs on a dataset spanning seven languages, which is translated from English. Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
arXiv Detail & Related papers (2025-02-18T09:11:44Z)
Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging -- An Open Recipe [12.076338505539194]
This paper aims to enhance the reasoning capabilities of language-specific large language models (LLMs)<n>DeepSeek R1 excels in reasoning but primarily benefits high-resource languages such as English and Chinese.<n>Low-resource languages remain underserved due to the dominance of English-centric training data and model optimizations.
arXiv Detail & Related papers (2025-02-13T08:10:45Z)
LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Reasoning [28.288949710191158]
Large language models (LLMs) have exhibited impressive multilingual reasoning capabilities, driven by extensive multilingual pre-training corpora and instruction fine-tuning data. A performance gap exists between high- and low-resource language reasoning tasks due to the language imbalance in the pre-training corpus. We propose LinguaLIFT, a two-stage instruction tuning framework for advancing low-resource language reasoning.
arXiv Detail & Related papers (2024-12-17T03:03:17Z)
Multilingual LLMs Inherently Reward In-Language Time-Sensitive Semantic Alignment for Low-Resource Languages [19.863010475923414]
The disparity in labeled resources between resource-rich languages and those considered low-resource remains a significant impediment for Large Language Models (LLMs)<n>Recent strides in cross-lingual in-context learning (X-ICL), mainly through semantically aligned examples retrieved from multilingual pre-trained transformers, have shown promise in mitigating this issue.<n>This study aims to bridge this gap by improving temporal reasoning capabilities in low-resource languages.
arXiv Detail & Related papers (2024-12-11T04:16:39Z)
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model [59.357993924917]
We study the evolution of multilingual capabilities in large language models (LLMs) during the pre-training process. We propose the Babel Tower Hypothesis, which describes the entire process of LLMs acquiring new language capabilities. We propose a novel method to construct an optimized pre-training corpus for multilingual code LLMs.
arXiv Detail & Related papers (2024-12-10T08:28:57Z)
Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization ( CLS) aims to generate a summary for the source text in a different target language.<n>Currently, instruction-tuned large language models (LLMs) excel at various English tasks.<n>Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages [25.402797722575805]
Indic QA Benchmark is a dataset for context grounded question answering in 11 major Indian languages. Evaluations revealed weak performance in low resource languages due to a strong English language bias in their training data. We also investigated the Translate Test paradigm,where inputs are translated to English for processing and the results are translated back into the source language for output.
arXiv Detail & Related papers (2024-07-18T13:57:16Z)
Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.<n>We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.<n>We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models [21.616940026409818]
Large language models (LLMs) with Chain-of-thought (CoT) have recently emerged as a powerful technique for eliciting reasoning to improve downstream tasks. We study multilingual reasoning consistency across multiple languages, using popular open-source LLMs. We introduce multilingual CoT instruction tuning to boost reasoning capability across languages, thereby improving model consistency.
arXiv Detail & Related papers (2024-06-04T13:30:45Z)
Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning. We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups. Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.