AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought
- URL: http://arxiv.org/abs/2501.16154v1
- Date: Mon, 27 Jan 2025 15:48:57 GMT
- Title: AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought
- Authors: Xin Huang, Tarun Kumar Vangani, Zhengyuan Liu, Bowei Zou, Ai Ti Aw,
- Abstract summary: We introduce AdaCoT (Adaptive Chain-of-Thought), a framework that enhances multilingual reasoning.
AdaCoT dynamically routes thought processes through intermediary "thinking languages" before generating target-language responses.
- Score: 19.692743208974296
- License:
- Abstract: Large language models (LLMs) have shown impressive multilingual capabilities through pretraining on diverse corpora. While these models show strong reasoning abilities, their performance varies significantly across languages due to uneven training data distribution. Existing approaches based on machine translation, extensive multilingual pretraining, or cross-lingual tuning face scalability challenges and often fail to capture nuanced reasoning processes across languages. In this paper, we introduce AdaCoT (Adaptive Chain-of-Thought), a framework that enhances multilingual reasoning by dynamically routing thought processes through intermediary "thinking languages" before generating target-language responses. AdaCoT leverages a language-agnostic core and incorporates an adaptive, reward-based mechanism for selecting optimal reasoning pathways without requiring additional pretraining. Our comprehensive evaluation across multiple benchmarks demonstrates substantial improvements in both factual reasoning quality and cross-lingual consistency, with particularly strong performance gains in low-resource language settings. The results suggest that adaptive reasoning paths can effectively bridge the performance gap between high- and low-resource languages while maintaining cultural and linguistic nuances.
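The abstract describes choosing among reasoning pathways with a reward signal. Below is a minimal inference-time sketch of that selection idea, assuming three hypothetical callables (`reason`, `respond`, `reward`) that would wrap an LLM and a reward model; the candidate languages, the toy stubs, and their scores are invented for illustration. It is a simple best-of-N search over thinking languages, not the authors' training procedure, which the abstract does not detail.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical interfaces (not from the AdaCoT paper): in a real system these
# would wrap calls to an LLM and to a reward model.
Reasoner = Callable[[str, str], str]         # (question, thinking_lang) -> reasoning trace
Responder = Callable[[str, str], str]        # (reasoning, target_lang) -> final response
RewardFn = Callable[[str, str, str], float]  # (question, reasoning, response) -> score


@dataclass
class RoutedAnswer:
    thinking_language: str
    reasoning: str
    response: str
    reward: float


def adaptive_route(
    question: str,
    target_lang: str,
    candidate_langs: Sequence[str],
    reason: Reasoner,
    respond: Responder,
    reward: RewardFn,
) -> RoutedAnswer:
    """Try each candidate thinking language and keep the highest-reward path."""
    best: Optional[RoutedAnswer] = None
    for lang in candidate_langs:
        reasoning = reason(question, lang)          # think in the intermediary language
        response = respond(reasoning, target_lang)  # answer in the target language
        score = reward(question, reasoning, response)
        # Strict ">" keeps the earliest candidate on ties.
        if best is None or score > best.reward:
            best = RoutedAnswer(lang, reasoning, response, score)
    if best is None:
        raise ValueError("candidate_langs must not be empty")
    return best


# Toy stand-ins (hypothetical) so the sketch runs without a model.
TOY_SCORES = {"en": 0.9, "zh": 0.7, "id": 0.4}


def toy_reason(question: str, lang: str) -> str:
    return f"[{lang}] reasoning about: {question}"


def toy_respond(reasoning: str, target_lang: str) -> str:
    source_tag = reasoning.split("]")[0] + "]"
    return f"[{target_lang}] answer derived from {source_tag}"


def toy_reward(question: str, reasoning: str, response: str) -> float:
    return TOY_SCORES[reasoning[1:3]]  # look up the score by language tag


result = adaptive_route("Apa ibu kota Australia?", "id", ["en", "zh", "id"],
                        toy_reason, toy_respond, toy_reward)
print(result.thinking_language, result.reward)
print(result.response)
```

Exhaustive search costs one reasoning call and one response call per candidate language, which is the main trade-off of this simple design.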
Related papers
- Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings.
We train multilingual PRMs on a seven-language dataset translated from English.
Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
(2025-02-18T09:11:44Z)
- LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Reasoning [28.288949710191158]
Large language models (LLMs) have exhibited impressive multilingual reasoning capabilities, driven by extensive multilingual pre-training corpora and instruction fine-tuning data.
A performance gap exists between high- and low-resource language reasoning tasks due to the language imbalance in the pre-training corpus.
We propose LinguaLIFT, a two-stage instruction tuning framework for advancing low-resource language reasoning.
(2024-12-17T03:03:17Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
(2024-05-02T14:49:50Z)
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
(2024-04-24T08:52:40Z)
- xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning [36.34986831526529]
Chain-of-thought (CoT) has emerged as a powerful technique to elicit reasoning in large language models.
We propose a cross-lingual instruction fine-tuning framework (xCoT) to transfer knowledge from high-resource languages to low-resource languages.
(2024-01-13T10:53:53Z)
- MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization [65.31411639849516]
We propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language.
Specifically, we use an off-the-shelf translation model to assess the consistency between answers in non-dominant and dominant languages.
Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models.
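MAPO's summary describes scoring how consistent non-dominant-language answers are with the dominant language and treating that consistency as a preference signal. A minimal sketch of turning such scores into (chosen, rejected) pairs follows, assuming a pluggable scorer; the toy scorer here only compares final answers and stands in for the translation-model score, which is not reproduced. The function names and the `margin` parameter are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: MAPO derives its signal from an off-the-shelf
# translation model; here any function returning "how well does the candidate
# agree with the dominant-language reference" can be plugged in.
AlignmentScorer = Callable[[str, str], float]


def build_preference_pairs(
    candidates: Sequence[str],
    reference: str,
    score: AlignmentScorer,
    margin: float = 0.0,
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs ordered by alignment with the reference."""
    scored = [(c, score(c, reference)) for c in candidates]
    pairs: List[Tuple[str, str]] = []
    for (a, sa), (b, sb) in combinations(scored, 2):
        if sa - sb > margin:
            pairs.append((a, b))
        elif sb - sa > margin:
            pairs.append((b, a))
        # Pairs whose scores differ by <= margin carry no preference and are skipped.
    return pairs


def toy_final_answer_match(candidate: str, reference: str) -> float:
    """Toy scorer: 1.0 if the last tokens (final answers) agree, else 0.0."""
    return 1.0 if candidate.split()[-1] == reference.split()[-1] else 0.0


reference = "so the answer is 12"
candidates = [
    "donc la réponse est 12",
    "donc la réponse est 15",
    "la réponse finale est 12",
]
pairs = build_preference_pairs(candidates, reference, toy_final_answer_match)
for chosen, rejected in pairs:
    print("chosen:", chosen, "| rejected:", rejected)
```

The resulting pairs could then feed a pairwise preference-optimization objective such as DPO.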
(2024-01-12T18:03:54Z)
- Empowering Multi-step Reasoning across Languages via Tree-of-Thoughts [1.8175282137722093]
Chain-of-Thought (CoT) methods empower Large Language Models (LLMs) to solve complex tasks in a step-by-step manner.
The ability to deliver multi-step reasoning remains limited to English because of the imbalance in the distribution of pre-training data.
We propose Cross-lingual Tree-of-Thoughts (Cross-ToT), a method for aligning cross-lingual CoT reasoning across languages.
(2023-11-14T11:49:43Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
(2023-06-13T08:08:08Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
(2022-05-24T03:35:00Z)
- It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning [4.200736775540874]
We design a simple approach to commonsense reasoning which trains a linear classifier with weights of multi-head attention as features.
The method performs competitively with recent supervised and unsupervised approaches for commonsense reasoning.
Most of the performance is given by the same small subset of attention heads for all studied languages.
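A minimal sketch of the general recipe described above (a linear classifier over attention-head features) follows, assuming synthetic data in which each sample is a vector of per-head attention weights. The data, the choice of head 2 as the informative head, and the plain gradient-descent training loop are illustrative assumptions, not taken from the paper; the sketch only shows how learned weight magnitudes can single out a small subset of useful heads.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the logistic (cross-entropy) loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Synthetic features: one attention weight per head (6 heads). Only head 2 is
# informative; the other heads are uniform noise.
rng = random.Random(0)
N_HEADS, INFORMATIVE = 6, 2
X, y = [], []
for i in range(40):
    label = i % 2
    feats = [rng.random() for _ in range(N_HEADS)]
    feats[INFORMATIVE] = rng.uniform(0.7, 0.9) if label else rng.uniform(0.1, 0.3)
    X.append(feats)
    y.append(label)

w, b = train_logreg(X, y)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(X)
top_head = max(range(N_HEADS), key=lambda j: abs(w[j]))
print(f"train accuracy={accuracy:.2f}, most-weighted head={top_head}")
```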
(2021-06-22T21:25:43Z)
- Adaptive Sparse Transformer for Multilingual Translation [18.017674093519332]
A known challenge of multilingual models is negative language interference.
We propose an adaptive and sparse architecture for multilingual modeling.
Our model outperforms strong baselines in terms of translation quality without increasing the inference cost.
(2021-04-15T10:31:07Z)