MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
- URL: http://arxiv.org/abs/2401.06838v3
- Date: Sat, 13 Apr 2024 18:27:04 GMT
- Title: MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
- Authors: Shuaijie She, Wei Zou, Shujian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen
- Abstract summary: We propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language.
Specifically, we harness an off-the-shelf translation model to score the consistency between answers in non-dominant and dominant languages.
Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models.
- Score: 65.31411639849516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages: e.g., reasoning in a dominant language like English is superior to reasoning in other languages, due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language. Specifically, we harness an off-the-shelf translation model to score the consistency between answers in non-dominant and dominant languages, and adopt this consistency as the preference for optimization, e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models on all three benchmarks (MSVAMP +16.2%, MGSM +6.1%, and MNumGLUESub +13.3%), with improved reasoning consistency across languages.
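The preference-construction step admits a minimal sketch. Here `translation_score` is a hypothetical wrapper around the off-the-shelf translation model (e.g., returning a length-normalized log-probability of one chain given the other); the actual prompts and scoring details follow the paper:

```python
# Sketch of MAPO-style preference construction (helper names are
# hypothetical). For each question, sample several reasoning chains in a
# non-dominant language, score each by how consistent it is with the
# dominant-language (English) chain, and keep the best/worst pair for DPO.

def translation_score(src_text: str, tgt_text: str) -> float:
    """Assumed stand-in for an off-the-shelf translation model, e.g. the
    length-normalized log-probability of tgt_text given src_text."""
    raise NotImplementedError

def build_preference_pair(non_dominant_chains: list[str],
                          dominant_chain: str) -> dict:
    # Higher score = reasoning more consistent with the English chain.
    scored = sorted(non_dominant_chains,
                    key=lambda c: translation_score(c, dominant_chain))
    return {"chosen": scored[-1], "rejected": scored[0]}
```

Pairs built this way can be fed to a standard DPO trainer; a sketch of the DPO loss itself appears later in this list.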
Related papers
- Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models [53.38288894305388]
Multilingual large language models (MLLMs) can exploit in-context learning (ICL) to achieve high performance through cross-lingual knowledge transfer without parameter updates.
Three key factors influence multilingual ICL: (1) semantic similarity, (2) linguistic alignment, and (3) language-specific performance.
We propose balanced multi-factor ICL (BMF-ICL), a method that quantifies and optimally balances these factors for improved example selection.
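As a toy illustration of the balancing idea, one can combine precomputed per-factor scores with a weighted sum; the factor names, weights, and linear combination below are illustrative assumptions, not BMF-ICL's actual formulation:

```python
# Hypothetical weighted combination of the three factors for selecting
# in-context examples; weights and field names are illustrative only.

def select_examples(candidates: list[dict], k: int = 4,
                    w_sem: float = 0.4, w_align: float = 0.3,
                    w_perf: float = 0.3) -> list[dict]:
    """candidates: dicts with precomputed per-factor scores in [0, 1]."""
    def combined(ex: dict) -> float:
        return (w_sem * ex["semantic_sim"]
                + w_align * ex["ling_align"]
                + w_perf * ex["lang_perf"])
    return sorted(candidates, key=combined, reverse=True)[:k]

pool = [
    {"id": 1, "semantic_sim": 0.9, "ling_align": 0.2, "lang_perf": 0.7},
    {"id": 2, "semantic_sim": 0.6, "ling_align": 0.8, "lang_perf": 0.9},
]
print(select_examples(pool, k=1))  # example 2 wins under these weights
```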
arXiv Detail & Related papers (2025-02-17T06:56:33Z)
- The Multilingual Mind: A Survey of Multilingual Reasoning in Language Models [18.399229357408043]
Multilingual reasoning requires language models to handle logical reasoning across languages.
This survey provides the first in-depth review of multilingual reasoning in Language Models.
arXiv Detail & Related papers (2025-02-13T16:25:16Z)
- AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought [19.692743208974296]
We introduce AdaCoT (Adaptive Chain-of-Thought), a framework that enhances multilingual reasoning by dynamically routing thought processes through intermediary "thinking languages" before generating target-language responses.
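The routing idea admits a simple sketch: reason in an intermediary thinking language, then answer in the target language. `generate` is an assumed stand-in for any LLM call, and the fixed default of English is a deliberate simplification of AdaCoT's adaptive routing:

```python
# Pivot-language chain-of-thought, heavily simplified: reason in a
# "thinking language", then restate the answer in the target language.

def generate(prompt: str) -> str:
    raise NotImplementedError  # placeholder for an LLM call

def adaptive_cot(question: str, target_lang: str,
                 thinking_lang: str = "English") -> str:
    reasoning = generate(
        f"Answer the following step by step in {thinking_lang}:\n{question}")
    return generate(
        f"Reasoning:\n{reasoning}\n\nState the final answer in {target_lang}.")
```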
arXiv Detail & Related papers (2025-01-27T15:48:57Z)
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework [78.07201802874529]
ShifCon is a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one.
It shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters.
Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages.
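The geometric intuition behind the shift can be sketched in a few lines; ShifCon's actual shift is learned within a contrastive framework, whereas the version below simply moves a hidden state along the difference of the two languages' mean representations:

```python
import numpy as np

# Toy shift of a non-dominant-language hidden state toward the dominant
# language's subspace, using the difference of the two languages' mean
# representations as the shift direction (illustration only).

def shift_representation(h: np.ndarray, mu_non_dominant: np.ndarray,
                         mu_dominant: np.ndarray,
                         alpha: float = 1.0) -> np.ndarray:
    return h + alpha * (mu_dominant - mu_non_dominant)
```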
arXiv Detail & Related papers (2024-10-25T10:28:59Z)
- Language Imbalance Driven Rewarding for Multilingual Self-improving [35.1576728251478]
Large Language Models (LLMs) have achieved state-of-the-art performance across numerous tasks, yet their capabilities remain imbalanced across languages.
This imbalance, while limiting broader applications, generates a natural preference ranking between languages.
We propose Language Imbalance Driven Rewarding, where the inherent imbalance between dominant and non-dominant languages is leveraged as a reward signal.
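In sketch form, the imbalance supplies preference pairs directly: for parallel queries, the model's (translated) dominant-language answer serves as "chosen" and its weaker non-dominant answer as "rejected". `generate` and `translate` are assumed stand-ins, not the paper's interface:

```python
# Hypothetical pair construction that uses the dominant/non-dominant
# performance gap itself as the preference signal.

def generate(prompt: str) -> str:
    raise NotImplementedError  # placeholder LLM call

def translate(text: str, target_lang: str) -> str:
    raise NotImplementedError  # placeholder MT call

def imbalance_preference_pair(query_nd: str, query_en: str,
                              lang: str) -> dict:
    chosen = translate(generate(query_en), lang)  # stronger dominant answer
    rejected = generate(query_nd)                 # weaker non-dominant answer
    return {"prompt": query_nd, "chosen": chosen, "rejected": rejected}
```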
arXiv Detail & Related papers (2024-10-11T16:32:05Z)
- Preference Tuning For Toxicity Mitigation Generalizes Across Languages [17.784213168942117]
This work explores zero-shot cross-lingual generalization of preference tuning in multilingual Large Language Models.
We demonstrate that Direct Preference Optimization training with only English data can significantly reduce toxicity in multilingual open-ended generations.
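Since DPO recurs throughout this list, the standard objective is shown below for concreteness (per-sequence summed log-probabilities under the tuned policy and a frozen reference model; this is the textbook form, not any one paper's variant):

```python
import torch
import torch.nn.functional as F

# Standard DPO loss: widen the chosen-vs-rejected margin relative to a
# frozen reference model; beta scales the implicit KL penalty.

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    chosen_margin = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_margin = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()
```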
arXiv Detail & Related papers (2024-06-23T22:53:47Z)
- Unintended Impacts of LLM Alignment on Global Representation [62.6579934112071]
Developers align Large Language Models (LLMs) to user preferences through a variety of procedures, such as Reinforcement Learning From Human Feedback (RLHF) and Direct Preference Optimization (DPO).
We explore how alignment impacts performance along three axes of global representation: English dialects, multilingualism, and opinions from and about countries worldwide.
We conclude by discussing design decisions that led to these unintended impacts and recommendations for more equitable preference tuning.
arXiv Detail & Related papers (2024-02-22T23:31:22Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
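One round of such an iterated best response can be sketched as follows; the exponentiated-gradient adversary and step size are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

# Toy adversary step for a distributionally robust objective: up-weight
# language pairs with high loss, then (not shown) train the model on the
# re-weighted loss and repeat.

def adversary_step(weights: np.ndarray, losses: np.ndarray,
                   step_size: float = 1.0) -> np.ndarray:
    new_w = weights * np.exp(step_size * losses)
    return new_w / new_w.sum()

w = np.ones(4) / 4                                     # uniform over 4 pairs
w = adversary_step(w, np.array([1.2, 0.8, 2.0, 0.5]))  # worst pairs gain weight
```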
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
- Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models [63.92643612630657]
This paper attempts to peek into the black-box of multilingual optimization through the lens of loss function geometry.
We find that gradient similarity measured along the optimization trajectory is an important signal, which correlates well with language proximity.
We derive a simple and scalable optimization procedure, named Gradient Vaccine, which encourages more geometrically aligned parameter updates for close tasks.
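The measurement behind this procedure is plain gradient cosine similarity; the sketch below pairs it with a simple PCGrad-style projection for intuition, whereas Gradient Vaccine's actual update steers gradients toward an adaptive similarity target:

```python
import numpy as np

# Cosine similarity between per-language gradients, plus a simple
# projection that removes the conflicting component (PCGrad-style;
# shown for intuition, not the paper's exact update rule).

def cosine_similarity(g1: np.ndarray, g2: np.ndarray) -> float:
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12))

def deconflict(g1: np.ndarray, g2: np.ndarray) -> np.ndarray:
    if cosine_similarity(g1, g2) < 0:  # gradients conflict
        g1 = g1 - (g1 @ g2) / (g2 @ g2 + 1e-12) * g2
    return g1
```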
arXiv Detail & Related papers (2020-10-12T17:26:34Z)