MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
- URL: http://arxiv.org/abs/2401.06838v3
- Date: Sat, 13 Apr 2024 18:27:04 GMT
- Title: MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
- Authors: Shuaijie She, Wei Zou, Shujian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen
- Abstract summary: We propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language.
Specifically, we use an off-the-shelf translation model to estimate the consistency between answers in non-dominant and dominant languages.
Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models.
- Score: 65.31411639849516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in the dominant language like English is superior to other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language. Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages, which we adopt as the preference for optimization, e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models on all three benchmarks (MSVAMP +16.2%, MGSM +6.1%, and MNumGLUESub +13.3%), with improved reasoning consistency across languages.
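The preference step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each sampled non-dominant-language response already carries an alignment score (in MAPO this would come from the translation model; here the scores are supplied by hand), and the names `build_preference_pairs`, `dpo_loss`, `min_gap`, and `beta` are hypothetical. The loss shown is the standard per-pair DPO objective.

```python
import math
from itertools import combinations


def build_preference_pairs(samples, min_gap=0.0):
    """Turn scored responses into (chosen, rejected) pairs.

    samples: list of (response_text, alignment_score) tuples, where a higher
    score is assumed to mean the response is more consistent with the
    dominant-language answer. Pairs whose score gap is <= min_gap are skipped.
    """
    pairs = []
    for (resp_a, score_a), (resp_b, score_b) in combinations(samples, 2):
        if abs(score_a - score_b) <= min_gap:
            continue  # scores too close to express a clear preference
        if score_a > score_b:
            pairs.append((resp_a, resp_b))
        else:
            pairs.append((resp_b, resp_a))
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    Inputs are sequence log-probabilities under the trained policy and a
    frozen reference model. beta is an assumed hyperparameter value.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hand-written scores standing in for translation-model alignment scores.
scored = [("resp_1", 0.9), ("resp_2", 0.2), ("resp_3", 0.5)]
pairs = build_preference_pairs(scored)
```

With these pairs, the loss falls below log 2 once the policy assigns relatively more probability to the chosen response than the reference model does, which is the direction the optimization pushes.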
Related papers
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework [79.72910257530795]
ShifCon is a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one.
It shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters.
Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages.
arXiv Detail & Related papers (2024-10-25T10:28:59Z) - Language Imbalance Driven Rewarding for Multilingual Self-improving [35.1576728251478]
Large Language Models (LLMs) have achieved state-of-the-art performance across numerous tasks.
The imbalance between dominant and non-dominant languages, while limiting broader applications, generates a natural preference ranking between languages.
We propose Language Imbalance Driven Rewarding, where the inherent imbalance between dominant and non-dominant languages is leveraged as a reward signal.
arXiv Detail & Related papers (2024-10-11T16:32:05Z) - X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale [25.257770733168012]
Large language models (LLMs) have achieved remarkable success across various NLP tasks, yet their focus has predominantly been on English.
In this paper, we prioritize quality over scaling the number of languages, with a focus on the multilingual machine translation task.
X-ALMA is a model designed with a commitment to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels.
arXiv Detail & Related papers (2024-10-04T03:17:27Z) - Preference Tuning For Toxicity Mitigation Generalizes Across Languages [17.784213168942117]
This work explores zero-shot cross-lingual generalization of preference tuning in multilingual Large Language Models.
We demonstrate that Direct Preference Optimization training with only English data can significantly reduce toxicity in multilingual open-ended generations.
arXiv Detail & Related papers (2024-06-23T22:53:47Z) - mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models [21.616940026409818]
Large language models (LLMs) with Chain-of-thought (CoT) have recently emerged as a powerful technique for eliciting reasoning to improve downstream tasks.
We study multilingual reasoning consistency across multiple languages, using popular open-source LLMs.
We introduce multilingual CoT instruction tuning to boost reasoning capability across languages, thereby improving model consistency.
arXiv Detail & Related papers (2024-06-04T13:30:45Z) - Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z) - Unintended Impacts of LLM Alignment on Global Representation [62.6579934112071]
Developers align Large Language Models (LLMs) to user preferences through a variety of procedures, such as Reinforcement Learning From Human Feedback (RLHF) and Direct Preference Optimization (DPO).
We explore how alignment impacts performance along three axes of global representation: English dialects, multilingualism, and opinions from and about countries worldwide.
We conclude by discussing design decisions that led to these unintended impacts and recommendations for more equitable preference tuning.
arXiv Detail & Related papers (2024-02-22T23:31:22Z) - Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z) - Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models [63.92643612630657]
This paper attempts to peek into the black-box of multilingual optimization through the lens of loss function geometry.
We find that gradient similarity measured along the optimization trajectory is an important signal, which correlates well with language proximity.
We derive a simple and scalable optimization procedure, named Gradient Vaccine, which encourages more geometrically aligned parameter updates for close tasks.
arXiv Detail & Related papers (2020-10-12T17:26:34Z)
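The gradient similarity signal mentioned in the Gradient Vaccine entry can be illustrated with a small sketch. It assumes cosine similarity as the measure and uses made-up gradient vectors for two language tasks; the function name `cosine_similarity` and the example values are hypothetical, and the sketch does not reproduce the Gradient Vaccine update itself.

```python
import math


def cosine_similarity(grad_a, grad_b):
    """Cosine similarity between two flattened gradient vectors.

    Returns 0.0 if either vector has zero norm, so callers do not have to
    special-case tasks that produced no gradient at this step.
    """
    if len(grad_a) != len(grad_b):
        raise ValueError("gradients must have the same length")
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(a * a for a in grad_a))
    norm_b = math.sqrt(sum(b * b for b in grad_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up gradients for two language tasks at one training step.
grad_lang_1 = [0.5, -1.0, 2.0]
grad_lang_2 = [1.0, -2.0, 4.0]
similarity = cosine_similarity(grad_lang_1, grad_lang_2)
```

Tracking this value per task pair over training steps gives the kind of trajectory-level signal the entry describes.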
This list is automatically generated from the titles and abstracts of the papers on this site.