CM-Align: Consistency-based Multilingual Alignment for Large Language Models
- URL: http://arxiv.org/abs/2509.08541v2
- Date: Mon, 15 Sep 2025 06:55:00 GMT
- Title: CM-Align: Consistency-based Multilingual Alignment for Large Language Models
- Authors: Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou,
- Abstract summary: We propose a consistency-based data selection method to construct high-quality multilingual preference data. Specifically, our method includes two parts: consistency-guided English reference selection and cross-lingual consistency-based multilingual preference data construction.
- Score: 84.19366314925593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current large language models (LLMs) generally show a significant performance gap in alignment between English and other languages. To bridge this gap, existing research typically leverages the model's responses in English as a reference to select the best/worst responses in other languages, which are then used for Direct Preference Optimization (DPO) training. However, we argue that current methods have two limitations that result in noisy multilingual preference data and, in turn, limited alignment performance: 1) not all English responses are of high quality, and using a low-quality response may mislead the alignment for other languages; 2) current methods usually use biased or heuristic approaches to construct multilingual preference pairs. To address these limitations, we design a consistency-based data selection method to construct high-quality multilingual preference data for improving multilingual alignment (CM-Align). Specifically, our method includes two parts: consistency-guided English reference selection and cross-lingual consistency-based multilingual preference data construction. Experimental results on three LLMs and three common tasks demonstrate the effectiveness and superiority of our method, which further indicates the necessity of constructing high-quality preference data.
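The two-stage pipeline in the abstract can be sketched in code. This is a minimal illustration, not the paper's implementation: the toy bag-of-words cosine stands in for whatever consistency measure CM-Align actually uses (a real system would need a multilingual sentence encoder for the cross-lingual step), and all names (`embed`, `select_reference`, `build_pair`) are hypothetical.

```python
# Hedged sketch of consistency-based preference-pair construction in the
# spirit of CM-Align. All function names are illustrative, and the
# bag-of-words cosine is a stand-in for the paper's consistency measure.
from collections import Counter
from math import sqrt

def embed(text):
    """Toy bag-of-words vector; a real system would use a (multilingual)
    sentence encoder so cross-lingual similarity is meaningful."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_reference(english_candidates):
    """Consistency-guided English reference selection: keep the response
    that agrees most with its siblings (highest mean pairwise similarity)."""
    vecs = [embed(c) for c in english_candidates]
    scores = [sum(cosine(v, w) for w in vecs if w is not v) / (len(vecs) - 1)
              for v in vecs]
    return english_candidates[scores.index(max(scores))]

def build_pair(reference, target_candidates):
    """Cross-lingual consistency: rank target-language responses by
    similarity to the English reference; the most/least similar become
    the chosen/rejected pair for DPO training."""
    ref = embed(reference)
    ranked = sorted(target_candidates, key=lambda c: cosine(ref, embed(c)))
    return ranked[-1], ranked[0]  # (chosen, rejected)
```

The point of the first stage is that a low-quality English response never becomes the reference; the second stage then anchors every target-language pair to that vetted reference.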
Related papers
- LangGPS: Language Separability Guided Data Pre-Selection for Joint Multilingual Instruction Tuning [49.22807995935406]
Joint multilingual instruction tuning is a widely adopted approach to improve the multilingual instruction-following ability and downstream performance of large language models (LLMs). Existing selection methods, often based on features like text quality, diversity, or task relevance, typically overlook the intrinsic linguistic structure of multilingual data. We propose LangGPS, a lightweight two-stage pre-selection framework guided by language separability, which quantifies how well samples in different languages can be distinguished in the model's representation space.
arXiv Detail & Related papers (2025-11-13T12:02:32Z)
- Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining [16.590296049892576]
This paper introduces Climb, a novel framework designed to systematically optimize multilingual data allocation. At its core, Climb introduces a cross-lingual interaction-aware language ratio, explicitly quantifying each language's effective allocation by capturing inter-language dependencies. Extensive experiments confirm that Climb can accurately measure cross-lingual interactions across various multilingual settings.
arXiv Detail & Related papers (2025-09-19T03:34:34Z)
- A method for improving multilingual quality and diversity of instruction fine-tuning datasets [29.07537849245622]
We introduce Multilingual Data Quality and Diversity (M-DaQ) to improve Multilingual Instruction Fine-Tuning (IFT). M-DaQ is a novel method for improving LLMs' multilinguality by selecting high-quality and semantically diverse multilingual IFT samples. Empirical results across 18 languages demonstrate that models fine-tuned with M-DaQ achieve significant performance gains over vanilla baselines, with over a 60% win rate.
arXiv Detail & Related papers (2025-09-19T03:07:59Z)
- MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining [27.952041404675846]
We introduce MuRating, a framework that transfers high-quality English data-quality signals into a single rater for 17 target languages. MuRating aggregates multiple English "raters" via pairwise comparisons to learn unified document-quality scores. It then projects these judgments through translation to train a multilingual evaluator on monolingual, cross-lingual, and parallel text pairs.
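Aggregating pairwise comparisons into unified quality scores, as MuRating's first step describes, is classically done with a Bradley-Terry model. The sketch below fits one by minorization-maximization; this is a standard-technique illustration, and MuRating's actual estimator and training objective may differ.

```python
# Hedged sketch: turn pairwise win counts between documents (or raters)
# into unified strength scores via the Bradley-Terry model, fit with the
# classic minorization-maximization (MM) updates.

def bradley_terry(wins, n_items, iters=200):
    """wins[(i, j)] = number of comparisons item i won against item j.
    Returns strength scores p, normalized to sum to 1."""
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            w_i = sum(w for (a, _), w in wins.items() if a == i)  # total wins of i
            denom = 0.0
            for j in range(n_items):
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)  # comparisons of i vs j
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p.append(w_i / denom if denom else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]
    return p
```

Items that win more of their comparisons end up with higher scores, giving a single document-quality ranking from many noisy pairwise judgments.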
arXiv Detail & Related papers (2025-07-02T15:11:12Z)
- Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models [52.22235443948351]
High-quality multilingual training data is essential for effectively pretraining large language models (LLMs). Here, we introduce JQL, a systematic approach that efficiently curates diverse and high-quality multilingual data at scale. JQL distills LLMs' annotation capabilities into lightweight annotators based on pretrained multilingual embeddings.
arXiv Detail & Related papers (2025-05-28T11:06:54Z)
- Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment [35.1576728251478]
We propose a novel approach that captures preferences from well-aligned English models via implicit rewards and transfers them to other languages through iterative training. Fine-tuning Llama3 for two iterations resulted in a 12.72% average improvement in Win Rate and a 5.97% increase in Length Control Win Rate across all training languages on the X-AlpacaEval leaderboard.
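The "implicit reward" above is the quantity DPO optimizes: the scaled log-probability ratio between the preference-tuned policy and its reference model. A minimal sketch, assuming per-token log-probabilities are already available (model scoring is stubbed out here, and the ranking helper is an illustration rather than the paper's exact recipe):

```python
# Hedged sketch of DPO-style implicit rewards. For a prompt x and response y,
# r(x, y) = beta * log [ pi_policy(y|x) / pi_ref(y|x) ], which equals
# beta * (sum of policy token log-probs - sum of reference token log-probs).

def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """logp_policy / logp_ref: per-token log-probabilities of the same
    response under the tuned policy and the reference model."""
    return beta * (sum(logp_policy) - sum(logp_ref))

def rank_by_reward(candidates, beta=0.1):
    """candidates: list of (text, logps_under_policy, logps_under_reference).
    Returns texts sorted best-first; the top and bottom responses can then
    form a chosen/rejected pair for further preference training."""
    scored = [(implicit_reward(lp, lr, beta), text) for text, lp, lr in candidates]
    return [text for _, text in sorted(scored, reverse=True)]
```

Because the reward comes from the English-aligned model's own log-ratios, no separate reward model or human labels are needed to score responses in other languages.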
arXiv Detail & Related papers (2025-03-06T17:33:01Z)
- Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models [53.38288894305388]
Multilingual large language models (MLLMs) can use in-context learning (ICL) to achieve high performance by exploiting cross-lingual knowledge transfer without parameter updates. Three key factors influence multilingual ICL: (1) semantic similarity, (2) linguistic alignment, and (3) language-specific performance. We propose balanced multi-factor ICL (BMF-ICL), a method that quantifies and optimally balances these factors for improved example selection.
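The balancing idea can be sketched as a weighted sum over the three factor scores per candidate demonstration. This is a simplification: the fixed hand-set weights below are illustrative, whereas BMF-ICL optimizes the balance, and how each factor is measured is left abstract here.

```python
# Hedged sketch of multi-factor ICL example selection: each candidate
# demonstration carries three pre-normalized factor scores
# (semantic similarity, linguistic alignment, language-specific performance),
# and the top-k weighted-sum scorers become the in-context examples.

def bmf_score(factors, weights=(1/3, 1/3, 1/3)):
    """factors: (semantic_similarity, linguistic_alignment,
    language_performance), each normalized to [0, 1]."""
    return sum(w * f for w, f in zip(weights, factors))

def select_examples(candidates, k=2, weights=(1/3, 1/3, 1/3)):
    """candidates: list of (example_text, factor_triple).
    Returns the k highest-scoring example texts."""
    ranked = sorted(candidates, key=lambda c: bmf_score(c[1], weights),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

Adjusting the weight vector shifts the trade-off, e.g. favoring linguistically aligned demonstrations over semantically similar ones for low-resource target languages.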
arXiv Detail & Related papers (2025-02-17T06:56:33Z)
- MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization [65.31411639849516]
We propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language.
Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages.
Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models.
arXiv Detail & Related papers (2024-01-12T18:03:54Z)
- Sample Efficient Preference Alignment in LLMs via Active Exploration [63.84454768573154]
We take advantage of the fact that one can often choose contexts at which to obtain human feedback to most efficiently identify a good policy. We propose an active exploration algorithm to efficiently select the data and provide theoretical proof that it has a worst-case regret bound. Our method outperforms the baselines with limited samples of human preferences on several language models and four real-world datasets.
arXiv Detail & Related papers (2023-12-01T00:54:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.