LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning
- URL: http://arxiv.org/abs/2510.09189v1
- Date: Fri, 10 Oct 2025 09:33:28 GMT
- Title: LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning
- Authors: Changjiang Gao, Zixian Huang, Jingyang Gong, Shujian Huang, Lei Li, Fei Yuan,
- Abstract summary: General Large Language Models excel in reasoning, but those enhanced for translation struggle with reasoning tasks.<n>We propose a novel translationenhanced recipe that begins with instruct models and applies layer-selective tuning only on parallel data.<n>We introduce the Qwen3-XPlus models, which demonstrate significant improvements in translation performance across both high- and lowresource languages.
- Score: 39.84745746949007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General Large Language Models (LLMs) excel in reasoning, but those enhanced for translation struggle with reasoning tasks. To address this, we propose a novel translationenhanced recipe that begins with instruct models and applies layer-selective tuning only on parallel data. Following this pipeline, we introduce the Qwen3-XPlus models, which demonstrate significant improvements in translation performance across both high- and lowresource languages, achieving 15+ spBLEU and 40+ xComet in low-resource languages, like Swahili. Interestingly, training only with small parallel datasets, Qwen3-XPlus achieves an average improvement of 1+ points on 7 multilingual tasks while maintaining proficiency comparable to the Qwen3 instruct model in 15 popular reasoning datasets. This work offers a promising approach to multilingual enhancement, significantly reducing complexity and enhancing accessibility for a wider range of languages. The code and model are publicly available.
Related papers
- Bootstrapping Embeddings for Low Resource Languages [0.6754597324022876]
Embedding models are crucial to modern NLP.<n>For high resource languages, such as English, such datasets are readily available.<n>For hundreds of other languages, they are simply non-existent.
arXiv Detail & Related papers (2026-03-02T10:59:33Z) - Aligning Multilingual Reasoning with Verifiable Semantics from a High-Resource Expert Model [13.788758077632432]
We introduce Pivot-Based Reinforcement Learning with Semantically Verifiable Rewards.<n>This framework enhances multilingual reasoning by circumventing the need for human-annotated data in target languages.<n>We show that our method significantly narrows the performance gap between English and other languages.
arXiv Detail & Related papers (2025-09-29T22:03:11Z) - Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters [53.59868121093848]
We introduce Seed-X, a family of open-source language models (LLMs) with 7B parameter size.<n>The base model is pre-trained on a diverse, high-quality dataset encompassing both monolingual and bilingual content across 28 languages.<n>The instruct model is then finetuned to translate by Chain-of-Thought (CoT) reasoning and further enhanced through reinforcement learning (RL) to achieve better generalization across diverse language pairs.
arXiv Detail & Related papers (2025-07-18T03:19:43Z) - Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models [90.54780244175511]
We introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series.<n>The Qwen3 Embedding series offers a spectrum of model sizes for both embedding and reranking tasks.<n>The Qwen3 Embedding series achieves state-of-the-art results across diverse benchmarks.
arXiv Detail & Related papers (2025-06-05T15:49:48Z) - Pretraining Language Models to Ponder in Continuous Space [50.52734567589996]
We introduce this pondering process into language models by repeatedly invoking the forward process within a single token generation step.<n>We show that the model can learn to ponder in this way through self-supervised learning, without any human annotations.
arXiv Detail & Related papers (2025-05-27T03:47:33Z) - Qwen3 Technical Report [137.96804244102205]
We present Qwen3, the latest version of the Qwen model family.<n>Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities.
arXiv Detail & Related papers (2025-05-14T13:41:34Z) - Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks [0.9786690381850356]
This study presents in-depth examination of 7 prominent Large Language Models (LLMs) across 17 tasks using 22 datasets, 13.8 hours of speech, in a zero-shot setting, and their performance against state-of-the-art (SOTA) models.<n>Our results emphasize that models with fewer parameters but richer language-specific data, like Llama 3.1-8B, often outperform larger models with lower language diversity, such as GPT-3.5, in several tasks.
arXiv Detail & Related papers (2024-05-24T11:30:37Z) - Question Translation Training for Better Multilingual Reasoning [108.10066378240879]
Large language models show compelling performance on reasoning tasks but they tend to perform much worse in languages other than English.
A typical solution is to translate instruction data into all languages of interest, and then train on the resulting multilingual data, which is called translate-training.
In this paper we explore the benefits of question alignment, where we train the model to translate reasoning questions into English by finetuning on X-English parallel question data.
arXiv Detail & Related papers (2024-01-15T16:39:10Z) - Document-Level Language Models for Machine Translation [37.106125892770315]
We build context-aware translation systems utilizing document-level monolingual data instead.
We improve existing approaches by leveraging recent advancements in model combination.
In most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system.
arXiv Detail & Related papers (2023-10-18T20:10:07Z) - Bactrian-X: Multilingual Replicable Instruction-Following Models with
Low-Rank Adaptation [40.695782736177264]
Bactrian-X is a comprehensive multilingual parallel dataset of 3.4 million instruction-response pairs across 52 languages.
We train a set of adapters using low-rank adaptation (LoRA), which are lightweight components that seamlessly integrate with large language models.
Experiments in various multilingual evaluation settings demonstrate that models derived from LoRA-based training over Bactrian-X outperform both the vanilla models and existing instruction-tuned models.
arXiv Detail & Related papers (2023-05-24T10:50:31Z) - Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.