Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
- URL: http://arxiv.org/abs/2509.23657v1
- Date: Sun, 28 Sep 2025 05:48:39 GMT
- Title: Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
- Authors: Shulin Huang, Yiran Ding, Junshu Pan, Yue Zhang,
- Abstract summary: We present the first systematic investigation into cross-lingual reasoning generalization of reinforcement learning (RL) andSupervised Fine-Tuning (SFT)<n>Our investigation yields two significant findings: (1) Tuning with RL achieves higher accuracy but also demonstrates substantially stronger cross-lingual generalization capabilities compared to SFT.
- Score: 8.908696346867119
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enhancing the complex reasoning capabilities of Large Language Models (LLMs) attracts widespread attention. While reinforcement learning (RL) has shown superior performance for improving complex reasoning, its impact on cross-lingual generalization compared to Supervised Fine-Tuning (SFT) remains unexplored. We present the first systematic investigation into cross-lingual reasoning generalization of RL and SFT. Using Qwen2.5-3B-Base as our foundation model, we conduct experiments on diverse multilingual reasoning benchmarks, including math reasoning, commonsense reasoning, and scientific reasoning. Our investigation yields two significant findings: (1) Tuning with RL not only achieves higher accuracy but also demonstrates substantially stronger cross-lingual generalization capabilities compared to SFT. (2) RL training on non-English data yields better overall performance and generalization than training on English data, which is not observed with SFT. Furthermore, through comprehensive mechanistic analyses, we explore the underlying factors of RL's superiority and generalization across languages. Our results provide compelling evidence that RL enables the model with more robust reasoning strategies, offering crucial guidance for more equitable and effective multilingual reasoning.
Related papers
- ExpLang: Improved Exploration and Exploitation in LLM Reasoning with On-Policy Thinking Language Selection [39.813397419564936]
We propose ExpLang, a novel post-training pipeline that enables on-policy thinking language selection to improve exploration and exploitation during reinforcement learning.<n>We show that our method steadily outperforms English-only training with the same training budget, while showing high thinking language compliance for both seen and unseen languages.
arXiv Detail & Related papers (2026-02-25T13:10:58Z) - Tailored Primitive Initialization is the Secret Key to Reinforcement Learning [61.29280885291581]
Reinforcement learning (RL) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs)<n>We argue that initializing LLMs with diverse, high-quality reasoning primitives is essential for achieving stable and sample-efficient RL training.<n>We propose Tailor, a finetuning pipeline that automatically discovers and curates novel reasoning primitives.
arXiv Detail & Related papers (2025-11-16T03:12:40Z) - Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective [52.452449102961225]
This study proposes a novel cross-linguistic perspective to investigate reasoning generalization.<n>Our findings reveal that cross-lingual transferability varies significantly across initial model, target language, and training paradigm.<n>Our study challenges the assumption that LRM reasoning mirrors human cognition, providing critical insights for the development of more language-agnostic LRMs.
arXiv Detail & Related papers (2025-10-02T17:49:49Z) - Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [82.43575191712726]
We introduce a fine-grained analytic framework to dissect the impact ofReinforcement learning on reasoning.<n>Our framework specifically investigates key elements that have been hypothesized to benefit from RL training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z) - Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback [59.078756231841574]
Critique-GRPO is an online RL framework that integrates both natural language and numerical feedback for effective policy optimization.<n>We show Critique-GRPO consistently outperforms supervised learning and RL-based fine-tuning methods across eight challenging mathematical, STEM, and general reasoning tasks.
arXiv Detail & Related papers (2025-06-03T17:39:02Z) - The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models [63.98194996746229]
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization.<n>However, reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations.<n>We propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification.
arXiv Detail & Related papers (2025-05-30T14:23:32Z) - Interleaved Reasoning for Large Language Models via Reinforcement Learning [22.403928213802036]
Long chain-of-thought (CoT) enhances large language models' (LLM) reasoning capabilities.<n>We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions.
arXiv Detail & Related papers (2025-05-26T07:58:17Z) - Behavior Injection: Preparing Language Models for Reinforcement Learning [45.744838898763554]
We analyze the per-step influence of the RL objective and identify two key conditions for effective post-training.<n>We propose behavior injection, a task-agnostic data augmentation scheme applied prior to RL.<n>We evaluate our method across two reasoning benchmarks with multiple base models.
arXiv Detail & Related papers (2025-05-25T00:54:50Z) - Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding.<n>It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions.<n>Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT)<n>Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z) - T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [52.34735382627312]
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.<n>Existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling.<n>We present T1 to scale reinforcement learning by encouraging exploration and understand inference scaling.
arXiv Detail & Related papers (2025-01-20T18:33:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.