Related papers: Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction

Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction

URL: http://arxiv.org/abs/2601.05459v1
Date: Fri, 09 Jan 2026 01:17:31 GMT
Title: Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction
Authors: Hongjin Kim, Jaewook Lee, Kiyoung Lee, Jong-hun Shin, Soojong Lim, Oh-Woog Kwon,
Abstract summary: We investigate whether reinforcement learning can enhance Korean reasoning abilities to a degree comparable to English.<n>Our findings reveal that RL alone yields limited improvements when applied to models lacking inherent Korean reasoning capabilities.<n>We show that aligning the model's internal reasoning processes with Korean inputs-particularly by tuning Korean-specific neurons in early layers-is key to unlocking RL's effectiveness.
Score: 7.756650000650388
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) demonstrate strong reasoning and self-correction abilities in high-resource languages like English, but their performance remains limited in low-resource languages such as Korean. In this study, we investigate whether reinforcement learning (RL) can enhance Korean reasoning abilities to a degree comparable to English. Our findings reveal that RL alone yields limited improvements when applied to models lacking inherent Korean reasoning capabilities. To address this, we explore several fine-tuning strategies and show that aligning the model's internal reasoning processes with Korean inputs-particularly by tuning Korean-specific neurons in early layers-is key to unlocking RL's effectiveness. We introduce a self-correction code-switching dataset to facilitate this alignment and observe significant performance gains in both mathematical reasoning and self-correction tasks. Ultimately, we conclude that the crucial factor in multilingual reasoning enhancement is not injecting new linguistic knowledge, but effectively eliciting and aligning existing reasoning capabilities. Our study provides a new perspective on how internal translation and neuron-level tuning contribute to multilingual reasoning alignment in LLMs.

Related papers

ExpLang: Improved Exploration and Exploitation in LLM Reasoning with On-Policy Thinking Language Selection [39.813397419564936]
We propose ExpLang, a novel post-training pipeline that enables on-policy thinking language selection to improve exploration and exploitation during reinforcement learning.<n>We show that our method steadily outperforms English-only training with the same training budget, while showing high thinking language compliance for both seen and unseen languages.
arXiv Detail & Related papers (2026-02-25T13:10:58Z)
Align to the Pivot: Dual Alignment with Self-Feedback for Multilingual Math Reasoning [71.4175109189942]
We present Pivot-Aligned Self-Feedback Multilingual Reasoning (PASMR)<n>This approach designates the model's primary language as the pivot language.<n>It establishes a cross-lingual self-feedback mechanism without relying on external correct answers or reward models.
arXiv Detail & Related papers (2026-01-25T03:20:00Z)
Code-Switching In-Context Learning for Cross-Lingual Transfer of Large Language Models [64.54005959758733]
We introduce code-switching in-context learning (CSICL) as a principled and robust approach for overcoming the translation barrier during inference.<n>We conduct extensive experiments across 4 LLMs, 6 datasets, and 10 languages, spanning both knowledge-intensive and reasoning-oriented domains.<n>Our results demonstrate CSICL consistently outperforms X-ICL baselines, achieving gains of 3.1%p and 1.9%p in both target and unseen languages.
arXiv Detail & Related papers (2025-10-07T08:35:42Z)
Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective [52.452449102961225]
This study proposes a novel cross-linguistic perspective to investigate reasoning generalization.<n>Our findings reveal that cross-lingual transferability varies significantly across initial model, target language, and training paradigm.<n>Our study challenges the assumption that LRM reasoning mirrors human cognition, providing critical insights for the development of more language-agnostic LRMs.
arXiv Detail & Related papers (2025-10-02T17:49:49Z)
Making Qwen3 Think in Korean with Reinforcement Learning [5.237306053045462]
We present a two-stage fine-tuning approach to make the large language model Qwen3 14B "think" in Korean.<n>In the first stage, supervised fine-tuning (SFT) on a high-quality Korean reasoning dataset establishes a strong foundation in Korean logical reasoning.<n>In the second stage, we employ reinforcement learning with a customized Group Relative Policy Optimization algorithm.
arXiv Detail & Related papers (2025-08-14T05:49:34Z)
The Emergence of Abstract Thought in Large Language Models Beyond Any Language [95.50197866832772]
Large language models (LLMs) function effectively across a diverse range of languages.<n>Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts.<n>Recent results show strong multilingual performance, even surpassing English performance on specific tasks in other languages.
arXiv Detail & Related papers (2025-06-11T16:00:54Z)
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models [56.61984030508691]
We present the first mechanistic interpretability study of language confusion.<n>We show that confusion points (CPs) are central to this phenomenon.<n>We show that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion.
arXiv Detail & Related papers (2025-05-22T11:29:17Z)
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance.<n>Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z)
Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes [54.96891982093408]
Reasoning language models (RLMs) excel at complex tasks by leveraging a chain-of-thought process to generate structured intermediate steps.<n> Language mixing, i.e., reasoning steps containing tokens from languages other than the prompt, has been observed in their outputs and shown to affect performance.<n>We present the first systematic study of language mixing in RLMs, examining its patterns, impact, and internal causes across 15 languages.
arXiv Detail & Related papers (2025-05-20T18:26:53Z)
RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining [0.0]
We present RedWhale, a model specifically tailored for Korean language processing. RedWhale is developed using an efficient continual pretraining approach that includes a comprehensive Korean corpus preprocessing pipeline. Experimental results demonstrate that RedWhale outperforms other leading models on Korean NLP benchmarks.
arXiv Detail & Related papers (2024-08-21T02:49:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.