Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
- URL: http://arxiv.org/abs/2510.17555v1
- Date: Mon, 20 Oct 2025 14:02:37 GMT
- Title: Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
- Authors: Collin Zhang, Fei Huang, Chenhan Yuan, Junyang Lin
- Abstract summary: This paper introduces the Language Confusion Gate (LCG), a lightweight, plug-in solution that filters tokens during decoding. The LCG is trained using norm-adjusted self-distillation to predict appropriate language families and apply masking only when needed.
- Score: 50.93756215410832
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) often experience language confusion, which is the unintended mixing of languages during text generation. Current solutions to this problem either necessitate model retraining or cannot differentiate between harmful confusion and acceptable code-switching. This paper introduces the Language Confusion Gate (LCG), a lightweight, plug-in solution that filters tokens during decoding without altering the base LLM. The LCG is trained using norm-adjusted self-distillation to predict appropriate language families and apply masking only when needed. Our method is based on the findings that language confusion is infrequent, correct-language tokens are usually among the top predictions, and output token embedding norms are larger for high-resource languages, which biases sampling. When evaluated across various models, including Qwen3, GPT-OSS, Gemma3, Llama3.1, LCG decreases language confusion significantly, often by an order of magnitude, without negatively impacting task performance. Code is available at https://github.com/collinzrj/language_confusion_gate.
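The decoding-time gating the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary, per-token language tags, and the `gate_logits` helper are hypothetical, and the real LCG predicts the allowed language families with a trained head rather than receiving them as input.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gate_logits(logits, token_langs, allowed_langs, mask_value=-1e9):
    """Mask logits of tokens outside the allowed language families.

    Masking is applied only when at least one candidate token belongs to
    an allowed family; otherwise the distribution is left untouched, so
    the gate never empties the candidate set.
    """
    keep = [lang in allowed_langs for lang in token_langs]
    if not any(keep):
        return list(logits)  # nothing qualifies: fall back to raw logits
    return [x if k else mask_value for x, k in zip(logits, keep)]

# Toy four-token vocabulary with per-token language tags (illustrative).
logits = [2.0, 1.5, 0.5, -1.0]
token_langs = ["latin", "cjk", "latin", "cjk"]
gated = gate_logits(logits, token_langs, allowed_langs={"latin"})
probs = softmax(gated)  # mass is redistributed over the kept tokens
```

Because masking happens only when needed, intentional code-switching (where no candidate matches the expected family) would fall through the `any(keep)` check untouched in this toy version.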
Related papers
- Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors [45.37878669586302]
Large language models (LLMs) are increasingly deployed in multilingual, real-world applications with user inputs. Most benchmarks assume clean input, leaving the robustness of LLMs to typos largely underexplored. We introduce MulTypo, a multilingual typo generation algorithm that simulates human-like errors based on language-specific keyboard layouts and typing behavior.
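A keyboard-layout-driven typo generator of the kind MulTypo describes might look like the following sketch. The adjacency table and the `inject_typo` helper are illustrative assumptions, not the paper's actual algorithm, which uses per-language layout tables and typing-behavior statistics.

```python
import random

# Tiny QWERTY adjacency map (illustrative subset; a real system would use
# full per-language keyboard layout tables).
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wrds", "o": "ipkl", "t": "rygf", "n": "bhjm",
}

def inject_typo(word, rng):
    """Replace one character with a keyboard-adjacent one, if possible."""
    candidates = [i for i, ch in enumerate(word) if ch in QWERTY_NEIGHBORS]
    if not candidates:
        return word  # no character with a known neighbor: leave untouched
    i = rng.choice(candidates)
    repl = rng.choice(QWERTY_NEIGHBORS[word[i]])
    return word[:i] + repl + word[i + 1:]

rng = random.Random(0)  # seeded for reproducibility
typo = inject_typo("translate", rng)
```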
arXiv Detail & Related papers (2025-10-10T16:49:12Z)
- Smoothie-Qwen: Post-Hoc Smoothing to Reduce Language Bias in Multilingual LLMs [4.881694369042022]
Smoothie-Qwen is a lightweight, post-hoc method that mitigates language bias without retraining. Applied to the Qwen model, our method reduces unintended Chinese output by over 95%.
arXiv Detail & Related papers (2025-07-08T05:30:51Z)
- Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text [25.05270733872823]
Code-switching (CSW) is the act of alternating between two or more languages within a single discourse. Large Language Models (LLMs) are now central to content and communication generation.
arXiv Detail & Related papers (2025-06-16T21:19:27Z)
- Controlling Language Confusion in Multilingual LLMs [0.0]
Large language models often suffer from language confusion, a phenomenon in which responses are partially or entirely generated in unintended languages. In this work, we apply ORPO, which adds penalties for unwanted output styles to standard SFT, effectively suppressing language-confused generations.
arXiv Detail & Related papers (2025-05-25T12:15:31Z)
- Type-Constrained Code Generation with Language Models [51.03439021895432]
We introduce a type-constrained decoding approach that leverages type systems to guide code generation. For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code. Our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks.
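The core idea of prefix-constrained decoding can be shown with a toy stand-in: instead of real prefix automata over a type system, this sketch enumerates a few valid programs and keeps only tokens whose continuation is still a prefix of one of them. The enumeration and all names here are hypothetical simplifications, not the paper's construction.

```python
def allowed_tokens(prefix, vocab, valid_programs):
    """Keep only vocabulary tokens whose appended text remains a prefix
    of some valid program. A real type-constrained decoder replaces the
    enumeration with a prefix automaton derived from the type system."""
    return [t for t in vocab
            if any(p.startswith(prefix + t) for p in valid_programs)]

# Two pre-enumerated "well-typed" programs (toy stand-in for a type system).
valid = ['let x: int = 1', 'let x: str = "a"']
vocab = ["int", "str", "bool", " = "]
ok = allowed_tokens("let x: ", vocab, valid)  # only type names that can follow
```

At each decoding step, the sampler would restrict its choice to `ok`, guaranteeing that the generated program never leaves the well-typed prefix language.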
arXiv Detail & Related papers (2025-04-12T15:03:00Z)
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We also find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT) inspired by restricting embedding entries to the language of interest to bolster time and memory efficiency.
We apply two heuristics to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - to different language families and sizes.
It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed.
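Unicode-based script filtering, one of the two trimming heuristics mentioned above, can be sketched as follows. The block range, predicate, and helper names are illustrative assumptions rather than the paper's implementation.

```python
def is_latin_token(token):
    """True if every alphabetic character falls in a Latin Unicode block
    (Basic Latin through Latin Extended-B, i.e. code points <= U+024F)."""
    return all(ord(ch) <= 0x024F for ch in token if ch.isalpha())

def trim_vocab(vocab, keep_pred):
    """Keep special tokens (marked here by a leading '<') plus tokens
    passing the script predicate; everything else is trimmed away."""
    return [t for t in vocab if t.startswith("<") or keep_pred(t)]

vocab = ["hello", "world", "你好", "<pad>", "café"]
trimmed = trim_vocab(vocab, is_latin_token)  # drops the CJK token only
```

Shrinking the output embedding matrix to `trimmed` is what yields the memory and speed gains the abstract reports; tokens of unneeded scripts simply no longer exist in the softmax.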
arXiv Detail & Related papers (2023-11-16T09:35:50Z)
- Reducing language context confusion for end-to-end code-switching automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
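The KL-divergence self-teaching loss described in the FILTER entry above can be sketched in plain Python. The toy distributions and helper names are illustrative assumptions; the actual loss operates on the model's output distributions over labels for target-language text and its translation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of
    probabilities; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def self_teaching_loss(pseudo_label_probs, translated_probs):
    """Penalize disagreement between soft pseudo-labels (predictions on
    the target-language text) and predictions on its translation."""
    return kl_divergence(pseudo_label_probs, translated_probs)

# Toy 3-class example: a small disagreement yields a small positive loss.
loss = self_teaching_loss([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
```

The loss is zero exactly when the two distributions agree, so minimizing it pulls the translated-text predictions toward the pseudo-labels.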
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.