Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLM
- URL: http://arxiv.org/abs/2601.18306v1
- Date: Mon, 26 Jan 2026 09:36:03 GMT
- Title: Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLM
- Authors: Everlyn Asiko Chimoto, Mostafa Elhoushi, Bruce A. Bassett,
- Abstract summary: Non-English and multilingual calibration sets significantly improve perplexity compared to English-only baselines. Tailoring calibration sets to the evaluation language yields the largest improvements for individual languages.
- Score: 10.689556615369272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantization is an effective technique for reducing the storage footprint and computational costs of Large Language Models (LLMs), but it often results in performance degradation. Existing post-training quantization methods typically use small, English-only calibration sets; however, their impact on multilingual models remains underexplored. We systematically evaluate eight calibration settings (five single-language and three multilingual mixes) on two quantizers (GPTQ, AWQ) on data from 10 languages. Our findings reveal a consistent trend: non-English and multilingual calibration sets significantly improve perplexity compared to English-only baselines. Specifically, we observe notable average perplexity gains across both quantizers on Llama3.1 8B and Qwen2.5 7B, with multilingual mixes achieving the largest overall reductions of up to 3.52 points in perplexity. Furthermore, our analysis indicates that tailoring calibration sets to the evaluation language yields the largest improvements for individual languages, underscoring the importance of linguistic alignment. We also identify specific failure cases where certain language-quantizer combinations degrade performance, which we trace to differences in activation range distributions across languages. These results highlight that static one-size-fits-all calibration is suboptimal and that tailoring calibration data, both in language and diversity, plays a crucial role in robustly quantizing multilingual LLMs.
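The abstract describes mixing calibration data across languages rather than relying on an English-only set. A minimal sketch of one way to assemble such a multilingual calibration mix is shown below; this is an illustration, not the authors' implementation, and the corpora, language codes, and round-robin strategy are all assumptions.

```python
import random

def build_calibration_set(corpora, n_samples, seed=0):
    """Interleave samples across languages (round-robin) to form a
    mixed calibration set for post-training quantization.

    corpora: dict mapping language code -> list of text samples
    n_samples: total number of calibration samples to draw
    """
    rng = random.Random(seed)
    langs = sorted(corpora)
    # Shuffle each language's pool once, deterministically.
    pools = {lang: rng.sample(corpora[lang], len(corpora[lang])) for lang in langs}
    mixed = []
    i = 0
    while len(mixed) < n_samples:
        lang = langs[i % len(langs)]  # cycle through languages in turn
        if pools[lang]:
            mixed.append((lang, pools[lang].pop()))
        i += 1
        if all(not p for p in pools.values()):
            break  # all pools exhausted
    return mixed

# Toy corpora; placeholders for real multilingual calibration text.
corpora = {
    "en": ["en text %d" % k for k in range(4)],
    "sw": ["sw text %d" % k for k in range(4)],
    "zh": ["zh text %d" % k for k in range(4)],
}
calib = build_calibration_set(corpora, n_samples=6)
```

The resulting list of `(language, text)` pairs would then be tokenized and fed to a quantizer such as GPTQ or AWQ in place of the usual English-only calibration samples.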
Related papers
- Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning [58.355275813623685]
This work looks at a critical gap in the calibration of large language models (LLMs) within multilingual settings.
Even in low-resource languages, model confidence can increase significantly after instruction-tuning on high-resource language SFT datasets.
However, improvements in accuracy are marginal or non-existent, highlighting a critical shortcoming of standard SFT in multilingual settings.
arXiv Detail & Related papers (2026-01-04T04:29:12Z)
- The Uneven Impact of Post-Training Quantization in Machine Translation [6.398727997282354]
Quantization is essential for deploying large language models (LLMs) on resource-constrained hardware, but its implications for multilingual tasks remain underexplored.
We conduct the first large-scale evaluation of post-training quantization (PTQ) on machine translation across 55 languages using five LLMs ranging from 1.7B to 70B parameters.
Our analysis reveals that while 4-bit quantization often preserves translation quality for high-resource languages, significant degradation occurs for low-resource and typologically diverse languages, particularly in 2-bit settings.
arXiv Detail & Related papers (2025-08-28T15:22:31Z)
- Towards Inclusive NLP: Assessing Compressed Multilingual Transformers across Diverse Language Benchmarks [33.2185998586144]
This study benchmarks the performance of multilingual and monolingual Large Language Models (LLMs) across Arabic, English, and Indic languages.
Findings show significant performance differences driven by linguistic diversity and resource availability.
Quantization (4-bit and 8-bit) is effective in maintaining model accuracy while promoting efficiency, but aggressive pruning significantly compromises performance.
arXiv Detail & Related papers (2025-07-25T22:35:10Z)
- Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models [53.38288894305388]
Multilingual large language models (MLLMs) can achieve high performance through in-context learning (ICL) by exploiting cross-lingual knowledge transfer without parameter updates.
Three key factors influence multilingual ICL: (1) semantic similarity, (2) linguistic alignment, and (3) language-specific performance.
We propose balanced multi-factor ICL (BMF-ICL), a method that quantifies and optimally balances these factors for improved example selection.
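The three-factor balancing idea above can be sketched as a simple convex combination of per-example scores. This is only an illustration of the general approach, not the BMF-ICL method itself; the weights, factor values, and function names are all hypothetical.

```python
def score_example(sem_sim, ling_align, lang_perf, weights=(0.4, 0.3, 0.3)):
    """Combine the three ICL factors with a convex weighting.

    All three inputs are assumed to be pre-normalized to [0, 1];
    the weights here are illustrative, not tuned values.
    """
    w1, w2, w3 = weights
    return w1 * sem_sim + w2 * ling_align + w3 * lang_perf

def select_examples(candidates, k):
    """candidates: list of (example_id, sem_sim, ling_align, lang_perf).

    Returns the ids of the top-k candidates by combined score.
    """
    ranked = sorted(
        candidates,
        key=lambda c: score_example(c[1], c[2], c[3]),
        reverse=True,
    )
    return [c[0] for c in ranked[:k]]

# Hypothetical candidate pool with precomputed factor scores.
cands = [
    ("ex_fr", 0.9, 0.2, 0.5),
    ("ex_de", 0.6, 0.8, 0.7),
    ("ex_sw", 0.4, 0.9, 0.3),
]
top = select_examples(cands, k=2)
```

In practice, the relative weights would need to be learned or tuned rather than fixed, which is where a balanced multi-factor method goes beyond this naive weighted sum.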
arXiv Detail & Related papers (2025-02-17T06:56:33Z)
- Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models [11.421452042888523]
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques.
Our results offer practical suggestions: for example, calibrating in the target language efficiently retains language modeling capability but does not necessarily benefit downstream tasks.
arXiv Detail & Related papers (2024-08-26T16:29:13Z)
- How Does Quantization Affect Multilingual LLMs? [50.867324914368524]
Quantization techniques are widely used to improve inference speed and deployment of large language models.
We conduct a thorough analysis of quantized multilingual LLMs, focusing on performance across languages and at varying scales.
arXiv Detail & Related papers (2024-07-03T15:39:40Z)
- On the Calibration of Multilingual Question Answering LLMs [57.296161186129545]
We benchmark the calibration of several multilingual Large Language Models (MLLMs) on a variety of Question Answering tasks.
We study different dimensions of calibration in in-distribution, out-of-distribution, and cross-lingual transfer settings.
For decoder-only LLMs such as Llama2, we additionally find that in-context learning improves confidence calibration on multilingual data.
arXiv Detail & Related papers (2023-11-15T03:29:02Z)
- On the Calibration of Massively Multilingual Language Models [15.373725507698591]
Massively Multilingual Language Models (MMLMs) have recently gained popularity due to their surprising effectiveness in cross-lingual transfer.
We first investigate the calibration of MMLMs in the zero-shot setting and observe a clear case of miscalibration in low-resource languages.
We also find that few-shot examples in the language can further help reduce the calibration errors, often substantially.
arXiv Detail & Related papers (2022-10-21T21:41:56Z)
- High-resource Language-specific Training for Multilingual Neural Machine Translation [109.31892935605192]
We propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
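In an iterated best response scheme of this kind, the adversary's step upweights the languages with the highest current loss. A minimal sketch of that single step, using a softmax over per-language losses, is shown below; this is an assumption about the general mechanism, not the paper's actual objective or update rule.

```python
import math

def adversary_weights(losses, eta=1.0):
    """Adversary's best response: softmax over per-language losses.

    losses: dict mapping language code -> current training loss
    eta: temperature controlling how aggressively high-loss
         languages are upweighted (illustrative value)
    """
    exps = {lang: math.exp(eta * loss) for lang, loss in losses.items()}
    z = sum(exps.values())
    return {lang: e / z for lang, e in exps.items()}

# Hypothetical per-language losses after some training steps.
losses = {"en": 0.5, "sw": 2.0, "zh": 1.0}
w = adversary_weights(losses)
```

The model's next training step would then minimize the weighted loss `sum(w[lang] * loss[lang])`, and the two steps alternate until convergence.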
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.