Related papers: Humans overrely on overconfident language models, across languages

Humans overrely on overconfident language models, across languages

URL: http://arxiv.org/abs/2507.06306v2
Date: Fri, 08 Aug 2025 00:50:04 GMT
Title: Humans overrely on overconfident language models, across languages
Authors: Neil Rathi, Dan Jurafsky, Kaitlyn Zhou,
Abstract summary: We study the risks of multilingual linguistic (mis)calibration, overconfidence, and overreliance across five languages.<n>Our work finds that overreliance risks are high across languages.
Score: 32.71245803698373
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As large language models (LLMs) are deployed globally, it is crucial that their responses are calibrated across languages to accurately convey uncertainty and limitations. Prior work shows that LLMs are linguistically overconfident in English, leading users to overrely on confident generations. However, the usage and interpretation of epistemic markers (e.g., 'I think it's') differs sharply across languages. Here, we study the risks of multilingual linguistic (mis)calibration, overconfidence, and overreliance across five languages to evaluate LLM safety in a global context. Our work finds that overreliance risks are high across languages. We first analyze the distribution of LLM-generated epistemic markers and observe that LLMs are overconfident across languages, frequently generating strengtheners even as part of incorrect responses. Model generations are, however, sensitive to documented cross-linguistic variation in usage: for example, models generate the most markers of uncertainty in Japanese and the most markers of certainty in German and Mandarin. Next, we measure human reliance rates across languages, finding that reliance behaviors differ cross-linguistically: for example, participants are significantly more likely to discount expressions of uncertainty in Japanese than in English (i.e., ignore their 'hedging' function and rely on generations that contain them). Taken together, these results indicate a high risk of reliance on overconfident model generations across languages. Our findings highlight the challenges of multilingual linguistic calibration and stress the importance of culturally and linguistically contextualized model safety evaluations.

Related papers

When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training [57.230355403478995]
We investigate the development of language-agnostic concept spaces during pretraining of EuroLLM.<n>We find that shared concept spaces emerge early and continue to refine, but that alignment with them is language-dependent.<n>In contrast to prior work, our fine-grained manual analysis reveals that some apparent gains in translation quality reflect shifts in behavior.
arXiv Detail & Related papers (2026-01-30T11:23:01Z)
Benchmarking Concept-Spilling Across Languages in LLMs [7.577675422356702]
Large Language Models (LLMs) exhibit remarkable cross-lingual abilities, yet often exhibit a systematic bias toward representations from other languages.<n>This paper presents a novel comparative framework for evaluating multilingual semantic robustness by measuring how models handle polysemous words across languages.
arXiv Detail & Related papers (2026-01-18T19:28:26Z)
Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation [49.2073409243885]
Large language models (LLMs) excel at generating English counterfactuals and demonstrate multilingual proficiency.<n>We conduct automatic evaluations on both directly generated counterfactuals in the target languages and those derived via English translation across six languages.<n>We identify and categorize four main types of errors that consistently appear in the generated counterfactuals across languages.
arXiv Detail & Related papers (2026-01-01T08:53:49Z)
The Emergence of Abstract Thought in Large Language Models Beyond Any Language [95.50197866832772]
Large language models (LLMs) function effectively across a diverse range of languages.<n>Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts.<n>Recent results show strong multilingual performance, even surpassing English performance on specific tasks in other languages.
arXiv Detail & Related papers (2025-06-11T16:00:54Z)
Language Matters: How Do Multilingual Input and Reasoning Paths Affect Large Reasoning Models? [59.970391602080205]
Despite multilingual training, LRMs tend to default to reasoning in high-resource languages at test time.<n>Cultural reasoning degrades performance on reasoning tasks but benefits cultural tasks, while safety evaluations exhibit language-specific behavior.
arXiv Detail & Related papers (2025-05-23T02:46:18Z)
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models [56.61984030508691]
We present the first mechanistic interpretability study of language confusion.<n>We show that confusion points (CPs) are central to this phenomenon.<n>We show that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion.
arXiv Detail & Related papers (2025-05-22T11:29:17Z)
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance.<n>Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z)
The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context [0.9130277390156759]
Alignment tuning has enabled large language models to excel in reasoning, instruction-following, and minimizing harmful generations.<n>Despite their widespread deployment, these models exhibit a monolingual bias, raising concerns about the effectiveness of alignment across languages.<n>Current alignment methods predominantly focus on English, leaving it unclear how alignment mechanism generalizes to multilingual settings.
arXiv Detail & Related papers (2025-04-03T15:46:46Z)
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Inconsistencies [63.10843814055688]
M-ALERT is a benchmark that evaluates the safety of Large Language Models in five languages.<n>M-ALERT includes 15k high-quality prompts per language, totaling 75k, with category-wise annotations.<n>Our experiments on 39 state-of-the-art LLMs highlight the importance of language-specific safety analysis.
arXiv Detail & Related papers (2024-12-19T16:46:54Z)
Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense [30.62699081329474]
We introduce a novel benchmark for cross-lingual sense disambiguation, StingrayBench. We collect false friends in four language pairs, namely Indonesian-Malay, Indonesian-Tagalog, Chinese-Japanese, and English-German. In our analysis of various models, we observe they tend to be biased toward higher-resource languages.
arXiv Detail & Related papers (2024-10-28T22:09:43Z)
XTRUST: On the Multilingual Trustworthiness of Large Language Models [14.128810448194699]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of natural language processing (NLP) tasks.<n>A key question that now preoccupies the AI community concerns the capabilities and limitations of these models.<n>X is the first comprehensive multilingual trustworthiness benchmark.
arXiv Detail & Related papers (2024-09-24T05:38:33Z)
Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.<n>We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.<n>We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages? [34.38469832305664]
This paper focuses on human values-related concepts (i.e., value concepts) due to their significance for AI safety. We first empirically confirm the presence of value concepts within LLMs in a multilingual format. Further analysis on the cross-lingual characteristics of these concepts reveals 3 traits arising from language resource disparities.
arXiv Detail & Related papers (2024-02-28T07:18:39Z)
MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models [36.33453112932689]
This paper introduces a comprehensive investigation of Multilingual Confidence estimation (MlingConf) on Large Language Models (LLMs)<n>The benchmark comprises four meticulously checked and human-evaluated high-quality multilingual datasets for LA tasks and one for the LS task tailored to specific social, cultural, and geographical contexts of a language.<n>Experiments reveal that on LA tasks English exhibits notable linguistic dominance in confidence estimations than other languages, while on LS tasks, using question-related language to prompt LLMs demonstrates better linguistic dominance in multilingual confidence estimations.
arXiv Detail & Related papers (2024-02-21T08:20:06Z)
All Languages Matter: On the Multilingual Safety of Large Language Models [96.47607891042523]
We build the first multilingual safety benchmark for large language models (LLMs) XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families. We propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT.
arXiv Detail & Related papers (2023-10-02T05:23:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.