Related papers: Benchmarking Linguistic Diversity of Large Language Models

URL: http://arxiv.org/abs/2412.10271v1
Date: Fri, 13 Dec 2024 16:46:03 GMT
Title: Benchmarking Linguistic Diversity of Large Language Models
Authors: Yanzhu Guo, Guokan Shang, Chloé Clavel
Abstract summary: This paper emphasizes the importance of examining the preservation of human linguistic richness by language models. We propose a comprehensive framework for evaluating LLMs from various linguistic diversity perspectives.
Score: 14.824871604671467
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The development and evaluation of Large Language Models (LLMs) has primarily focused on their task-solving capabilities, with recent models even surpassing human performance in some areas. However, this focus often neglects whether machine-generated language matches the human level of diversity, in terms of vocabulary choice, syntactic construction, and expression of meaning, raising questions about whether the fundamentals of language generation have been fully addressed. This paper emphasizes the importance of examining the preservation of human linguistic richness by language models, given the concerning surge in online content produced or aided by LLMs. We propose a comprehensive framework for evaluating LLMs from various linguistic diversity perspectives including lexical, syntactic, and semantic dimensions. Using this framework, we benchmark several state-of-the-art LLMs across all diversity dimensions, and conduct an in-depth case study for syntactic diversity. Finally, we analyze how different development and deployment choices impact the linguistic diversity of LLM outputs.
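To make the lexical dimension concrete, here is a minimal Python sketch of distinct-n, one widely used lexical diversity score: the number of unique n-grams divided by the total number of n-grams across a set of model outputs. This is an illustration only; it is an assumption that it matches any metric in the paper's framework. The function name `distinct_n`, the lowercasing, and the naive whitespace tokenization are hypothetical choices made for the example.

```python
def distinct_n(texts: list[str], n: int = 2) -> float:
    """Illustrative distinct-n: unique n-grams / total n-grams over all texts.

    Hypothetical helper, not taken from the paper. Tokenization is a naive
    lowercase whitespace split (an assumption for this sketch).
    """
    total = 0
    unique: set[tuple[str, ...]] = set()
    for text in texts:
        tokens = text.lower().split()
        # Collect every contiguous n-gram in this output.
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    # Return 0.0 when there are no n-grams to avoid division by zero.
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    outputs = ["the cat sat on the mat", "the cat sat on the rug"]
    print(distinct_n(outputs, n=1))  # 6 unique of 12 unigrams
    print(distinct_n(outputs, n=2))  # 6 unique of 10 bigrams
```

Higher values indicate less repetition across outputs. Because the score pools n-grams over the whole set, near-duplicate generations (as in the usage example) pull it down, which is the kind of homogeneity a lexical diversity benchmark would aim to surface.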

Related papers

The Emergence of Abstract Thought in Large Language Models Beyond Any Language [95.50197866832772]
Large language models (LLMs) function effectively across a diverse range of languages. Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts. Recent results show strong multilingual performance, even surpassing English performance on specific tasks in other languages.
arXiv Detail & Related papers (2025-06-11T16:00:54Z)
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance. Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z)
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation [60.52580061637301]
MMLU-ProX is a comprehensive benchmark covering 13 typologically diverse languages with approximately 11,829 questions per language. We evaluate 25 state-of-the-art large language models (LLMs) using 5-shot chain-of-thought (CoT) and zero-shot prompting strategies, analyzing their performance across linguistic and cultural boundaries. Our experiments reveal consistent performance degradation from high-resource languages to lower-resource ones, with the best models achieving over 70% accuracy on English but dropping to around 40% for languages like Swahili.
arXiv Detail & Related papers (2025-03-13T15:59:20Z)
Uncovering inequalities in new knowledge learning by large language models across different languages [66.687369838071]
We show that low-resource languages consistently face disadvantages across all four dimensions. We aim to raise awareness of linguistic inequalities in LLMs' new knowledge learning, fostering the development of more inclusive and equitable future LLMs.
arXiv Detail & Related papers (2025-03-06T03:41:47Z)
The Multilingual Mind: A Survey of Multilingual Reasoning in Language Models [18.399229357408043]
Multilingual reasoning requires language models to handle logical reasoning across languages. This survey provides the first in-depth review of multilingual reasoning in language models.
arXiv Detail & Related papers (2025-02-13T16:25:16Z)
Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing [7.312170216336085]
We take a broader approach to explore a wider range of variations across sociodemographic dimensions. We extend the SocialIQA dataset to create diverse paraphrased sets conditioned on sociodemographic styles. We find that demographic-specific paraphrasing significantly impacts the performance of language models.
arXiv Detail & Related papers (2025-01-14T17:50:06Z)
The dynamics of meaning through time: Assessment of Large Language Models [2.5864824580604515]
This study aims to evaluate the capabilities of various large language models (LLMs) in capturing temporal dynamics of meaning. Our comparative analysis includes prominent models like ChatGPT, GPT-4, Claude, Bard, Gemini, and Llama. Findings reveal marked differences in each model's handling of historical context and semantic shifts, highlighting both strengths and limitations in temporal semantic understanding.
arXiv Detail & Related papers (2025-01-09T19:56:44Z)
Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance the multilingual capabilities of large language models (LLMs). It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from the top layers of LLMs. It achieves superior results with far fewer computational resources than existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
LLM for Everyone: Representing the Underrepresented in Large Language Models [21.07409393578553]
This thesis aims to bridge the gap in NLP research and development by focusing on underrepresented languages. A comprehensive evaluation of large language models (LLMs) is conducted to assess their capabilities in these languages. The proposed solutions cover cross-lingual continual instruction tuning, retrieval-based cross-lingual in-context learning, and in-context query alignment.
arXiv Detail & Related papers (2024-09-20T20:53:22Z)
Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models [16.942897938964638]
Large Language Models (LLMs) have shown exceptional performance in various Natural Language Processing (NLP) tasks. Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages. This study focuses on three primary questions: the existence of cross-lingual inconsistencies in LLMs, the specific aspects in which these inconsistencies manifest, and the correlation between cross-lingual consistency and multilingual capabilities of LLMs.
arXiv Detail & Related papers (2024-07-01T15:11:37Z)
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [48.314619377988436]
Rapidly developing Large Language Models (LLMs) demonstrate remarkable multilingual capabilities in natural language processing. Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient. This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages? [34.38469832305664]
This paper focuses on human values-related concepts (i.e., value concepts) due to their significance for AI safety. We first empirically confirm the presence of value concepts within LLMs in a multilingual format. Further analysis on the cross-lingual characteristics of these concepts reveals 3 traits arising from language resource disparities.
arXiv Detail & Related papers (2024-02-28T07:18:39Z)
How Far Can We Extract Diverse Perspectives from Large Language Models? [16.16678226707335]
We show that large language models (LLMs) can generate diverse perspectives on subjective topics. We propose a criteria-based prompting technique to ground diverse opinions. Our methods, applied to various tasks, show that LLMs can indeed produce diverse opinions according to the degree of task subjectivity.
arXiv Detail & Related papers (2023-11-16T11:23:38Z)
Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm. We provide insights into what makes multilingual sequential learning particularly challenging. The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
arXiv Detail & Related papers (2022-05-23T09:25:43Z)
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context. It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts. Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.