Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish
- URL: http://arxiv.org/abs/2504.01667v2
- Date: Thu, 03 Apr 2025 11:39:22 GMT
- Title: Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish
- Authors: Cedric Lothritz, Jordi Cabot,
- Abstract summary: This study investigates the viability of language proficiency exams as such evaluation tools for the Luxembourgish language.<n>We find that large models such as ChatGPT, Claude and DeepSeek-R1 typically achieve high scores, while smaller models show weak performances.<n>We also find that the performances in such language exams can be used to predict performances in other NLP tasks.
- Score: 2.0533856328275273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have become an increasingly important tool in research and society at large. While LLMs are regularly used all over the world by experts and lay-people alike, they are predominantly developed with English-speaking users in mind, performing well in English and other wide-spread languages while less-resourced languages such as Luxembourgish are seen as a lower priority. This lack of attention is also reflected in the sparsity of available evaluation tools and datasets. In this study, we investigate the viability of language proficiency exams as such evaluation tools for the Luxembourgish language. We find that large models such as ChatGPT, Claude and DeepSeek-R1 typically achieve high scores, while smaller models show weak performances. We also find that the performances in such language exams can be used to predict performances in other NLP tasks.
Related papers
- Multilingual Performance Biases of Large Language Models in Education [39.14806026620442]
Large language models (LLMs) are increasingly being adopted in educational settings.
This work ascertains if their use in education settings in non-English languages is warranted.
arXiv Detail & Related papers (2025-04-24T16:32:31Z) - Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models [52.00446751692225]
We present a novel and simple yet effective method called textbfDictionary textbfInsertion textbfPrompting (textbfDIP)
When providing a non-English prompt, DIP looks up a word dictionary and inserts words' English counterparts into the prompt for LLMs.
It then enables better translation into English and better English model thinking steps which leads to obviously better results.
arXiv Detail & Related papers (2024-11-02T05:10:50Z) - One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We present the first study aimed at objectively assessing the fairness and robustness of Large Language Models (LLMs) in handling dialects in canonical reasoning tasks.<n>We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K.<n>Our findings reveal that textbfalmost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
arXiv Detail & Related papers (2024-10-14T18:44:23Z) - Generating bilingual example sentences with large language models as lexicography assistants [2.6550899846546527]
We present a study of LLMs' performance in generating and rating example sentences for bilingual dictionaries across languages with varying resource levels.
We evaluate the quality of LLM-generated examples against the GDEX (Good Dictionary EXample) criteria: typicality, informativeness, and intelligibility.
arXiv Detail & Related papers (2024-10-04T06:45:48Z) - Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings [12.507989493130175]
Large language models (LLMs) have garnered significant interest in natural language processing (NLP)
Recent studies have highlighted the limitations of LLMs in low-resource languages.
We present datasets for sentiment and hate speech tasks by translating from English to Bangla, Hindi, and Urdu.
arXiv Detail & Related papers (2024-08-05T05:09:23Z) - Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages [48.40607157158246]
Large Language Models (LLMs) perform better on high-resource languages like English, German, and French, while their capabilities in low-resource languages remain inadequate.<n>We propose the Language Ranker, an intrinsic metric designed to benchmark and rank languages based on LLM performance using internal representations.<n>Our analysis reveals that high-resource languages exhibit higher similarity scores with English, demonstrating superior performance, while low-resource languages show lower similarity scores.
arXiv Detail & Related papers (2024-04-17T16:53:16Z) - LLMs Are Few-Shot In-Context Low-Resource Language Learners [59.74451570590808]
In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages.
We extensively study ICL and its cross-lingual variation (X-ICL) on 25 low-resource and 7 relatively higher-resource languages.
Our study concludes the significance of few-shot in-context information on enhancing the low-resource understanding quality of LLMs.
arXiv Detail & Related papers (2024-03-25T07:55:29Z) - High-quality Data-to-Text Generation for Severely Under-Resourced
Languages with Out-of-the-box Large Language Models [5.632410663467911]
We explore the extent to which pretrained large language models (LLMs) can bridge the performance gap for under-resourced languages.
We find that LLMs easily set the state of the art for the under-resourced languages by substantial margins.
For all our languages, human evaluation shows on-a-par performance with humans for our best systems, but BLEU scores collapse compared to English.
arXiv Detail & Related papers (2024-02-19T16:29:40Z) - Zero-Shot Cross-Lingual Reranking with Large Language Models for
Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba)
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.