LoraxBench: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages
- URL: http://arxiv.org/abs/2508.12459v1
- Date: Sun, 17 Aug 2025 18:07:57 GMT
- Title: LoraxBench: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages
- Authors: Alham Fikri Aji, Trevor Cohn
- Abstract summary: We introduce LoraxBench, a benchmark that focuses on low-resource languages of Indonesia. Our dataset covers 20 languages, with the addition of two formality registers for three languages. We show that a change in register affects model performance, especially with registers not commonly found in social media.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As one of the world's most populous countries, with 700 languages spoken, Indonesia is behind in terms of NLP progress. We introduce LoraxBench, a benchmark that focuses on low-resource languages of Indonesia and covers 6 diverse tasks: reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA. Our dataset covers 20 languages, with the addition of two formality registers for three languages. We evaluate a diverse set of multilingual and region-focused LLMs and found that this benchmark is challenging. We note a visible discrepancy between performance in Indonesian and other languages, especially the low-resource ones. There is no clear lead when using a region-specific model as opposed to the general multilingual model. Lastly, we show that a change in register affects model performance, especially with registers not commonly found in social media, such as high-level politeness `Krama' Javanese.
Related papers
- Multilingual Large Language Models do not comprehend all natural languages to equal degrees [3.1312895682585595]
Large Language Models (LLMs) play a critical role in how humans access information. Most benchmarks evaluate LLMs in languages spoken by Western, Educated, Industrialised, Rich, and Democratic (WEIRD) communities. We prompt 3 popular models on a language comprehension task across 12 languages. Our results suggest that the models exhibit remarkable linguistic accuracy across typologically diverse languages.
arXiv Detail & Related papers (2026-02-23T17:22:46Z) - Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge [68.6805229085352]
Most multilingual question-answering benchmarks do not factor in regional diversity in the information they capture. XNationQA encompasses a total of 49,280 questions on the geography, culture, and history of nine countries, presented in seven languages. We benchmark eight standard multilingual LLMs on XNationQA and evaluate them using two novel transference metrics.
arXiv Detail & Related papers (2025-11-01T18:41:34Z) - FormosanBench: Benchmarking Low-Resource Austronesian Languages in the Era of Large Language Models [1.2403152094314245]
We introduce FORMOSANBENCH, the first benchmark for evaluating large language models (LLMs) on low-resource Austronesian languages. We assess model performance in zero-shot, 10-shot, and fine-tuned settings using FORMOSANBENCH. Our results reveal a substantial performance gap between high-resource and Formosan languages.
arXiv Detail & Related papers (2025-06-12T07:02:28Z) - Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense [30.62699081329474]
We introduce a novel benchmark for cross-lingual sense disambiguation, StingrayBench.
We collect false friends in four language pairs, namely Indonesian-Malay, Indonesian-Tagalog, Chinese-Japanese, and English-German.
In our analysis of various models, we observe they tend to be biased toward higher-resource languages.
arXiv Detail & Related papers (2024-10-28T22:09:43Z) - NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural [0.0]
NusaBERT builds upon IndoBERT by incorporating vocabulary expansion and leveraging a diverse multilingual corpus that includes regional languages and dialects.
Through rigorous evaluation across a range of benchmarks, NusaBERT demonstrates state-of-the-art performance in tasks involving multiple languages of Indonesia.
arXiv Detail & Related papers (2024-03-04T08:05:34Z) - NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z) - Baichuan 2: Open Large-scale Language Models [51.34140526283222]
We present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.
arXiv Detail & Related papers (2023-09-19T04:13:22Z) - NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages [100.59889279607432]
We focus on developing resources for languages in Indonesia.
Most languages in Indonesia are categorized as endangered and some are even extinct.
We develop the first-ever parallel resource for 10 low-resource languages in Indonesia.
arXiv Detail & Related papers (2022-05-31T17:03:50Z) - Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios? [15.995677143912474]
We focus on North-African colloquial dialectal Arabic written using an extension of the Latin script, called NArabizi. We show that a character-based model trained on only 99k sentences of NArabizi and fine-tuned on a small treebank leads to performance close to that obtained with the same architecture pre-trained on large multilingual and monolingual models.
arXiv Detail & Related papers (2021-10-26T14:59:16Z) - Multilingual and code-switching ASR challenges for low resource Indian languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both tasks, with WERs of 30.73% and 32.45% on the test sets of the multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.