A Diagnostic Benchmark for Sweden-Related Factual Knowledge
- URL: http://arxiv.org/abs/2510.21360v1
- Date: Fri, 24 Oct 2025 11:42:32 GMT
- Title: A Diagnostic Benchmark for Sweden-Related Factual Knowledge
- Authors: Jenny Kunz
- Abstract summary: The dataset can be used to measure factual recall across models of varying sizes and degrees of Swedish coverage. Using the dataset, we find that smaller models with stronger Swedish coverage perform comparably to a three times larger multilingual model in recalling Sweden-related facts.
- Score: 0.6599344783327054
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many Swedish benchmarks are translated US-centric benchmarks and are therefore not suitable for testing knowledge that is particularly relevant, or even specific, to Sweden. We therefore introduce a manually written question-answering benchmark specifically targeted at Sweden-related personalities and events, many of which receive very limited coverage in international media. Our annotators drew inspiration from a popular radio program featuring public figures from culture and media, as well as from major sports events in Sweden. The dataset can be used to measure factual recall across models of varying sizes and degrees of Swedish coverage, and because it contains English translations, it also allows probing cross-lingual factual consistency. Using the dataset, we find that smaller models with stronger Swedish coverage perform comparably to a three times larger multilingual model in recalling Sweden-related facts. We also observe that continued pre-training on Swedish generally improves factual knowledge but also leads to forgetting part of the previously known information. These results demonstrate the dataset's potential as a diagnostic tool for studying language adaptation and knowledge retention in multilingual models.
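The abstract mentions three kinds of measurement: factual recall per language, cross-lingual consistency between the Swedish questions and their English translations, and knowledge gained or forgotten through continued pre-training. Below is a minimal Python sketch of how such numbers could be computed once every answer has been judged correct or incorrect. The record type `Judgement`, the function names, and in particular the consistency definition (among questions answered correctly in at least one language, the share answered correctly in both) are illustrative assumptions; the paper's actual scoring procedure may differ, and the toy data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    # Hypothetical per-question record: was the model's answer judged
    # correct for the Swedish question and for its English translation?
    qid: str
    correct_sv: bool
    correct_en: bool


def _field(lang: str) -> str:
    # Map a language code ("sv" or "en") to the matching boolean field.
    if lang not in ("sv", "en"):
        raise ValueError(f"unsupported language: {lang}")
    return f"correct_{lang}"


def accuracy(results: list[Judgement], lang: str) -> float:
    """Share of questions answered correctly in one language."""
    if not results:
        return 0.0
    field = _field(lang)
    return sum(getattr(r, field) for r in results) / len(results)


def crosslingual_consistency(results: list[Judgement]) -> float:
    """Assumed metric: of the questions answered correctly in at least one
    language, the share answered correctly in both (an overlap ratio)."""
    either = [r for r in results if r.correct_sv or r.correct_en]
    if not either:
        return 0.0
    both = sum(1 for r in either if r.correct_sv and r.correct_en)
    return both / len(either)


def knowledge_shift(before: list[Judgement], after: list[Judgement],
                    lang: str = "sv") -> tuple[set[str], set[str]]:
    """Compare a base model with its continued-pretrained version.

    Returns (gained, forgotten): ids correct only after adaptation, and
    ids that were correct before adaptation but are wrong afterwards."""
    field = _field(lang)
    known_before = {r.qid for r in before if getattr(r, field)}
    known_after = {r.qid for r in after if getattr(r, field)}
    return known_after - known_before, known_before - known_after


if __name__ == "__main__":
    # Toy judgements for illustration only; these are not results from the paper.
    base = [Judgement("q1", True, True), Judgement("q2", False, True),
            Judgement("q3", True, False), Judgement("q4", False, False)]
    adapted = [Judgement("q1", True, True), Judgement("q2", True, True),
               Judgement("q3", False, False), Judgement("q4", True, False)]
    print(accuracy(base, "sv"), accuracy(adapted, "sv"))  # 0.5 0.75
    print(crosslingual_consistency(base))
    gained, forgotten = knowledge_shift(base, adapted)
    print(sorted(gained), sorted(forgotten))  # ['q2', 'q4'] ['q3']
```

Reporting gained and forgotten items separately, rather than only the net accuracy change, is what makes forgetting visible: in the toy data Swedish accuracy rises from 0.5 to 0.75 even though one previously known fact is lost.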
Related papers
- Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish [0.6599344783327054]
We investigate how language models develop preferences for idiomatic as compared to linguistically acceptable Swedish. For linguistic acceptability, we adapt existing benchmarks into a minimal-pair format. Our findings suggest that idiomatic competence emerges more slowly than other linguistic abilities.
(arXiv, 2026-02-03T12:57:39Z)
- Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition [1.1417805445492082]
This work presents a suite of fine-tuned Whisper models for Swedish, trained on a dataset of unprecedented size and variability for this mid-resourced language. We report an overall improvement across model sizes compared to OpenAI's Whisper evaluated on Swedish.
(arXiv, 2025-05-23T06:42:16Z)
- Tracing Multilingual Factual Knowledge Acquisition in Pretraining [83.93508231653091]
Large Language Models (LLMs) are capable of recalling multilingual factual knowledge present in their pretraining data. We trace how factual recall and crosslingual consistency evolve during pretraining, focusing on OLMo-7B. We find that both accuracy and consistency improve over time for most languages.
(arXiv, 2025-05-20T18:39:56Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
(arXiv, 2024-06-23T15:15:17Z)
- Evaluating Large Language Models with Human Feedback: Establishing a Swedish Benchmark [0.0]
Large language models (LLMs) have demonstrated significant capabilities across numerous applications.
This study introduces a comprehensive human benchmark to assess the efficacy of prominent LLMs in understanding and generating Swedish language texts.
(arXiv, 2024-05-22T21:22:51Z)
- Low-Rank Adaptation for Multilingual Summarization: An Empirical Study [60.541168233698194]
We investigate the potential of Parameter-Efficient Fine-Tuning, focusing on Low-Rank Adaptation (LoRA), in the domain of multilingual summarization.
We conduct an extensive study across different data availability scenarios, including high- and low-data settings, and cross-lingual transfer.
Our findings reveal that LoRA is competitive with full fine-tuning when trained with high quantities of data, and excels in low-data scenarios and cross-lingual transfer.
(arXiv, 2023-11-14T22:32:39Z)
- Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese [54.00582760714034]
Cross-lingual NLP transfer can be improved by exploiting data and models of high-resource languages.
We release a new web corpus of Faroese, Faroese datasets for named entity recognition (NER) and semantic text similarity (STS), and new language models trained on all Scandinavian languages.
(arXiv, 2023-04-18T08:42:38Z)
- ScandEval: A Benchmark for Scandinavian Natural Language Processing [0.0]
This paper introduces a Scandinavian benchmarking platform, ScandEval, which can benchmark any pretrained model on four different tasks in the Scandinavian languages.
The datasets used in two of the tasks, linguistic acceptability and question answering, are new.
We develop and release a Python package and command-line interface, scandeval, which can benchmark any model that has been uploaded to the Hugging Face Hub, with reproducible results.
(arXiv, 2023-04-03T11:51:46Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
(arXiv, 2022-01-27T18:53:22Z)
- Intent Classification Using Pre-Trained Embeddings For Low Resource Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language-specific Automatic Speech Recognition is an important yet less explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil each with different data sizes to simulate high, medium, and low resource scenarios.
(arXiv, 2021-10-18T13:06:59Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models with varying amounts of target-language data.
Our usage scenario is interactive correction starting from nearly zero training examples, with the models improving as more data is collected.
(arXiv, 2020-10-20T17:31:07Z)
- Playing with Words at the National Library of Sweden -- Making a Swedish BERT [0.0]
This paper introduces the Swedish BERT ("KB-BERT") developed by the KBLab for data-driven research at the National Library of Sweden (KB).
Building on recent efforts to create transformer-based BERT models for languages other than English, we explain how we used KB's collections to create and train a new language-specific BERT model for Swedish.
(arXiv, 2020-07-03T12:53:39Z)
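The LoRA entry in the list above (Low-Rank Adaptation for Multilingual Summarization) compares low-rank adaptation with full fine-tuning. As a rough illustration of why LoRA is cheap, here is a minimal pure-Python sketch of the generic LoRA layer: a frozen weight W (d x k) plus a trainable update B A with B of size d x r and A of size r x k, scaled by alpha / r. The class and function names, the initialization scale, and the example sizes are illustrative assumptions, not the configuration used in that study.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    # Plain matrix-vector product for row-major nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA layer: y = W x + (alpha / r) * B (A x), W frozen."""

    def __init__(self, weight: list[list[float]], rank: int, alpha: float,
                 seed: int = 0) -> None:
        self.weight = weight  # frozen pretrained weight, d x k
        d, k = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.rank = rank
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so before any
        # training the adapted layer computes exactly the frozen output.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d)]

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]

    def trainable_params(self) -> int:
        # Only A (r x k) and B (d x r) are trained.
        return self.rank * len(self.a[0]) + len(self.b) * self.rank


def full_param_count(d: int, k: int) -> int:
    # Parameters updated when fine-tuning the full d x k matrix.
    return d * k


def lora_param_count(d: int, k: int, r: int) -> int:
    # Parameters updated by a rank-r LoRA adapter on the same matrix.
    return r * (d + k)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], rank=2, alpha=4.0)
    print(layer.forward([1.0, 2.0]))  # [1.0, 2.0, 3.0], same as W x
    full, lora = full_param_count(4096, 4096), lora_param_count(4096, 4096, 8)
    print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

For a single 4096 x 4096 matrix, a rank-8 adapter trains about 0.39% of the parameters that full fine-tuning would update, which is one reason LoRA can be attractive in the low-data settings the summary mentions.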